ArXiv Preprint
Chest X-rays (CXRs) are a widely used imaging modality for the diagnosis and
prognosis of lung disease. The image analysis tasks vary, ranging from
pathology detection to lung segmentation. There is a large body of work in which
machine learning algorithms are developed for specific tasks. A significant
recent example is coronavirus disease (COVID-19) detection using CXR data.
However, traditional diagnostic tool design based on supervised learning is
burdened by the need for training data annotations, which must be of high
quality to support good clinical outcomes. Here, we propose an
alternative solution, a new self-supervised paradigm, where a general
representation from CXRs is learned using a group-masked self-supervised
framework. The pre-trained model is then fine-tuned for domain-specific tasks
such as COVID-19 detection, pneumonia detection, and general health screening. We show
that the same pre-training can be used for the lung segmentation task. Our
proposed paradigm shows robust performance in multiple downstream tasks, which
demonstrates the success of the pre-training. Moreover, the performance of the
pre-trained models on data with significant drift at test time shows that a
better generic representation is learned. The methods are further validated
by COVID-19 detection on a unique small-scale pediatric data set. The
performance gain in accuracy (~25%) is significant when compared to a
supervised transformer-based method. This adds credence to the strength and
reliability of our proposed framework and pre-training strategy.
Syed Muhammad Anwar, Abhijeet Parida, Sara Atito, Muhammad Awais, Gustavo Nino, Josef Kitler, Marius George Linguraru
2022-11-23