ArXiv Preprint
Deep neural networks have been successfully applied in diverse domains,
including pathology classification based on medical images. However,
large-scale, high-quality data for training powerful neural networks are rare in
the medical domain because labeling must be done by qualified experts.
Researchers have recently tackled this problem with some success by taking advantage
of models pre-trained on large-scale general-domain data. Specifically,
they took contrastive image-text encoders (e.g., CLIP) and fine-tuned them
with chest X-ray images and paired reports to perform zero-shot pathology
classification, thus completely removing the need for pathology-annotated
images to train a classification model. Existing studies, however, fine-tuned
the pre-trained model with the same contrastive learning objective and failed
to exploit the multi-labeled nature of medical image-report pairs. In this
paper, we propose a new fine-tuning strategy based on sentence sampling and
positive-pair loss relaxation that improves downstream zero-shot pathology
classification performance and can be applied to any pre-trained contrastive
image-text encoder. Our method consistently and substantially improved
zero-shot pathology classification performance on four different chest X-ray
datasets and three different pre-trained models (5.77% average AUROC increase). In
particular, fine-tuning CLIP with our method performed comparably to, or
marginally outperformed, board-certified radiologists (0.619 vs. 0.625 in F1
score and 0.530 vs. 0.544 in MCC) in zero-shot classification of five prominent
diseases from the CheXpert dataset.
Jongseong Jang, Daeun Kyung, Seung Hwan Kim, Honglak Lee, Kyunghoon Bae, Edward Choi
2022-12-14
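
The abstract does not give implementation details. As a rough illustration of the two ingredients it names (sentence sampling and positive-pair loss relaxation), the sketch below pairs each image with one sentence sampled from its report and softens the positive-pair targets of a standard CLIP-style contrastive loss. The function names, the naive sentence splitter, and the label-smoothing-style relaxation (the temperature and relax parameters) are illustrative assumptions, not the authors' implementation.

    import random

    import torch
    import torch.nn.functional as F


    def sample_sentence(report: str) -> str:
        """Pair an image with one sentence sampled from its report
        (naive split on '.'; a real pipeline would use a proper sentence splitter)."""
        sentences = [s.strip() for s in report.split(".") if s.strip()]
        return random.choice(sentences) if sentences else report


    def relaxed_clip_loss(image_emb: torch.Tensor,
                          text_emb: torch.Tensor,
                          temperature: float = 0.07,
                          relax: float = 0.1) -> torch.Tensor:
        """CLIP-style symmetric InfoNCE loss with softened positive targets:
        the matched pair on the diagonal gets weight 1 - relax and the remaining
        probability mass is spread over the other pairs in the batch, so the
        positive pair is not pulled together as hard as in the standard objective.
        This relaxation scheme is an assumption, not the paper's exact loss."""
        image_emb = F.normalize(image_emb, dim=-1)
        text_emb = F.normalize(text_emb, dim=-1)
        logits = image_emb @ text_emb.t() / temperature  # (B, B) similarity matrix

        batch = logits.size(0)
        targets = torch.full_like(logits, relax / max(batch - 1, 1))
        targets.fill_diagonal_(1.0 - relax)

        # Symmetric image-to-text and text-to-image cross-entropy with soft targets.
        loss_i2t = (-targets * F.log_softmax(logits, dim=1)).sum(dim=1).mean()
        loss_t2i = (-targets * F.log_softmax(logits.t(), dim=1)).sum(dim=1).mean()
        return 0.5 * (loss_i2t + loss_t2i)


    # Hypothetical usage inside one fine-tuning step, assuming `model` is a
    # CLIP-style encoder pair returning image and text embeddings:
    #   text = sample_sentence(report)
    #   img_emb, txt_emb = model(images, tokenize(text))
    #   loss = relaxed_clip_loss(img_emb, txt_emb)
    #   loss.backward(); optimizer.step()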