ArXiv Preprint
Self-supervised representation learning has been extremely successful in
medical image analysis, as it requires no human annotations to provide
transferable representations for downstream tasks. Recent self-supervised
learning methods are dominated by noise-contrastive estimation (NCE, also known
as contrastive learning), which aims to learn invariant visual representations
by contrasting one homogeneous image pair with a large number of heterogeneous
image pairs in each training step. Nonetheless, NCE-based approaches still
suffer from one major problem: a single homogeneous pair is not enough to
extract robust and invariant semantic information. Inspired by the archetypal
triplet loss, we propose GraVIS, which is specifically optimized for learning
self-supervised features from dermatology images, to group homogeneous
dermatology images while separating heterogeneous ones. In addition, a
hardness-aware attention mechanism is introduced to emphasize homogeneous
image views with similar appearance over dissimilar ones. GraVIS
significantly outperforms its transfer
learning and self-supervised learning counterparts in both lesion segmentation
and disease classification tasks, sometimes by 5 percent under extremely
limited supervision. More importantly, when equipped with the pre-trained
weights provided by GraVIS, a single model can achieve better results than
the winners of the well-known ISIC 2017 challenge, which heavily relied on
ensemble strategies.
Hong-Yu Zhou, Chixiang Lu, Liansheng Wang, Yizhou Yu
2023-01-11
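
The abstract describes a triplet-style objective that groups multiple homogeneous (same-image) views while separating heterogeneous ones, with a hardness-aware attention over the homogeneous pairs. Below is a minimal PyTorch sketch of such a grouping loss, written only as an illustration of the idea and not as the paper's exact formulation; the function name `grouping_loss` and the hyper-parameters `margin` and `tau` are hypothetical placeholders.

```python
import torch
import torch.nn.functional as F


def grouping_loss(embeddings: torch.Tensor, margin: float = 0.3, tau: float = 5.0) -> torch.Tensor:
    """embeddings: (B, V, D) tensor with V >= 2 augmented views per image."""
    B, V, D = embeddings.shape
    z = F.normalize(embeddings.reshape(B * V, D), dim=1)
    sim = z @ z.t()                      # pairwise cosine similarities, (B*V, B*V)

    # Views coming from the same source image are homogeneous (positives).
    labels = torch.arange(B, device=z.device).repeat_interleave(V)
    same = labels.unsqueeze(0) == labels.unsqueeze(1)
    pos_mask = same.clone()
    pos_mask.fill_diagonal_(False)       # a view is not its own positive
    neg_mask = ~same                     # heterogeneous pairs (negatives)

    # Hardness-aware attention over homogeneous pairs: views that already look
    # alike receive larger weights via a temperature-scaled softmax
    # (an illustrative weighting choice, following the abstract's description).
    pos_weight = torch.softmax(sim.masked_fill(~pos_mask, float("-inf")) * tau, dim=1)

    # Triplet-style grouping objective: pull the weighted homogeneous pairs
    # together and push the hardest heterogeneous pair beyond a margin.
    weighted_pos_sim = (pos_weight * sim).sum(dim=1)
    hardest_neg_sim = sim.masked_fill(~neg_mask, float("-inf")).max(dim=1).values
    pull = (pos_weight * (1.0 - sim)).sum(dim=1)
    push = F.relu(hardest_neg_sim - weighted_pos_sim + margin)
    return (pull + push).mean()


# Example usage with random features standing in for backbone outputs.
if __name__ == "__main__":
    feats = torch.randn(8, 4, 128)       # 8 images, 4 views each, 128-d embeddings
    print(grouping_loss(feats).item())
```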