arXiv Preprint
Recent advances in self-supervised learning (SSL) for computer vision are
primarily comparative: they aim to preserve invariant and discriminative
semantics in latent representations by comparing siamese image views. However,
the preserved high-level semantics do not contain enough local information,
which is vital in medical image analysis (e.g., image-based diagnosis and tumor
segmentation). To mitigate the locality problem of comparative SSL, we propose
incorporating a pixel restoration task that explicitly encodes more
pixel-level information into high-level semantics. We also address the
preservation of scale information, which is a powerful aid to image
understanding but has drawn little attention in SSL. The resulting framework
can be formulated as a multi-task optimization problem on the feature pyramid.
Specifically, we conduct multi-scale pixel restoration and siamese feature
comparison in the pyramid. In addition, we propose a non-skip U-Net to build the
feature pyramid and develop sub-crop to replace multi-crop in 3D medical
imaging. The proposed unified SSL framework (PCRLv2) surpasses its
self-supervised counterparts on various tasks, including brain tumor
segmentation (BraTS 2018), chest pathology identification (ChestX-ray,
CheXpert), pulmonary nodule detection (LUNA), and abdominal organ segmentation
(LiTS), sometimes outperforming them by large margins with limited annotations.
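The multi-task objective described above (pixel restoration plus siamese feature comparison at every level of the feature pyramid) can be illustrated with a toy sketch. This is not the authors' implementation: the choice of mean squared error for restoration, negative cosine similarity for comparison, equal default weights, and the hypothetical names `pyramid_loss`, `w_restore` and `w_compare` are all assumptions made for illustration, and each pixel map or feature map is flattened to a plain list of floats.

```python
import math
from typing import List, Sequence

# Toy representation (assumption): each scale's image or feature map is a flat list.
Level = Sequence[float]


def mse(pred: Level, target: Level) -> float:
    """Assumed restoration loss: mean squared error between restored and original pixels."""
    assert len(pred) == len(target) and len(pred) > 0
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)


def neg_cosine(a: Level, b: Level) -> float:
    """Assumed comparison loss: negative cosine similarity of two siamese feature vectors."""
    assert len(a) == len(b) and len(a) > 0
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    # Small epsilon guards against division by zero for all-zero features.
    return -dot / (norm_a * norm_b + 1e-12)


def pyramid_loss(
    restored: List[Level],
    originals: List[Level],
    feats_view1: List[Level],
    feats_view2: List[Level],
    w_restore: float = 1.0,  # hypothetical weight for the restoration term
    w_compare: float = 1.0,  # hypothetical weight for the comparison term
) -> float:
    """Sum a restoration term and a siamese comparison term over every pyramid scale."""
    assert len(restored) == len(originals) == len(feats_view1) == len(feats_view2)
    total = 0.0
    for rec, orig, f1, f2 in zip(restored, originals, feats_view1, feats_view2):
        total += w_restore * mse(rec, orig) + w_compare * neg_cosine(f1, f2)
    return total
```

With perfect restoration and perfectly aligned siamese features at each of two scales, the restoration terms vanish and each comparison term approaches -1, so the total approaches -2. Summing per-scale terms is one simple way to express "multi-task optimization on the feature pyramid"; the paper's actual losses and weighting may differ.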
Hong-Yu Zhou, Chixiang Lu, Chaoqi Chen, Sibei Yang, Yizhou Yu
2023-01-02