ArXiv Preprint
Understanding spatial attributes is central to effective 3D radiology
image analysis, where crop-based learning is the de facto standard. Given an
image patch, its core spatial properties (e.g., position & orientation) provide
helpful priors on expected object sizes, appearances, and structures through
inherent anatomical consistencies. Spatial correspondences, in particular, can
effectively gauge semantic similarities between regions across images, and
their approximate extraction requires neither annotations nor prohibitive
computational costs. However, recent 3D contrastive learning approaches either
neglect such correspondences or fail to fully capitalize on them. To address this,
we propose an extensible 3D contrastive framework (Spade, for Spatial
Debiasing) that leverages extracted correspondences to select more effective
positive & negative samples for representation learning. Our method learns both
globally invariant and locally equivariant representations with downstream
segmentation in mind. We also propose separate selection strategies for the
global & local scopes, each tailored to its respective representational requirements.
Compared to recent state-of-the-art approaches, Spade shows notable
improvements on three downstream segmentation tasks (CT Abdominal Organ, CT
Heart, MR Heart).
Yejia Zhang, Nishchal Sapkota, Pengfei Gu, Yaopeng Peng, Hao Zheng, Danny Z. Chen
2022-11-16
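
To make the correspondence-driven pair selection described in the abstract more concrete, below is a minimal, hypothetical sketch (not the authors' implementation): crops whose normalized 3D centers lie close together across scans are treated as positives, distant crops as negatives, and the selected pairs feed an InfoNCE-style loss for the global, invariance-oriented scope. All function names, thresholds, and tensor shapes here are illustrative assumptions.

```python
# Hypothetical sketch of spatial-correspondence-based pair selection for
# 3D contrastive learning. Names and thresholds are illustrative only;
# the actual Spade selection strategies may differ substantially.
import torch
import torch.nn.functional as F


def crop_center_distance(centers_a: torch.Tensor, centers_b: torch.Tensor) -> torch.Tensor:
    """Pairwise Euclidean distances between crop centers.

    centers_*: (N, 3) crop-center coordinates, normalized to [0, 1] within each
    scan so that roughly aligned anatomy maps to similar coordinates.
    """
    return torch.cdist(centers_a, centers_b)  # (N_a, N_b)


def select_pairs(dist: torch.Tensor, pos_thresh: float = 0.1, neg_thresh: float = 0.4):
    """Spatially close crops become positives, clearly distant crops negatives.

    A looser, equivariance-oriented criterion could be used analogously for the
    local scope; this sketch only covers the global (invariance) case.
    """
    pos_mask = dist < pos_thresh
    neg_mask = dist > neg_thresh
    return pos_mask, neg_mask


def info_nce(emb_a, emb_b, pos_mask, neg_mask, temperature: float = 0.1):
    """InfoNCE over the spatially selected positive/negative pairs."""
    emb_a = F.normalize(emb_a, dim=1)
    emb_b = F.normalize(emb_b, dim=1)
    sim = emb_a @ emb_b.t() / temperature          # (N_a, N_b) cosine similarities
    exp_sim = sim.exp()
    pos = (exp_sim * pos_mask).sum(dim=1)
    denom = pos + (exp_sim * neg_mask).sum(dim=1)
    valid = pos_mask.any(dim=1)                    # rows with at least one positive
    if not valid.any():
        return sim.new_zeros(())                   # no spatial positives in this batch
    return -torch.log(pos[valid] / denom[valid].clamp_min(1e-8)).mean()


# Toy usage: 8 crops per scan, 128-D embeddings from some 3D encoder.
centers_a, centers_b = torch.rand(8, 3), torch.rand(8, 3)
emb_a, emb_b = torch.randn(8, 128), torch.randn(8, 128)
pos_mask, neg_mask = select_pairs(crop_center_distance(centers_a, centers_b))
loss = info_nce(emb_a, emb_b, pos_mask, neg_mask)
```

The key design choice illustrated here is that supervision comes purely from approximate spatial positions, so no annotations are needed; the thresholds simply trade off how strictly "same anatomy" is interpreted when forming positive and negative sets.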