ArXiv Preprint
Generalization is an important attribute of machine learning models,
particularly for those that are to be deployed in a medical context, where
unreliable predictions can have real-world consequences. While the failure of
models to generalize across datasets is typically attributed to a mismatch in
the data distributions, performance gaps are often a consequence of biases in
the ``ground-truth" label annotations. This is particularly important in the
context of medical image segmentation of pathological structures (e.g.
lesions), where the annotation process is much more subjective and is affected by
a number of underlying factors, including the annotation protocol, rater
education and experience, and clinical aims. In this paper, we show that
modeling annotation biases, rather than ignoring them, offers a promising way of
accounting for differences in annotation style across datasets. To this
end, we propose a generalized conditioning framework to (1) learn and account
for different annotation styles across multiple datasets using a single model,
(2) identify similar annotation styles across different datasets in order to
permit their effective aggregation, and (3) fine-tune a fully trained model to
a new annotation style with just a few samples. Finally, we present an
image-conditioning approach for modeling annotation styles that correlate with
specific image features, which may make detection biases easier to identify.
Brennan Nichyporuk, Jillian Cardinell, Justin Szeto, Raghav Mehta, Jean-Pierre R. Falet, Douglas L. Arnold, Sotirios A. Tsaftaris, Tal Arbel
2022-10-31
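
The abstract does not spell out the conditioning mechanism, so the following is a minimal, hypothetical PyTorch sketch of one common way to condition a segmentation network on a per-dataset annotation-style ID, using FiLM-style feature modulation. The class names (FiLMBlock, ConditionedSegmenter) and the toy architecture are illustrative assumptions, not the authors' implementation.

# Hypothetical sketch: conditioning a small segmentation network on a
# dataset/annotation-style ID via FiLM-style feature modulation. The
# architecture below is a toy stand-in, not the paper's actual model.
import torch
import torch.nn as nn


class FiLMBlock(nn.Module):
    """Scales and shifts feature maps using a per-style embedding."""

    def __init__(self, num_channels: int, embed_dim: int):
        super().__init__()
        self.to_gamma = nn.Linear(embed_dim, num_channels)
        self.to_beta = nn.Linear(embed_dim, num_channels)

    def forward(self, features: torch.Tensor, style: torch.Tensor) -> torch.Tensor:
        # features: (B, C, H, W); style: (B, embed_dim)
        gamma = self.to_gamma(style).unsqueeze(-1).unsqueeze(-1)
        beta = self.to_beta(style).unsqueeze(-1).unsqueeze(-1)
        return gamma * features + beta


class ConditionedSegmenter(nn.Module):
    """Toy encoder-decoder whose features are modulated by a style ID."""

    def __init__(self, num_styles: int, embed_dim: int = 16):
        super().__init__()
        self.style_embedding = nn.Embedding(num_styles, embed_dim)
        self.encoder = nn.Sequential(nn.Conv2d(1, 32, 3, padding=1), nn.ReLU())
        self.film = FiLMBlock(32, embed_dim)
        self.decoder = nn.Conv2d(32, 1, 1)  # per-pixel lesion logit

    def forward(self, image: torch.Tensor, style_id: torch.Tensor) -> torch.Tensor:
        style = self.style_embedding(style_id)
        features = self.film(self.encoder(image), style)
        return self.decoder(features)


if __name__ == "__main__":
    model = ConditionedSegmenter(num_styles=3)
    image = torch.randn(2, 1, 64, 64)   # batch of single-channel 2D slices
    style_id = torch.tensor([0, 2])     # annotation-style ID per sample
    logits = model(image, style_id)
    print(logits.shape)                 # torch.Size([2, 1, 64, 64])

In such a setup, fine-tuning to a new annotation style with few samples could amount to adding one new embedding row and updating it (and optionally the FiLM layers) while the rest of the network stays frozen; again, this is a plausible reading rather than the method described in the paper.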