ArXiv Preprint
Modern machine learning pipelines, in particular those based on deep learning
(DL) models, require large amounts of labeled data. For classification
problems, the most common learning paradigm consists of presenting labeled
examples during training, thus providing strong supervision on what constitutes
positive and negative samples. This requirement poses a major obstacle to the
development of DL models in radiology, in particular for cross-sectional
imaging (e.g., computed tomography [CT] scans), where labels must come from
manual annotations by expert radiologists at the image or slice level. These
differ from examination-level annotations, which are coarser but cheaper, and
could be extracted from radiology reports using natural language processing
techniques. This work studies the question of what kind of labels should be
collected for the problem of intracranial hemorrhage detection in brain CT. We
investigate whether image-level annotations should be preferred to
examination-level ones. By framing this task as a multiple instance learning
problem, and employing modern attention-based DL architectures, we analyze the
degree to which different levels of supervision improve detection performance.
We find that strong supervision (i.e., learning with local image-level
annotations) and weak supervision (i.e., learning with only global
examination-level labels) achieve comparable performance in examination-level
hemorrhage detection (the task of selecting the images in an examination that
show signs of hemorrhage) as well as in image-level hemorrhage detection
(highlighting those signs within the selected images). Furthermore, we study
this behavior as a function of the number of labels available during training.
Our results suggest that local labels may not be necessary at all for these
tasks, which could drastically reduce the time and cost involved in collecting
and curating datasets.
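The multiple instance learning framing mentioned above treats each examination as a bag of image (slice) instances, with an attention mechanism aggregating per-instance features into an examination-level prediction while the attention weights themselves indicate which images carry evidence. A minimal sketch of one common attention-based MIL pooling of this kind (parameter names and dimensions are illustrative, not taken from the paper):

```python
import numpy as np

def attention_mil_pool(instances, V, w):
    """Attention-based MIL pooling over one examination (bag).

    instances: (n, d) array of per-image instance embeddings.
    V: (k, d) and w: (k,) play the role of learnable attention
       parameters (random here, for illustration only).
    Returns the bag embedding (d,) and the per-instance attention
    weights (n,), which can serve as image-level evidence scores.
    """
    # Unnormalized attention score for each instance: w^T tanh(V h_i)
    scores = w @ np.tanh(V @ instances.T)          # shape (n,)
    # Numerically stable softmax over instances in the bag
    weights = np.exp(scores - scores.max())
    weights = weights / weights.sum()              # shape (n,), sums to 1
    # Bag embedding: attention-weighted average of instance embeddings
    bag = weights @ instances                      # shape (d,)
    return bag, weights

rng = np.random.default_rng(0)
n, d, k = 5, 8, 4                                  # e.g., 5 slices, 8-dim features
instances = rng.normal(size=(n, d))
V = rng.normal(size=(k, d))
w = rng.normal(size=k)
bag, weights = attention_mil_pool(instances, V, w)
```

In a weakly supervised setting, only the examination-level label supervises the classifier applied to `bag`, yet the learned `weights` can still localize the suspicious images, which is consistent with the comparison the abstract describes.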
Jacopo Teneggi, Paul H. Yi, Jeremias Sulam
2022-11-29