In JCO Clinical Cancer Informatics
PURPOSE: Artificial intelligence (AI) models for medical image diagnosis are often trained and validated on curated data. However, in a clinical setting, images that are outliers with respect to the training data, such as those representing rare disease conditions or acquired using a slightly different setup, can lead to wrong decisions. It is not practical to expect clinicians to be trained to discount results for such outlier images. Toward clinical deployment, we have designed a method to train cautious AI that can automatically flag outlier cases.
MATERIALS AND METHODS: Our method, ClassClust, forms tight clusters of training images using supervised contrastive learning, which helps it identify outliers during testing. We compared ClassClust's ability to detect outliers with that of three competing methods on four publicly available data sets covering pathology, dermatoscopy, and radiology. We held out certain diseases, artifacts, and types of images from the training data and examined the ability of the models to detect these as outliers during testing. We also compared the decision accuracy of the models on held-out nonoutlier images. We visualized the regions of the images that the models used for their decisions.
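As an illustration of why tight class clusters can help with outlier flagging, the following is a minimal sketch, not the ClassClust implementation. It assumes that each image has already been mapped to an embedding vector by some trained encoder, and it scores a test image by its cosine distance to the nearest class centroid. The function names, the centroid-based score, and the threshold value are all hypothetical choices made for this example; the abstract does not specify how ClassClust scores outliers.

```python
import numpy as np

def class_centroids(embeddings, labels):
    """Return (classes, unit-norm mean embedding per class). Hypothetical helper."""
    embeddings = np.asarray(embeddings, dtype=float)
    labels = np.asarray(labels)
    # Normalize so that dot products are cosine similarities.
    embeddings = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    classes = np.unique(labels)
    centroids = np.stack([embeddings[labels == c].mean(axis=0) for c in classes])
    centroids = centroids / np.linalg.norm(centroids, axis=1, keepdims=True)
    return classes, centroids

def outlier_score(embedding, centroids):
    """1 minus the highest cosine similarity to any centroid (higher = more outlying)."""
    e = np.asarray(embedding, dtype=float)
    e = e / np.linalg.norm(e)
    return 1.0 - float(np.max(centroids @ e))

def predict_or_flag(embedding, classes, centroids, threshold=0.5):
    """Return the nearest class, or None when the image is flagged as an outlier.

    The threshold is an assumed value; in practice it would be tuned on held-out data.
    """
    e = np.asarray(embedding, dtype=float)
    e = e / np.linalg.norm(e)
    sims = centroids @ e
    if 1.0 - float(sims.max()) > threshold:
        return None
    return classes[int(np.argmax(sims))]

# Toy usage with two well-separated classes in a 2-D embedding space.
train_x = [[1.0, 0.1], [1.0, -0.1], [0.1, 1.0], [-0.1, 1.0]]
train_y = [0, 0, 1, 1]
classes, centroids = class_centroids(train_x, train_y)
near_class_0 = predict_or_flag([1.0, 0.05], classes, centroids)
far_from_all = predict_or_flag([-1.0, -1.0], classes, centroids)
```

The tighter the training clusters are, the smaller the scores of genuine inliers become, so a single threshold separates inliers from outliers more cleanly.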
RESULTS: The area under the receiver operating characteristic curve for outlier detection was consistently higher using ClassClust compared with the previous methods. Average accuracy on held-out nonoutlier images was also higher, and the visualizations of image regions were more informative using ClassClust.
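For readers unfamiliar with the metric, the area under the receiver operating characteristic curve for outlier detection can be read as the probability that a randomly chosen outlier receives a higher outlier score than a randomly chosen nonoutlier. The sketch below computes it from that pairwise definition; the function name and the example scores are illustrative only and are not taken from the study.

```python
import numpy as np

def auroc(scores_outlier, scores_inlier):
    """Fraction of (outlier, inlier) pairs ranked correctly; ties count as half."""
    o = np.asarray(scores_outlier, dtype=float)[:, None]  # column vector
    i = np.asarray(scores_inlier, dtype=float)[None, :]   # row vector
    wins = (o > i).sum()
    ties = (o == i).sum()
    return float((wins + 0.5 * ties) / (o.size * i.size))

# Made-up scores: three of the four (outlier, inlier) pairs are ordered correctly.
example = auroc([0.9, 0.3], [0.5, 0.1])
```

A value of 1.0 means every outlier scores above every nonoutlier, and 0.5 corresponds to chance-level ranking.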
CONCLUSION: The ability to flag outlier test cases need not be at odds with the ability to accurately classify nonoutliers in AI models. Although the latter capability has received research and regulatory attention, AI models for clinical deployment should possess the former as well.
Kanse Abhiraj S, Kurian Nikhil C, Aswani Himanshu P, Khan Zakia, Gann Peter H, Rane Swapnil, Sethi Amit