ArXiv Preprint
In recent years, large strides have been taken in developing machine learning
methods for dermatological applications, supported in part by the success of
deep learning (DL). To date, diagnosing diseases from images is one of the most
explored applications of DL within dermatology. Convolutional neural networks
(ConvNets) are the most common DL method in medical imaging due to their
training efficiency and accuracy, although they are often described as black
boxes because of their limited explainability. One popular way to obtain
insight into a ConvNet's decision mechanism is gradient class activation maps
(Grad-CAM). A quantitative evaluation of the Grad-CAM explainability has been
recently made possible by the release of DermXDB, a skin disease diagnosis
explainability dataset which enables explainability benchmarking of ConvNet
architectures. In this paper, we perform a literature review to identify the
most common ConvNet architectures used for this task, and compare their
Grad-CAM explanations with the explanation maps provided by DermXDB. We
identified 11 architectures: DenseNet121, EfficientNet-B0, InceptionV3,
InceptionResNetV2, MobileNet, MobileNetV2, NASNetMobile, ResNet50, ResNet50V2,
VGG16, and Xception. We pre-trained all architectures on a clinical skin
disease dataset, and fine-tuned them on a DermXDB subset. Validation results on
the DermXDB holdout subset show an explainability F1 score between 0.35 and
0.46, with Xception displaying the highest explainability performance.
NASNetMobile reports the highest characteristic-level explainability
sensitivity, despite its mediocre diagnosis performance. These results
highlight the importance of choosing the right architecture for the desired
application and target market, underline the need for additional explainability
datasets, and further confirm the need for explainability benchmarking that
relies on quantitative analyses.
Raluca Jalaboi, Ole Winther, Alfiia Galimzianova
2023-02-23
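
Below is a minimal Grad-CAM sketch, as a rough illustration of the kind of attention map being benchmarked against the DermXDB explanation maps. It assumes a TensorFlow/Keras setup; the model file, pre-processing, and the layer name (Xception's block14_sepconv2_act) are illustrative assumptions, not the authors' exact pipeline.

import numpy as np
import tensorflow as tf

def grad_cam(model, image, conv_layer_name):
    """Return a Grad-CAM heatmap for a single (H, W, 3) image scaled to [0, 1]."""
    # Model that exposes both the chosen convolutional feature map and the logits.
    grad_model = tf.keras.Model(
        inputs=model.inputs,
        outputs=[model.get_layer(conv_layer_name).output, model.output],
    )
    with tf.GradientTape() as tape:
        conv_out, preds = grad_model(image[np.newaxis, ...])
        class_index = int(tf.argmax(preds[0]))
        class_score = preds[:, class_index]
    # Gradient of the predicted class score w.r.t. the convolutional feature map.
    grads = tape.gradient(class_score, conv_out)
    # Global-average-pool the gradients to obtain per-channel importance weights.
    weights = tf.reduce_mean(grads, axis=(0, 1, 2))
    # Weighted sum of the feature maps, then ReLU and max-normalisation.
    cam = tf.nn.relu(tf.reduce_sum(conv_out[0] * weights, axis=-1))
    return (cam / (tf.reduce_max(cam) + tf.keras.backend.epsilon())).numpy()

# Hypothetical usage with a fine-tuned Xception classifier:
# model = tf.keras.models.load_model("xception_dermx.h5")
# heatmap = grad_cam(model, image, conv_layer_name="block14_sepconv2_act")

The resulting low-resolution heatmap would then typically be resized to the input resolution and thresholded before overlap metrics such as the F1 score are computed against the reference explanation maps.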