In The Journal of Investigative Dermatology
Artificial intelligence (AI) algorithms for melanoma classification depend on their training data, which limits their generalizability. The objective of this study was to compare the performance of an AI model trained on a standard adult-predominant dermoscopic dataset before and after the addition of pediatric training images, using held-out adult and pediatric test sets. We trained two models: one (Model A) on an adult-predominant dataset (37,662 images from the International Skin Imaging Collaboration (ISIC)), and one (Model A+P) on the same dataset plus an additional 1,536 pediatric images. We compared the two models on the adult and pediatric held-out test images separately using the area under the receiver operating characteristic curve (AUROC). We then used Gradient-weighted Class Activation Maps and background skin masking to understand the contributions of the lesion versus background skin to algorithm decision making. Adding images from a pediatric population with different epidemiological and visual patterns to current reference standard datasets improved algorithm performance on pediatric images without diminishing performance on adult images. This suggests a way to make dermatologic AI models more generalizable. The presence of background skin was important to the pediatric-specific improvement seen between models.
Mehta Paras P, Sun Mary, Betz-Stablein Brigid, Halpern Allan, Soyer H Peter, Weber Jochen, Kose Kivanc, Rotemberg Veronica
2023-Feb-17
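
The per-subgroup AUROC comparison described in the abstract can be illustrated with a minimal sketch. This is not the authors' code: the function names (`auroc`, `compare_by_subgroup`) and the toy labels and scores are hypothetical, and AUROC is computed here through its pairwise-ranking interpretation (the probability that a randomly chosen positive image receives a higher score than a randomly chosen negative one) rather than with whatever tooling the study used.

```python
from typing import Dict, Sequence, Tuple


def auroc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUROC via pairwise ranking: fraction of (positive, negative) pairs
    in which the positive has the higher score; ties count as half."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# One test set: (labels, Model A scores, Model A+P scores).
TestSet = Tuple[Sequence[int], Sequence[float], Sequence[float]]


def compare_by_subgroup(test_sets: Dict[str, TestSet]) -> Dict[str, Tuple[float, float]]:
    """Return {subgroup: (AUROC of Model A, AUROC of Model A+P)},
    evaluating each held-out subgroup separately."""
    return {
        name: (auroc(labels, scores_a), auroc(labels, scores_ap))
        for name, (labels, scores_a, scores_ap) in test_sets.items()
    }


if __name__ == "__main__":
    # Hypothetical toy data (1 = melanoma, 0 = not melanoma); not study results.
    toy = {
        "adult": ([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8], [0.1, 0.4, 0.35, 0.8]),
        "pediatric": ([0, 0, 1, 1], [0.2, 0.6, 0.5, 0.4], [0.2, 0.3, 0.5, 0.4]),
    }
    for group, (a, ap) in compare_by_subgroup(toy).items():
        print(f"{group}: Model A AUROC={a:.2f}, Model A+P AUROC={ap:.2f}")
```

Evaluating each subgroup on its own, as above, is what makes it possible to see whether adding pediatric images helps pediatric performance without hurting adult performance; a single pooled AUROC over an adult-predominant test set could hide a change in the smaller subgroup.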