Journal of Digital Imaging
In this work, we assess how pre-training strategy affects deep learning performance for the task of distinguishing false-recall from malignancy and normal (benign) findings in digital mammography images. A cohort of 1303 breast cancer screening patients (4935 digital mammogram images in total) was retrospectively analyzed as the target dataset for this study. We evaluated six convolutional neural network (CNN) model structures pre-trained using four imaging datasets (more than 1.4 million images in total, including ImageNet, plus medical image datasets differing in scale, modality, organ, and source) on six classification tasks to determine how CNN performance varies with training strategy. Representative pre-training strategies included transfer learning with medical and non-medical datasets, layer freezing, varied network structure, and multi-view input for both binary and triple-class classification of mammogram images. The area under the receiver operating characteristic curve (AUC) was used as the model performance metric. The best performing model across all experimental settings was an AlexNet model incrementally pre-trained on ImageNet and a large Breast Density dataset. The AUC for the six classification tasks using this model ranged from 0.68 to 0.77. In the case of distinguishing recalled-benign mammograms from others, four out of five pre-training strategies tested produced significant performance differences from the baseline model. This study suggests that pre-training strategy significantly influences model performance, especially in the case of distinguishing recalled-benign from malignant and benign screening patients.
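For reference, the AUC metric used throughout the study can be interpreted as the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative one (the Mann-Whitney U formulation). A minimal pure-Python sketch of this computation follows; the `auc` helper and the toy labels/scores are illustrative only and are not drawn from the study's data:

```python
def auc(labels, scores):
    """Area under the ROC curve via the Mann-Whitney U statistic:
    the fraction of (positive, negative) pairs ranked correctly,
    with ties counted as half a correct pair."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Toy binary example (hypothetical scores):
# positives (0.35, 0.8) outrank negatives (0.1, 0.4) in 3 of 4 pairs.
print(auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # -> 0.75
```

An AUC of 0.5 corresponds to random ranking, and 1.0 to perfect separation, which frames the 0.68-0.77 range reported above.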
Kadie Clancy, Sarah Aboutalib, Aly Mohamed, Jules Sumkin, Shandong Wu
Breast cancer, Deep learning, Digital mammography, Training strategy, Transfer learning