In European journal of radiology ; h5-index 47.0
PURPOSE : To evaluate deep learning-based image analysis software for the detection and localization of distal radius fractures.
METHOD : A deep learning system (DLS) was trained on 524 wrist radiographs (166 showing fractures). Performance was tested on an internal test set (100 radiographs, 42 showing fractures) and an external test set (200 radiographs, 100 showing fractures). Single and combined views of the radiographs were shown to the DLS and to three readers. Readers were asked to indicate fracture location with regions of interest (ROIs). The DLS yielded a score (range 0-1) and a heatmap for each input. Detection performance was expressed as AUC, and as sensitivity and specificity at the optimal threshold, and was compared to the radiologists' performance. Heatmaps were compared to the radiologists' ROIs.
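The detection metrics named above can be illustrated with a minimal Python sketch. The abstract does not say how the optimal threshold was chosen; maximizing Youden's J (sensitivity + specificity - 1) is assumed here, as is the convention that a score at or above the threshold counts as a fracture call. The function names and the toy scores are hypothetical and are not the study's data.

```python
from typing import List, Tuple


def auc(scores: List[float], labels: List[int]) -> float:
    """AUC as the probability that a fracture case outscores a normal case (ties count 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def sens_spec(scores: List[float], labels: List[int], threshold: float) -> Tuple[float, float]:
    """Sensitivity and specificity when scores >= threshold are called fractures (assumed convention)."""
    tp = sum(1 for s, y in zip(scores, labels) if y == 1 and s >= threshold)
    fn = sum(1 for s, y in zip(scores, labels) if y == 1 and s < threshold)
    tn = sum(1 for s, y in zip(scores, labels) if y == 0 and s < threshold)
    fp = sum(1 for s, y in zip(scores, labels) if y == 0 and s >= threshold)
    return tp / (tp + fn), tn / (tn + fp)


def youden_threshold(scores: List[float], labels: List[int]) -> float:
    """Pick the observed score that maximizes Youden's J (an assumed definition of 'optimal')."""
    candidates = sorted(set(scores))
    return max(candidates, key=lambda t: sum(sens_spec(scores, labels, t)) - 1)


# Toy example: six radiographs, four with fractures (label 1).
toy_scores = [0.9, 0.8, 0.6, 0.4, 0.3, 0.1]
toy_labels = [1, 1, 1, 0, 1, 0]

toy_auc = auc(toy_scores, toy_labels)
toy_threshold = youden_threshold(toy_scores, toy_labels)
toy_sens, toy_spec = sens_spec(toy_scores, toy_labels, toy_threshold)
```

Reporting sensitivity and specificity at a single threshold makes the DLS directly comparable to readers, who give a binary call per radiograph, whereas the AUC summarizes the DLS scores over all possible thresholds.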
RESULTS : The DLS showed excellent performance on the internal test set (AUC 0.93 (95% confidence interval (CI) 0.82-0.98) to 0.96 (0.87-1.00), sensitivity 0.81 (0.58-0.95) to 0.90 (0.70-0.99), specificity 0.86 (0.68-0.96) to 1.0 (0.88-1.0)). DLS performance decreased on the external test set (AUC 0.80 (0.71-0.88) to 0.89 (0.81-0.94), sensitivity 0.64 (0.49-0.77) to 0.92 (0.81-0.98), specificity 0.60 (0.45-0.74) to 0.90 (0.78-0.97)). Radiologists' performance was comparable on internal data (sensitivity 0.71 (0.48-0.89) to 0.95 (0.76-1.0), specificity 0.52 (0.32-0.71) to 0.97 (0.82-1.0)) and better on external data (sensitivity 0.88 (0.76-0.96) to 0.98 (0.89-1.0), specificity 0.66 (0.51-0.79) to 1.0 (0.93-1.0), p < 0.05). In over 90% of cases, the areas of peak heatmap activation aligned with the radiologists' annotations.
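One way to quantify the agreement between peak activation and reader annotations is sketched below in Python. The abstract does not define the alignment criterion; this sketch assumes a case counts as aligned when the single highest heatmap value falls inside a rectangular ROI given as (top, left, bottom, right) in pixel indices. All names and the toy heatmap are hypothetical.

```python
from typing import List, Sequence, Tuple

Heatmap = List[List[float]]
Roi = Tuple[int, int, int, int]  # (top, left, bottom, right), inclusive; assumed format


def peak_location(heatmap: Heatmap) -> Tuple[int, int]:
    """Return (row, col) of the highest activation; the first one wins on ties."""
    best = (0, 0)
    for r, row in enumerate(heatmap):
        for c, value in enumerate(row):
            if value > heatmap[best[0]][best[1]]:
                best = (r, c)
    return best


def peak_in_roi(heatmap: Heatmap, roi: Roi) -> bool:
    """True if the peak activation lies inside the reader's rectangular ROI."""
    r, c = peak_location(heatmap)
    top, left, bottom, right = roi
    return top <= r <= bottom and left <= c <= right


def alignment_rate(cases: Sequence[Tuple[Heatmap, Roi]]) -> float:
    """Fraction of cases whose peak activation falls inside the ROI."""
    hits = sum(peak_in_roi(heatmap, roi) for heatmap, roi in cases)
    return hits / len(cases)


# Toy 3x3 heatmap with its peak at row 1, column 1.
toy_heatmap = [
    [0.1, 0.2, 0.1],
    [0.3, 0.9, 0.2],
    [0.1, 0.1, 0.0],
]
toy_cases = [
    (toy_heatmap, (0, 0, 1, 1)),  # ROI covers the peak
    (toy_heatmap, (2, 2, 2, 2)),  # ROI misses the peak
]
toy_rate = alignment_rate(toy_cases)
```

A peak-in-ROI check is deliberately coarse: it ignores how widely the activation spreads, so an overlap measure between a thresholded heatmap and the ROI would be a stricter alternative.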
CONCLUSIONS : The DLS was able to detect and localize wrist fractures with performance comparable to that of radiologists, using only a small dataset for training.
Blüthgen Christian, Becker Anton S, Vittoria de Martini Ilaria, Meier Andreas, Martini Katharina, Frauenfelder Thomas