In AJR. American journal of roentgenology
Background: In current clinical practice, thyroid nodules in children are generally evaluated based on radiologists' overall impressions of ultrasound images. Objective: To compare the diagnostic performance of radiologists' overall impression, American College of Radiology Thyroid Imaging Reporting and Data System (ACR TI-RADS), and a deep learning algorithm in differentiating benign and malignant thyroid nodules on ultrasound in children and young adults. Methods: This retrospective study included 139 patients (median age 17.5 years; 119 female, 20 male) evaluated from January 1, 2004 to September 18, 2020 with age ≤21 years and a thyroid nodule on ultrasound with definitive pathologic results from fine-needle aspiration and/or surgical excision to serve as reference standard. A single nodule per patient was selected, and single transverse and longitudinal images of the nodule were extracted for further evaluation. Three radiologists independently characterized nodules based on their overall impression (benign vs malignant) and ACR TI-RADS. A previously developed deep learning algorithm determined for each nodule a likelihood of malignancy, which was used to derive a risk level. Sensitivities and specificities for malignancy were calculated. Agreement was assessed using Cohen's kappa coefficients. Results: For radiologists' overall impression, sensitivity ranged from 32.1% to 75.0% [mean, 58.3% (95% CI: 49.2-67.3%)], and specificity ranged from 63.8% to 93.9% [mean, 79.9% (95% CI: 73.8-85.7%)]. For ACR TI-RADS, sensitivity ranged from 82.1% to 87.5% [mean, 85.1% (95% CI: 77.3-92.1%)], and specificity ranged from 47.0% to 54.2% [mean, 50.6% (95% CI: 41.4-59.8%)]. The deep learning algorithm had sensitivity of 87.5% (95% CI: 78.3-95.5%) and specificity of 36.1% (95% CI: 25.6-46.8%). Interobserver agreement among pairwise combinations of readers, expressed as kappa, for overall impression was 0.227 to 0.472 and for ACR TI-RADS was 0.597 to 0.643.
Conclusion: Both ACR TI-RADS and the deep learning algorithm had higher sensitivity albeit lower specificity compared with overall impressions. The deep learning algorithm had similar sensitivity but lower specificity than ACR TI-RADS. Interobserver agreement was higher for ACR TI-RADS than for overall impressions. Clinical Impact: ACR TI-RADS and the deep learning algorithm may serve as potential alternative strategies for guiding decisions to perform fine-needle aspiration of thyroid nodules in children.
Yang Jichen, Page Laura C, Wagner Lars, Wildman-Tobriner Benjamin, Bisset Logan, Frush Donald, Mazurowski Maciej
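
The abstract reports sensitivity, specificity, and pairwise Cohen's kappa without showing how these quantities are computed. The following is a minimal sketch of the standard definitions, not the authors' analysis code. The function names and the toy label vectors (1 = malignant, 0 = benign) are hypothetical and are not study data; the confidence intervals reported in the abstract are not reproduced here.

```python
from typing import Sequence, Tuple


def sensitivity_specificity(truth: Sequence[int], pred: Sequence[int]) -> Tuple[float, float]:
    """Return (sensitivity, specificity) for binary labels, 1 = malignant."""
    tp = sum(1 for t, p in zip(truth, pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(truth, pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(truth, pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(truth, pred) if t == 0 and p == 1)
    # Sensitivity: fraction of malignant nodules called malignant.
    # Specificity: fraction of benign nodules called benign.
    return tp / (tp + fn), tn / (tn + fp)


def cohen_kappa(rater_a: Sequence[int], rater_b: Sequence[int]) -> float:
    """Cohen's kappa for two raters labelling the same items."""
    a, b = list(rater_a), list(rater_b)
    n = len(a)
    # Observed agreement: fraction of items given the same label.
    p_observed = sum(1 for x, y in zip(a, b) if x == y) / n
    # Chance agreement: product of each rater's marginal label frequencies.
    labels = set(a) | set(b)
    p_chance = sum((a.count(lab) / n) * (b.count(lab) / n) for lab in labels)
    if p_chance == 1.0:
        return 1.0  # both raters used a single identical label throughout
    return (p_observed - p_chance) / (1.0 - p_chance)


# Hypothetical toy data (NOT from the study): pathology reference and two readers.
reference = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
reader_a = [1, 1, 1, 0, 0, 0, 0, 0, 1, 1]
reader_b = [1, 1, 0, 0, 0, 0, 0, 0, 0, 1]

sens_a, spec_a = sensitivity_specificity(reference, reader_a)
kappa_ab = cohen_kappa(reader_a, reader_b)
print(f"reader A sensitivity={sens_a:.3f}, specificity={spec_a:.3f}")
print(f"kappa (A vs B)={kappa_ab:.3f}")
```

In a three-reader design such as the one described, the kappa function would be applied to each of the three reader pairs, which yields a range of values like those reported for overall impression and for ACR TI-RADS.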