
Radiology

Deep learning for automated, interpretable classification of lumbar spinal stenosis and facet arthropathy from axial MRI.

In European radiology ; h5-index 62.0

OBJECTIVES : To evaluate a deep learning model for automated and interpretable classification of central canal stenosis, neural foraminal stenosis, and facet arthropathy from lumbar spine MRI.

METHODS : T2-weighted axial MRI studies of the lumbar spine acquired between 2008 and 2019 were retrospectively selected (n = 200) and graded for central canal stenosis, neural foraminal stenosis, and facet arthropathy. Studies were partitioned into patient-level train (n = 150), validation (n = 20), and test (n = 30) splits. V-Net models were first trained to segment the dural sac and the intervertebral disk, and to localize the facet and foramen using geometric rules. Subsequently, Big Transfer (BiT) models were trained for the downstream classification tasks. An interpretable model for central canal stenosis was also trained using a decision tree classifier. Evaluation metrics included the linearly weighted Cohen's kappa score for multi-grade classification and the area under the receiver operating characteristic curve (AUROC) for binarized classification.
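The two evaluation metrics named above (linearly weighted Cohen's kappa and AUROC) can be computed with scikit-learn. The grades and scores below are made-up values for illustration only, not study data:

```python
from sklearn.metrics import cohen_kappa_score, roc_auc_score

# Hypothetical multi-grade predictions (0 = none ... 3 = severe) vs. reference grades
ref_grades = [0, 1, 2, 3, 1, 0, 2, 3]
model_grades = [0, 1, 1, 3, 2, 0, 2, 2]

# Linearly weighted kappa penalizes disagreements by their distance in grade,
# so an off-by-one error costs less than an off-by-three error
kappa = cohen_kappa_score(ref_grades, model_grades, weights="linear")

# A binarized task (e.g., stenosis present vs. absent) is scored with AUROC
y_true = [0, 0, 1, 1]
y_score = [0.1, 0.4, 0.35, 0.8]
auroc = roc_auc_score(y_true, y_score)
print(round(kappa, 3), auroc)
```

The linear weighting is what makes kappa appropriate for ordinal severity grades, where unweighted agreement would treat all errors as equally bad.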

RESULTS : Segmentation of the dural sac and intervertebral disk achieved Dice scores of 0.93 and 0.94. Localization of foramen and facet achieved intersection over union of 0.72 and 0.83. Multi-class grading of central canal stenosis achieved a kappa score of 0.54. The interpretable decision tree classifier had a kappa score of 0.80. Pairwise agreement between readers (R1, R2), (R1, R3), and (R2, R3) was 0.86, 0.80, and 0.74. Binary classification of neural foraminal stenosis and facet arthropathy achieved AUROCs of 0.92 and 0.93.

CONCLUSION : Deep learning systems can be performant as well as interpretable for automated evaluation of lumbar spine MRI including classification of central canal stenosis, neural foraminal stenosis, and facet arthropathy.

KEY POINTS : • Interpretable deep-learning systems can be developed for the evaluation of clinical lumbar spine MRI. • Multi-grade classification of central canal stenosis with a kappa of 0.80 was comparable to inter-reader agreement scores (0.74, 0.80, 0.86), and binary classification of neural foraminal stenosis and facet arthropathy achieved AUROCs of 0.92 and 0.93, respectively. • While existing deep-learning systems are opaque, leading to clinical deployment challenges, the proposed system is accurate as well as interpretable, providing valuable information to a radiologist in clinical practice.

Bharadwaj Upasana Upadhyay, Christine Miranda, Li Steven, Chou Dean, Pedoia Valentina, Link Thomas M, Chin Cynthia T, Majumdar Sharmila

2023-Mar-15

Arthropathy, Deep learning, MRI, Stenosis

Radiology

Contrast-enhanced CT radiomics improves the prediction of abdominal aortic aneurysm progression.

In European radiology ; h5-index 62.0

OBJECTIVES : To determine if three-dimensional (3D) radiomic features of contrast-enhanced CT (CECT) images improve prediction of rapid abdominal aortic aneurysm (AAA) growth.

METHODS : This longitudinal cohort study retrospectively analyzed 195 consecutive patients (mean age, 72.4 years ± 9.1) with a baseline CECT and a subsequent CT or MR at least 6 months later. 3D radiomic features were measured for 3 regions of the AAA, viz. the vessel lumen only; the intraluminal thrombus (ILT) and aortic wall only; and the entire AAA sac (lumen, ILT, and wall). Multiple machine learning (ML) models to predict rapid growth, defined as the upper tercile of observed growth (> 0.25 cm/year), were developed using data from 60% of the patients. Diagnostic accuracy was evaluated using the area under the receiver operating characteristic curve (AUC) in the remaining 40% of patients.
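The modeling setup described above (a 60%/40% development/test split and a classifier scored by AUC) can be sketched as follows. The radiomic features here are random stand-ins for illustration, not the study's data, and the label construction is invented:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)

# Synthetic stand-in for radiomic features of the ILT + wall region
# (195 patients x 7 selected features, as in the abstract; values are fake)
X = rng.normal(size=(195, 7))
# Fake binary "rapid growth" label loosely tied to the first two features
y = (X[:, 0] + 0.5 * X[:, 1] + rng.normal(size=195) > 0.43).astype(int)

# Development (60%) / test (40%) split, mirroring the study design
X_dev, X_test, y_dev, y_test = train_test_split(
    X, y, test_size=0.4, random_state=0, stratify=y
)

model = LogisticRegression(max_iter=1000).fit(X_dev, y_dev)
auc = roc_auc_score(y_test, model.predict_proba(X_test)[:, 1])
print(f"test AUC = {auc:.2f}")
```

Scoring AUC only on the held-out 40% of patients, as the authors do, avoids the optimistic bias of evaluating on the development cohort.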

RESULTS : The median AAA maximum diameter was 3.9 cm (interquartile range [IQR], 3.3-4.4 cm) at baseline and 4.4 cm (IQR, 3.7-5.4 cm) at the mean follow-up time of 3.2 ± 2.4 years (range, 0.5-9 years). A logistic regression model using 7 radiomic features of the ILT and wall had the highest AUC (0.83; 95% confidence interval [CI], 0.73-0.88) in the development cohort. In the independent test cohort, this model had a statistically significantly higher AUC than a model including maximum diameter, AAA volume, and relevant clinical factors (AUC = 0.78, 95% CI, 0.67-0.87 vs AUC = 0.69, 95% CI, 0.57-0.79; p = 0.04).

CONCLUSION : A radiomics-based method focused on the ILT and wall improved prediction of rapid AAA growth from CECT imaging.

KEY POINTS : • Radiomic analysis of 195 abdominal CECT revealed that an ML-based model that included textural features of intraluminal thrombus (if present) and aortic wall improved prediction of rapid AAA progression compared to maximum diameter. • Predictive accuracy was higher when radiomic features were obtained from the thrombus and wall as opposed to the entire AAA sac (including lumen), or the lumen alone. • Logistic regression of selected radiomic features yielded similar accuracy to predict rapid AAA progression as random forests or support vector machines.

Wang Yan, Xiong Fei, Leach Joseph, Kao Evan, Tian Bing, Zhu Chengcheng, Zhang Yue, Hope Michael, Saloner David, Mitsouras Dimitrios

2023-Mar-15

Abdominal aortic aneurysm, Computed tomography, Machine learning, Thrombus

Pathology

Expanding inclusion criteria for active surveillance in intermediate-risk prostate cancer: a machine learning approach.

In World journal of urology ; h5-index 40.0

PURPOSE : To develop new selection criteria for active surveillance (AS) in intermediate-risk (IR) prostate cancer (PCa) patients.

METHODS : Retrospective study including patients from 14 referral centers who underwent pre-biopsy mpMRI, image-guided biopsies and radical prostatectomy. The cohort included biopsy-naive IR PCa patients who met the following inclusion criteria: Gleason Grade Group (GGG) 1-2, PSA < 20 ng/mL, and cT1-cT2 tumors. We relied on a recursive machine learning partitioning algorithm developed to predict adverse pathological features (i.e., ≥ pT3a and/or pN + and/or GGG ≥ 3).
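A shallow decision tree gives the flavor of recursive partitioning into risk clusters, where each leaf plays the role of one cluster. Everything below is simulated: the PI-RADS, PSA density, and cT stage values and the adverse-feature label are invented, and this is not the authors' algorithm:

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier, export_text

rng = np.random.default_rng(1)
n = 594  # cohort size from the abstract; the feature values are simulated

# Hypothetical predictors: PI-RADS (1-5), PSA density, clinical T stage (1 or 2)
pirads = rng.integers(1, 6, size=n)
psad = rng.uniform(0.02, 0.40, size=n)
ct = rng.integers(1, 3, size=n)
X = np.column_stack([pirads, psad, ct]).astype(float)

# Simulated adverse-feature label loosely tied to the predictors
p = 1 / (1 + np.exp(-(0.8 * (pirads - 3) + 5 * (psad - 0.15))))
y = (rng.random(n) < p).astype(int)

# Shallow tree: recursive binary splits on the risk factors; each leaf
# groups patients with a similar adverse-feature rate, i.e., a risk cluster
tree = DecisionTreeClassifier(max_depth=3, min_samples_leaf=30, random_state=0)
tree.fit(X, y)
print(export_text(tree, feature_names=["PI-RADS", "PSA density", "cT stage"]))
```

Printing the tree makes the cluster definitions human-readable, which is what allows cutoffs such as "PI-RADS ≤ 3 and PSA density < 0.15" to be stated as eligibility criteria.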

RESULTS : A total of 594 patients with IR PCa were included, of whom 220 (37%) had adverse features. PI-RADS score (weight: 0.726), PSA density (weight: 0.158), and clinical T stage (weight: 0.116) were selected as the most informative risk factors to classify patients according to their risk of adverse features, leading to the creation of five risk clusters. The adverse feature rates for cluster #1 (PI-RADS ≤ 3 and PSA density < 0.15), cluster #2 (PI-RADS 4 and PSA density < 0.15), cluster #3 (PI-RADS 1-4 and PSA density ≥ 0.15), cluster #4 (normal DRE and PI-RADS 5), and cluster #5 (abnormal DRE and PI-RADS 5) were 11.8%, 27.9%, 37.3%, 42.7%, and 65.1%, respectively. Compared with the current inclusion criteria, extending the AS criteria to clusters #1 + #2 or #1 + #2 + #3 would increase the number of eligible patients (+60% and +253%, respectively) without increasing the risk of adverse pathological features.

CONCLUSIONS : The newly developed model has the potential to expand the number of patients eligible for AS without compromising oncologic outcomes. Prospective validation is warranted.

Baboudjian Michael, Breda Alberto, Roumeguère Thierry, Uleri Alessandro, Roche Jean-Baptiste, Touzani Alae, Lacetera Vito, Beauval Jean-Baptiste, Diamand Romain, Simone Guiseppe, Windisch Olivier, Benamran Daniel, Fourcade Alexandre, Fiard Gaelle, Durand-Labrunie Camille, Roumiguié Mathieu, Sanguedolce Francesco, Oderda Marco, Barret Eric, Fromont Gaëlle, Dariane Charles, Charvet Anne-Laure, Gondran-Tellier Bastien, Bastide Cyrille, Lechevallier Eric, Palou Joan, Ruffion Alain, Van Der Bergh Roderick C N, Peltier Alexandre, Ploussard Guillaume

2023-Mar-15

Active surveillance, Intermediate risk, Machine learning, Oncological outcomes, Prostate cancer

General

Smartphone and Wearable Sensors for the Estimation of Facioscapulohumeral Muscular Dystrophy Disease Severity: Cross-sectional Study.

In JMIR formative research

BACKGROUND : Facioscapulohumeral muscular dystrophy (FSHD) is a progressive neuromuscular disease. Its slow and variable progression makes the development of new treatments highly dependent on validated biomarkers that can quantify disease progression and response to drug interventions.

OBJECTIVE : We aimed to build a tool that estimates FSHD clinical severity based on behavioral features captured using smartphone and remote sensor data. The adoption of remote monitoring tools, such as smartphones and wearables, would provide a novel opportunity for continuous, passive, and objective monitoring of FSHD symptom severity outside the clinic.

METHODS : In total, 38 genetically confirmed patients with FSHD were enrolled. The FSHD Clinical Score and the Timed Up and Go (TUG) test were used to assess FSHD symptom severity at days 0 and 42. Remote sensor data were collected using an Android smartphone, Withings Steel HR+, Body+, and BPM Connect+ for 6 continuous weeks. We created 2 single-task regression models that estimated the FSHD Clinical Score and TUG separately, and 1 multitask regression model that estimated the 2 clinical assessments simultaneously. We also assessed how the length of the observation window affected model performance: the models were trained on an incrementally increasing time window (from day 1 until day 14), and their predictions of clinical severity were evaluated on the remaining 4 weeks of data.
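Multitask (multi-output) regression of the two clinical scores can be sketched as below. The features and targets are synthetic stand-ins for the smartphone/wearable data, and a ridge model is used purely for illustration; the abstract does not specify the authors' regression family:

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.metrics import r2_score, mean_squared_error

rng = np.random.default_rng(2)

# Fake behavioral features (e.g., steps/day, sleep duration, call time)
# for 38 patients, matching the cohort size in the abstract
X = rng.normal(size=(38, 5))
# Two correlated clinical targets standing in for FSHD Clinical Score and TUG
w = rng.normal(size=(5, 2))
Y = X @ w + rng.normal(scale=0.5, size=(38, 2))

# One multi-output model estimates both targets at once, sharing the
# same fitted coefficients' structure across tasks
model = Ridge(alpha=1.0).fit(X[:30], Y[:30])
pred = model.predict(X[30:])

r2 = r2_score(Y[30:], pred, multioutput="raw_values")
rmse = np.sqrt(mean_squared_error(Y[30:], pred, multioutput="raw_values"))
print("per-target R2:", np.round(r2, 2), "per-target RMSE:", np.round(rmse, 2))
```

Reporting R2 and RMSE per target, as the abstract does, lets the multitask model be compared head-to-head against the two single-task models.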

RESULTS : The single-task regression models achieved an R2 of 0.57 and 0.59 and a root-mean-square error (RMSE) of 2.09 and 1.66 when estimating FSHD Clinical Score and TUG, respectively. Time spent at a health-related location (such as a gym or hospital) and call duration were features that were predictive of both clinical assessments. The multitask model achieved an R2 of 0.66 and 0.81 and an RMSE of 1.97 and 1.61 for the FSHD Clinical Score and TUG, respectively, and therefore outperformed the single-task models in estimating clinical severity. The 3 most important features selected by the multitask model were light sleep duration, total steps per day, and mean steps per minute. Using an increasing time window (starting from day 1 to day 14) for the FSHD Clinical Score, TUG, and multitask estimation yielded an average R2 of 0.65, 0.79, and 0.76 and an average RMSE of 3.37, 2.05, and 4.37, respectively.

CONCLUSIONS : We demonstrated that smartphone and remote sensor data could be used to estimate FSHD clinical severity and therefore complement the assessment of FSHD outside the clinic. In addition, our results illustrated that training the models on the first week of data allows for consistent and stable prediction of FSHD symptom severity. Longitudinal follow-up studies should be conducted to further validate the reliability and validity of the multitask model as a tool to monitor disease progression over a longer period.

TRIAL REGISTRATION : ClinicalTrials.gov NCT04999735; https://www.clinicaltrials.gov/ct2/show/NCT04999735.

Zhuparris Ahnjili, Maleki Ghobad, Koopmans Ingrid, Doll Robert J, Voet Nicoline, Kraaij Wessel, Cohen Adam, van Brummelen Emilie, De Maeyer Joris H, Groeneveld Geert Jan

2023-Mar-15

FSHD, Timed Up and Go, facioscapulohumeral muscular dystrophy, mHealth, machine learning, mobile health, mobile phone, neuromuscular disease, regression, smartphone, wearables

General

Objective Prediction of Next-Day's Affect Using Multimodal Physiological and Behavioral Data: Algorithm Development and Validation Study.

In JMIR formative research

BACKGROUND : Affective states are important aspects of healthy functioning; as such, monitoring and understanding affect is necessary for the assessment and treatment of mood-based disorders. Recent advancements in wearable technologies have increased the use of such tools in detecting and accurately estimating mental states (eg, affect, mood, and stress), offering comprehensive and continuous monitoring of individuals over time.

OBJECTIVE : Previous attempts to model an individual's mental state relied on subjective measurements or the inclusion of only a few objective monitoring modalities (eg, smartphones). This study aims to investigate the feasibility of monitoring affect using fully objective measurements. We conducted a comparatively long-term (12-month) study with a holistic sampling of participants' moods, including 20 affective states.

METHODS : Longitudinal physiological data (eg, sleep and heart rate), as well as daily assessments of affect, were collected using 3 modalities (ie, smartphone, watch, and ring) from 20 college students over a year. We examined the difference between the distributions of data collected from each modality along with the differences between their rates of missingness. Out of the 20 participants, 7 provided us with 200 or more days' worth of data, and we used this for our predictive modeling setup. Distributions of positive affect (PA) and negative affect (NA) among the 7 selected participants were observed. For predictive modeling, we assessed the performance of different machine learning models, including random forests (RFs), support vector machines (SVMs), multilayer perceptron (MLP), and K-nearest neighbor (KNN). We also investigated the capability of each modality in predicting mood and the most important features of PA and NA RF models.
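A rough sketch of the random forest prediction with feature-impact ranking follows. Note the authors report a SHAP analysis; this illustration substitutes scikit-learn's permutation importance as a lighter-weight stand-in, and all feature names, data, and labels are simulated:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance

rng = np.random.default_rng(3)

# Simulated daily features from ring/phone/watch (names are hypothetical)
names = ["light_sleep_min", "total_steps", "steps_per_min", "resting_hr", "screen_time"]
X = rng.normal(size=(200, 5))
# Binary next-day positive-affect label driven mostly by the first two features
y = (X[:, 0] + X[:, 1] + rng.normal(scale=0.8, size=200) > 0).astype(int)

rf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X[:150], y[:150])
acc = rf.score(X[150:], y[150:])

# Permutation importance on held-out days: shuffle one feature at a time and
# measure how much accuracy drops, approximating that feature's impact
imp = permutation_importance(rf, X[150:], y[150:], n_repeats=10, random_state=0)
ranking = sorted(zip(names, imp.importances_mean), key=lambda t: -t[1])
print(f"accuracy = {acc:.2f}")
for name, score in ranking:
    print(f"{name}: {score:.3f}")
```

Like SHAP, permutation importance is model-agnostic, but SHAP additionally attributes each individual prediction to its features rather than only giving a global ranking.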

RESULTS : RF was the best-performing model in our analysis and performed mood and stress (nervousness) prediction with ~81% and ~72% accuracy, respectively. PA models resulted in better performance compared to NA. The order of the most important modalities in predicting PA and NA was the smart ring, phone, and watch, respectively. SHAP (Shapley Additive Explanations) analysis showed that sleep and activity-related features were the most impactful in predicting PA and NA.

CONCLUSIONS : Generic machine learning-based affect prediction models, trained with population data, outperformed existing methods that rely on an individual's historical information. Additionally, we found that sleep and activity level were the most important features for predicting next-day PA and NA, respectively.

Jafarlou Salar, Lai Jocelyn, Azimi Iman, Mousavi Zahra, Labbaf Sina, Jain Ramesh C, Dutt Nikil, Borelli Jessica L, Rahmani Amir

2023-Mar-15

affective computing, mental health, wearable devices

Dermatology

Development and Clinical Evaluation of an Artificial Intelligence Support Tool for Improving Telemedicine Photo Quality.

In JAMA dermatology ; h5-index 54.0

IMPORTANCE : Telemedicine use accelerated during the COVID-19 pandemic, and skin conditions were a common use case. However, many images submitted may be of insufficient quality for making a clinical determination.

OBJECTIVE : To determine whether an artificial intelligence (AI) decision support tool, a machine learning algorithm, could improve the quality of images submitted for telemedicine by providing real-time feedback and explanations to patients.

DESIGN, SETTING, AND PARTICIPANTS : This quality improvement study with an AI performance component and single-arm clinical pilot study component was conducted from March 2020 to October 2021. After training, the AI decision support tool was tested on 357 retrospectively collected telemedicine images from Stanford telemedicine from March 2020 to June 2021. Subsequently, a single-arm clinical pilot study was conducted to assess feasibility with 98 patients in the Stanford Department of Dermatology across 2 clinical sites from July 2021 to October 2021. For the clinical pilot study, inclusion criteria for patients included being adults (aged ≥18 years), presenting to clinic for a skin condition, and being able to photograph their own skin with a smartphone.

INTERVENTIONS : During the clinical pilot study, patients were given a handheld smartphone device with a machine learning algorithm interface loaded and were asked to take images of any lesions of concern. Patients were able to review and retake photos prior to submitting, so each submitted photo met the patient's assumed standard of clinical acceptability. A machine learning algorithm then gave the patient feedback on whether the image was acceptable. If the image was rejected, the patient was provided a reason by the AI decision support tool and allowed to retake the photos.

MAIN OUTCOMES AND MEASURES : The main outcome of the retrospective image analysis was the area under the receiver operating characteristic curve (ROC-AUC). The main outcome of the clinical pilot study was the image quality difference between the baseline images and the images approved by AI decision support.

RESULTS : Of the 98 patients included, the mean (SD) age was 49.8 (17.6) years, and 50 (51%) of the patients were male. On retrospective telemedicine images, the machine learning algorithm effectively identified poor-quality images (ROC-AUC of 0.78) and the reason for poor quality (blur: ROC-AUC of 0.84; lighting issues: ROC-AUC of 0.70). The performance was consistent across age and sex. In the clinical pilot study, patient use of the machine learning algorithm was associated with improved image quality: use of the AI algorithm was associated with a 68.0% reduction in the number of patients with a poor-quality image.
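The abstract does not describe the underlying model behind the blur check. As a purely illustrative stand-in (not the authors' method), a classic heuristic for detecting blur is the variance of an image's Laplacian response, which collapses when high-frequency detail is lost:

```python
import numpy as np

def laplacian_variance(img: np.ndarray) -> float:
    """Variance of a simple 3x3 Laplacian response; low values suggest blur."""
    lap = (-4 * img[1:-1, 1:-1]
           + img[:-2, 1:-1] + img[2:, 1:-1]
           + img[1:-1, :-2] + img[1:-1, 2:])
    return float(lap.var())

rng = np.random.default_rng(4)
# A noisy image has abundant high-frequency content -> high Laplacian variance
sharp = rng.random((64, 64))
# A prefix-mean image is very smooth -> near-zero Laplacian variance
blurry = np.cumsum(np.cumsum(sharp, 0), 1) / np.outer(np.arange(1, 65), np.arange(1, 65))

print(laplacian_variance(sharp) > laplacian_variance(blurry))
```

In practice, thresholding such a score (or feeding it to a learned classifier) is one simple way a tool could reject a photo as "blurry" and prompt a retake.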

CONCLUSIONS AND RELEVANCE : In this quality improvement study, patients' use of the AI decision support tool with a machine learning algorithm was associated with improved quality of skin disease photographs submitted for telemedicine use.

Vodrahalli Kailas, Ko Justin, Chiou Albert S, Novoa Roberto, Abid Abubakar, Phung Michelle, Yekrang Kiana, Petrone Paige, Zou James, Daneshjou Roxana

2023-Mar-15