
General

Nonhuman rationality: a predictive coding perspective.

In Cognitive processing

How can we rethink 'rationality' in the wake of animal and artificial intelligence studies? Can nonhuman systems be rational in any nontrivial sense? In this paper, we propose that all organisms, under certain circumstances, exhibit rationality to varying degrees and in varying aspects, in the sense of the standard picture (SP): their inferential processes conform to logic and probability rules. We first show that according to Calvo and Friston (J R Soc Interface 14(131):20170096, 2017) and Orlandi (2018), all biological systems must embody a top-down process (active inference) to minimize free energy. Next, based on Maddy's (Second philosophy, Oxford University Press, Oxford, 2007; The logical must: Wittgenstein on logic, Oxford University Press, Oxford, 2014) analysis, we argue that this inferential process conforms to logic and probability rules; thus, it satisfies the SP, which explains the rudimentary logic and arithmetic (e.g., categorizing and numbering) found among pigeons and mice. We also hold that the mammalian brain is only one among many ways of implementing rationality. Finally, we discuss data from microorganisms to support this view.

Hung Tzu-Wei


Active inference, Adaptation, Microorganism, Predictive coding, Rationality, Rudimentary logic and probability

Internal Medicine

Shedding Light on the Black Box: Explaining Deep Neural Network Prediction of Clinical Outcomes.

In Journal of medical systems ; h5-index 48.0

Deep neural network models are emerging as an important method in healthcare delivery, following their recent success in other domains such as image recognition. Because of their multiple non-linear inner transformations, deep neural networks are viewed by many as black boxes. For practical use, deep learning models require explanations that are intuitive to clinicians. In this study, we developed a deep neural network model to predict outcomes following major cardiovascular procedures, using temporal image representation of past medical history as input. We created a novel explanation for the prediction of the model by defining impact scores that associate clinical observations with the outcome. For comparison, a logistic regression model was fitted to the same dataset. We compared the impact scores and log odds ratios by calculating three types of correlations, which provided a partial validation of the impact scores. The deep neural network model achieved an area under the receiver operating characteristic curve (AUC) of 0.787, compared to 0.746 for the logistic regression model. Moderate correlations were found between the impact scores and the log odds ratios. Impact scores generated by the explanation algorithm have the potential to shed light on the "black box" deep neural network model and could facilitate its adoption by clinicians.

Shao Yijun, Cheng Yan, Shah Rashmee U, Weir Charlene R, Bray Bruce E, Zeng-Treitler Qing


Clinical outcome, Deep neural network, Machine learning, Predictive model
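
The comparison of impact scores with log odds ratios in the abstract above can be sketched in code. The abstract does not say which three correlation types were used, so the choice of Pearson, Spearman, and Kendall (tau-a) here is an assumption, and the two input lists are hypothetical values made up for illustration, not data from the study.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation: linear association between two lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (norm_x * norm_y)


def average_ranks(values):
    """Rank values from 1..n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared_rank = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = shared_rank
        start = end + 1
    return ranks


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def kendall_tau_a(x, y):
    """Kendall tau-a: (concordant - discordant) pairs over all pairs."""
    concordant = discordant = 0
    for i, j in combinations(range(len(x)), 2):
        sign = (x[i] - x[j]) * (y[i] - y[j])
        if sign > 0:
            concordant += 1
        elif sign < 0:
            discordant += 1
    return (concordant - discordant) / (len(x) * (len(x) - 1) / 2)


# Hypothetical values for five clinical observations (not from the paper).
impact_scores = [0.8, 0.1, -0.3, 0.5, -0.6]
log_odds_ratios = [0.6, 0.2, -0.1, 0.7, -0.4]

print(pearson(impact_scores, log_odds_ratios))
print(spearman(impact_scores, log_odds_ratios))
print(kendall_tau_a(impact_scores, log_odds_ratios))
```

Rank-based measures such as Spearman and Kendall only ask whether the two methods order the observations similarly, which may be the more relevant question when impact scores and log odds ratios are on different scales.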

General

One step further into the blackbox: a pilot study of how to build more confidence around an AI-based decision system of breast nodule assessment in 2D ultrasound.

In European radiology ; h5-index 62.0

OBJECTIVES : To investigate how a DL model makes decisions in lesion classification with a newly defined region of evidence (ROE) by incorporating "explainable AI" (xAI) techniques.

METHODS : A data set of 785 2D breast ultrasound images was acquired from 367 females. DenseNet-121 was used to classify each lesion as benign or malignant. For performance assessment, classification results were evaluated by calculating accuracy, sensitivity, specificity, and the receiver operating characteristic for experiments with both coarse and fine regions of interest (ROIs). The area under the curve (AUC) was evaluated, and the true-positive, false-positive, true-negative, and false-negative results, broken down into high, medium, and low resemblance on test sets, were also reported.

RESULTS : The two models with coarse and fine ROIs of ultrasound images as input achieve an AUC of 0.899 and 0.869, respectively. The accuracy, sensitivity, and specificity of the model with coarse ROIs are 88.4%, 87.9%, and 89.2%, and with fine ROIs are 86.1%, 87.9%, and 83.8%, respectively. The DL model captures ROE that closely resemble what physicians consider when they assess the image.

CONCLUSIONS : We have demonstrated the effectiveness of using DenseNet to classify breast lesions with a limited quantity of 2D grayscale ultrasound image data. We have also proposed a new ROE-based metric system that can help physicians and patients better understand how AI makes decisions in reading images, and which can potentially be integrated as part of the evidence in early screening or triaging of patients undergoing breast ultrasound examinations.

KEY POINTS :
• The two models with coarse and fine ROIs of ultrasound images as input achieve an AUC of 0.899 and 0.869, respectively. The accuracy, sensitivity, and specificity of the model with coarse ROIs are 88.4%, 87.9%, and 89.2%, and with fine ROIs are 86.1%, 87.9%, and 83.8%, respectively.
• The first model with coarse ROIs is slightly better than the second model with fine ROIs according to these evaluation metrics.
• The results from coarse ROI and fine ROI are consistent and the peripheral tissue is also an impact factor in breast lesion classification.

Dong Fajin, She Ruilian, Cui Chen, Shi Siyuan, Hu Xuqiao, Zeng Jieying, Wu Huaiyu, Xu Jinfeng, Zhang Yun


Artificial intelligence, Breast neoplasms, Ultrasonography
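
The evaluation metrics named in the abstract above (accuracy, sensitivity, specificity, AUC) can be computed from labels and model scores as in the following minimal sketch. The labels, scores, and the 0.5 decision threshold are hypothetical; the abstract does not report per-image outputs or the threshold used.

```python
def confusion_counts(labels, scores, threshold=0.5):
    """Count TP, FP, TN, FN, treating score >= threshold as 'malignant' (1)."""
    tp = fp = tn = fn = 0
    for label, score in zip(labels, scores):
        predicted = 1 if score >= threshold else 0
        if label == 1 and predicted == 1:
            tp += 1
        elif label == 0 and predicted == 1:
            fp += 1
        elif label == 0 and predicted == 0:
            tn += 1
        else:
            fn += 1
    return tp, fp, tn, fn


def summary_metrics(tp, fp, tn, fn):
    """Accuracy, sensitivity (true-positive rate), specificity (true-negative rate)."""
    return {
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
    }


def auc(labels, scores):
    """AUC as the share of (positive, negative) pairs ranked correctly; ties count half."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in positives
        for n in negatives
    )
    return wins / (len(positives) * len(negatives))


# Hypothetical test set: 1 = malignant, 0 = benign (not data from the study).
labels = [1, 1, 1, 1, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.6, 0.4, 0.7, 0.3, 0.2, 0.1]

metrics = summary_metrics(*confusion_counts(labels, scores))
print(metrics)
print(auc(labels, scores))
```

Accuracy, sensitivity, and specificity depend on the chosen threshold, whereas AUC summarizes ranking quality across all thresholds, which is one reason abstracts often report both.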

General

Radiomic analysis of HTR-DCE MR sequences improves diagnostic performance compared to BI-RADS analysis of breast MR lesions.

In European radiology ; h5-index 62.0

PURPOSE : To assess the diagnostic performance of radiomic analysis using high temporal resolution (HTR)-dynamic contrast enhancement (DCE) MR sequences compared to BI-RADS analysis to distinguish benign from malignant breast lesions.

MATERIALS AND METHODS : We retrospectively analyzed data from consecutive women who underwent breast MRI including HTR-DCE MR sequencing for abnormal enhancing lesions and who had subsequent pathological analysis at our tertiary center. Semi-quantitative enhancement parameters and textural features were extracted. Temporal change across each phase of textural features in HTR-DCE MR sequences was calculated and called "kinetic textural parameters." Statistical analysis by LASSO logistic regression and cross validation was performed to build a model. The diagnostic performance of the radiomic model was compared to the results of BI-RADS MR score analysis.

RESULTS : We included 117 women with a mean age of 54 years (28-88). Of the 174 lesions analyzed, 75 were benign and 99 malignant. Seven semi-quantitative enhancement parameters and 57 textural features were extracted. Regression analysis selected 15 significant variables in a radiomic model (called "malignant probability score") which displayed an AUC = 0.876 (sensitivity = 0.98, specificity = 0.52, accuracy = 0.78). The performance of the malignant probability score to distinguish benign from malignant breast lesions (AUC = 0.876, 95%CI 0.825-0.925) was significantly better than that of BI-RADS analysis (AUC = 0.831, 95%CI 0.769-0.892). The radiomic model significantly reduced false positives (42%) with the same number of missed cancers (n = 2).

CONCLUSION : A radiomic model including kinetic textural features extracted from an HTR-DCE MR sequence improves diagnostic performance over BI-RADS analysis.

KEY POINTS :
• Radiomic analysis using HTR-DCE is of better diagnostic performance (AUC = 0.876) than conventional breast MRI reading with BI-RADS (AUC = 0.831) (p < 0.001).
• A radiomic malignant probability score under 19.5% gives a negative predictive value of 100% while a malignant probability score over 81% gives a positive predictive value of 100%.
• Kinetic textural features extracted from HTR-DCE-MRI have a major role to play in distinguishing benign from malignant breast lesions.

Perre Saskia Vande, Duron Loïc, Milon Audrey, Bekhouche Asma, Balvay Daniel, Cornelis Francois H, Fournier Laure, Thomassin-Naggara Isabelle


Artificial intelligence, Breast, MRI image enhancement, Neoplasms
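
As a quick arithmetic check on the figures in the Results above, overall accuracy can be recovered from sensitivity, specificity, and the class counts. This sketch assumes the reported sensitivity and specificity are lesion-level rates over the 99 malignant and 75 benign lesions; the function name is illustrative.

```python
def accuracy_from_rates(sensitivity, specificity, n_positive, n_negative):
    """Accuracy = (true positives + true negatives) / all cases."""
    true_positives = sensitivity * n_positive
    true_negatives = specificity * n_negative
    return (true_positives + true_negatives) / (n_positive + n_negative)


# Figures taken from the Results paragraph: 99 malignant, 75 benign lesions.
accuracy = accuracy_from_rates(
    sensitivity=0.98, specificity=0.52, n_positive=99, n_negative=75
)
print(f"{accuracy:.2f}")  # prints 0.78, matching the reported accuracy
```

The check shows how a model with modest specificity (0.52) can still reach 0.78 accuracy when sensitivity is high and malignant lesions are the majority class.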

Cardiology

Effect of Machine Learning on Dispatcher Recognition of Out-of-Hospital Cardiac Arrest During Calls to Emergency Medical Services: A Randomized Clinical Trial.

In JAMA network open

Importance : Emergency medical dispatchers fail to identify approximately 25% of cases of out-of-hospital cardiac arrest (OHCA), resulting in lost opportunities to save lives by initiating cardiopulmonary resuscitation.

Objective : To examine how a machine learning model trained to identify OHCA and alert dispatchers during emergency calls affected OHCA recognition and response.

Design, Setting, and Participants : This double-masked, 2-group, randomized clinical trial analyzed all calls to emergency number 112 (equivalent to 911) in Denmark. Calls were processed by a machine learning model using speech recognition software. The machine learning model assessed ongoing calls, and calls in which the model identified OHCA were randomized. The trial was performed at Copenhagen Emergency Medical Services, Denmark, between September 1, 2018, and December 31, 2019.

Intervention : Dispatchers in the intervention group were alerted when the machine learning model identified out-of-hospital cardiac arrest, and those in the control group followed normal protocols without alert.

Main Outcomes and Measures : The primary end point was the rate of dispatcher recognition of subsequently confirmed OHCA.

Results : A total of 169 049 emergency calls were examined, of which the machine learning model identified 5242 as suspected OHCA. Calls were randomized to control (2661 [50.8%]) or intervention (2581 [49.2%]) groups. Of these, 336 (12.6%) and 318 (12.3%), respectively, had confirmed OHCA. The mean (SD) age among these 654 patients was 70 (16.1) years, and 419 of 627 patients (67.8%) with known gender were men. Dispatchers in the intervention group recognized 296 confirmed OHCA cases (93.1%) with machine learning assistance compared with 304 confirmed OHCA cases (90.5%) using standard protocols without machine learning assistance (P = .15). Machine learning alerts alone had a significantly higher sensitivity than dispatchers without alerts for confirmed OHCA (85.0% vs 77.5%; P < .001) but lower specificity (97.4% vs 99.6%; P < .001) and positive predictive value (17.8% vs 55.8%; P < .001).

Conclusions and Relevance : This randomized clinical trial did not find any significant improvement in dispatchers' ability to recognize cardiac arrest when supported by machine learning even though artificial intelligence did surpass human recognition.

Trial Registration : Identifier: NCT04219306.

Blomberg Stig Nikolaj, Christensen Helle Collatz, Lippert Freddy, Ersbøll Annette Kjær, Torp-Petersen Christian, Sayre Michael R, Kudenchuk Peter J, Folke Fredrik
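
The primary end point in the trial above is a simple proportion: confirmed OHCA cases recognized by the dispatcher divided by all confirmed OHCA cases in that group. The sketch below recomputes the two recognition rates from the counts given in the Results; the helper name is illustrative, and no significance test is attempted because the abstract does not state which test produced the reported P value.

```python
def percent(numerator, denominator):
    """Return numerator / denominator as a percentage rounded to one decimal."""
    return round(100 * numerator / denominator, 1)


# Counts as reported in the Results paragraph.
confirmed_control, recognized_control = 336, 304
confirmed_intervention, recognized_intervention = 318, 296

control_rate = percent(recognized_control, confirmed_control)
intervention_rate = percent(recognized_intervention, confirmed_intervention)

# Absolute difference in percentage points, from the unrounded proportions.
difference = round(
    100
    * (
        recognized_intervention / confirmed_intervention
        - recognized_control / confirmed_control
    ),
    1,
)

print(control_rate, intervention_rate, difference)
```

With these counts the recognition rates come to 90.5% and 93.1%, a gap of about 2.6 percentage points, which the trial reports as not statistically significant.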


General

Identification of Suicide Attempt Risk Factors in a National US Survey Using Machine Learning.

In JAMA psychiatry ; h5-index 106.0

Importance : Because more than one-third of people making nonfatal suicide attempts do not receive mental health treatment, it is essential to extend suicide attempt risk factors beyond high-risk clinical populations to the general adult population.

Objective : To identify future suicide attempt risk factors in the general population using a data-driven machine learning approach including more than 2500 questions from a large, nationally representative survey of US adults.

Design, Setting, and Participants : Data came from wave 1 (2001 to 2002) and wave 2 (2004 to 2005) of the National Epidemiologic Survey on Alcohol and Related Conditions (NESARC). NESARC is a face-to-face longitudinal survey conducted with a nationally representative sample of the noninstitutionalized civilian population 18 years and older in the US. The cumulative response rate across both waves was 70.2%, resulting in 34 653 wave 2 interviews. A balanced random forest was trained using cross-validation to develop a suicide attempt risk model. Out-of-fold model prediction was used to assess model performance, including the area under the receiver operating characteristic curve, sensitivity, and specificity. Survey design and nonresponse weights allowed estimates to be representative of the US civilian population based on the 2000 census. Analyses were performed between May 15, 2019, and June 10, 2020.

Main Outcomes and Measures : Attempted suicide in the 3 years between wave 1 and wave 2 interviews.

Results : Of 34 653 participants, 20 089 were female (weighted proportion, 52.1%). The weighted mean (SD) age was 45.1 (17.3) years at wave 1 and 48.2 (17.3) years at wave 2. Attempted suicide during the 3 years between wave 1 and wave 2 interviews was self-reported by 222 of 34 653 participants (0.6%). Using survey questions measured at wave 1, the suicide attempt risk model yielded a cross-validated area under the receiver operating characteristic curve of 0.857 with a sensitivity of 85.3% (95% CI, 79.8-89.7) and a specificity of 73.3% (95% CI, 72.8-73.8) at an optimized threshold. The model identified 1.8% of the US population to be at a 10% or greater risk of suicide attempt. The most important risk factors were 3 questions about previous suicidal ideation or behavior; 3 items from the 12-Item Short Form Health Survey, namely feeling downhearted, doing activities less carefully, or accomplishing less because of emotional problems; younger age; lower educational achievement; and recent financial crisis.

Conclusions and Relevance : In this study, after searching through more than 2500 survey questions, several well-known risk factors of suicide attempt were confirmed, such as previous suicidal behaviors and ideation, and new risks were identified, including functional impairment resulting from mental disorders and socioeconomic disadvantage. These results may help guide future clinical assessment and the development of new suicide risk scales.

García de la Garza Ángel, Blanco Carlos, Olfson Mark, Wall Melanie M
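
The out-of-fold prediction mentioned in the Design section above means that every participant is scored by a model that never saw that participant during training. The following is a minimal sketch of that idea only: the threshold "model" is a deliberately simple stand-in and not the balanced random forest used in the study, and the function names, fold assignment, and toy data are all hypothetical.

```python
def k_fold_indices(n_samples, k):
    """Split indices 0..n_samples-1 into k interleaved folds."""
    return [list(range(fold, n_samples, k)) for fold in range(k)]


def fit_threshold(features, labels):
    """Stand-in model: the midpoint between the two class means of one feature."""
    positives = [x for x, y in zip(features, labels) if y == 1]
    negatives = [x for x, y in zip(features, labels) if y == 0]
    return (sum(positives) / len(positives) + sum(negatives) / len(negatives)) / 2


def predict_threshold(threshold, feature):
    """Predict class 1 when the feature is at or above the learned threshold."""
    return 1 if feature >= threshold else 0


def out_of_fold_predictions(features, labels, k):
    """Predict each sample with a model trained only on the other folds."""
    predictions = [None] * len(labels)
    for held_out in k_fold_indices(len(labels), k):
        held_out_set = set(held_out)
        train = [i for i in range(len(labels)) if i not in held_out_set]
        model = fit_threshold(
            [features[i] for i in train], [labels[i] for i in train]
        )
        for i in held_out:
            predictions[i] = predict_threshold(model, features[i])
    return predictions


# Toy data (hypothetical): one risk score per person and a binary outcome.
features = [0.1, 0.2, 0.3, 0.4, 0.6, 0.7, 0.8, 0.9]
labels = [0, 0, 0, 0, 1, 1, 1, 1]

oof = out_of_fold_predictions(features, labels, k=4)
print(oof)
```

Because each prediction comes from a model that did not train on that sample, metrics computed on the out-of-fold predictions (such as AUC, sensitivity, and specificity) are less optimistic than metrics computed on the training data itself.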