
Public Health

Predicting neonatal respiratory distress syndrome and hypoglycaemia prior to discharge: Leveraging health administrative data and machine learning.

In Journal of biomedical informatics ; h5-index 55.0

OBJECTIVES : A major challenge for hospitals and clinicians is the early identification of neonates at risk of developing adverse conditions. We develop a model based on routinely collected administrative data, which accurately predicts two common disorders among early term and preterm (<39 weeks) neonates prior to discharge.

STUDY DESIGN : The data included all inpatient live births prior to 39 weeks (n = 154,755) occurring in the Australian state of Queensland between January 2009 and December 2015. Predictor variables included all maternal data captured in administrative records from the beginning of gestation up to, and including, the delivery, as well as neonatal data recorded at the delivery. Gradient boosted trees were used to predict neonatal respiratory distress syndrome and hypoglycaemia prior to discharge, with model performance benchmarked against logistic regression models.

RESULTS : The gradient boosted trees model achieved very high discrimination for respiratory distress syndrome [AUC = 0.923, 95% CI (0.917, 0.928)] and good discrimination for hypoglycaemia [AUC = 0.832, 95% CI (0.827, 0.837)] in the validation data, and outperformed the logistic regression models.

CONCLUSION : Our study suggests that routinely collected health data have the potential to play an important role in assisting clinicians to identify neonates at risk of developing selected disorders shortly after birth. Although the models achieved high levels of discrimination, many issues remain before they can be implemented in practice, which we discuss in relation to our findings.

Betts Kim S, Kisely Steve, Alati Rosa


Administrative data linkage, machine learning, neonatal outcomes, predictive models
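
The neonatal prediction abstract above reports discrimination as an AUC with a 95% confidence interval. As a rough illustration of what those two numbers mean, the sketch below computes the AUC from its pairwise definition and a percentile bootstrap interval around it. The abstract does not say how the authors derived their intervals, so the bootstrap is an assumption here, and the function names `auc` and `bootstrap_ci` are hypothetical helpers, not part of the study.

```python
import random


def auc(labels, scores):
    """AUC as the probability that a random positive case scores higher
    than a random negative case (ties count as half a win)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def bootstrap_ci(labels, scores, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for the AUC (an assumed method,
    not necessarily the one used in the study)."""
    rng = random.Random(seed)
    n = len(labels)
    stats = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [labels[i] for i in idx]
        # Skip resamples that contain only one class: AUC is undefined there.
        if 0 < sum(ys) < n:
            stats.append(auc(ys, [scores[i] for i in idx]))
    stats.sort()
    lower = stats[int((alpha / 2) * n_boot)]
    upper = stats[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper


# Toy example: 1 = disorder present, scores = predicted risk.
labels = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
print(auc(labels, scores))  # 0.75
```

The pairwise loop is quadratic in the number of cases, which is fine for a toy example but would need a rank-based implementation for a cohort of the size described in the abstract.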

General

Identification of specific neural circuit underlying the key cognitive deficit of remitted late-onset depression: A multi-modal MRI and machine learning study.

In Progress in neuro-psychopharmacology & biological psychiatry

Neuropsychological impairment is a key feature of late-onset depression (LOD), with deficits observed across multiple cognitive domains, and this impairment can persist even after the remission of depressive symptoms. However, no previous study has explored the pattern of cognitive deficit in remitted LOD (rLOD) or investigated the specific neural circuit underlying the key cognitive deficit of LOD. Forty rLOD patients and 36 controls underwent comprehensive neuropsychological assessments and magnetic resonance imaging (MRI) scans. We first investigated the influence of executive function or information processing speed deficits on other cognitive domains. We then applied a multivariate machine learning technique known as relevance vector regression to evaluate the potential of multi-modal MRI (i.e., integrating whole-brain grey-matter [GM] volume and white-matter [WM] tract features) for making accurate predictions about the key cognitive deficit in individual rLOD patients. We found that information processing speed appears to represent a key cognitive deficit in rLOD. Furthermore, the machine learning model identified a wide range of GM regions and WM tracts that significantly contributed to the prediction of individual performance on information processing speed (r = 0.50, P < 0.001). The GM regions were mainly located in the frontal-subcortical and limbic systems, and the WM tracts were mainly located in the frontal-limbic pathway, including the anterior corona radiata, fornix, posterior cingulate bundle, and uncinate fasciculus. The present study provides strong evidence supporting the concept that the core aspect of the cognitive deficits in rLOD (i.e., information processing speed) is associated with disruption of the frontal-subcortical-limbic pathway.

Wang Zan, Yuan Yonggui, Jiang Ying, You Jiayong, Zhang Zhijun


Cognitive deficit, Late-onset depression, Machine learning, Magnetic resonance imaging, Relevance vector regression

General

Multi-Assignment Clustering: Machine learning from a biological perspective.

In Journal of biotechnology

A common approach for analyzing large-scale molecular data is to cluster objects sharing similar characteristics. This assumes that genes with highly similar expression profiles are likely participating in a common molecular process. Biological systems are extremely complex and challenging to understand, with proteins having multiple functions that sometimes need to be activated or expressed in a time-dependent manner. Thus, the strategies applied for clustering of these molecules into groups are of key importance for translation of data to biologically interpretable findings. Here we implemented a multi-assignment clustering (MAsC) approach that allows molecules to be assigned to multiple clusters, rather than single ones as in commonly used clustering techniques. When applied to high-throughput transcriptomics data, MAsC increased power of the downstream pathway analysis and allowed identification of pathways with high biological relevance to the experimental setting and the biological systems studied. Multi-assignment clustering also reduced noise in the clustering partition by excluding genes with a low correlation to all of the resulting clusters. Together, these findings suggest that our methodology facilitates translation of large-scale molecular data into biological knowledge. The method is made available as an R package on GitLab (

Ulfenborg Benjamin, Karlsson Alexander, Riveiro Maria, Andersson Christian X, Sartipy Peter, Synnergren Jane


Clustering, K-means, annotation enrichment, multiple cluster assignment, pathways, transcriptomics
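
The multi-assignment clustering abstract above describes two behaviours: a molecule may belong to more than one cluster, and genes with a low correlation to every cluster are dropped as noise. The sketch below illustrates that idea only; it is not the MAsC package or its algorithm. It assumes cluster centroids are already available (for example from k-means), uses Pearson correlation as the similarity measure, and uses an arbitrary threshold. The names `pearson`, `multi_assign`, and `threshold` are hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length expression profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0  # flat profile: treat as uncorrelated
    return cov / (sx * sy)


def multi_assign(profiles, centroids, threshold=0.9):
    """Assign each gene to every cluster whose centroid it correlates with
    at or above `threshold`; genes matching no cluster are excluded."""
    assigned, excluded = {}, []
    for gene, profile in profiles.items():
        hits = [k for k, c in enumerate(centroids)
                if pearson(profile, c) >= threshold]
        if hits:
            assigned[gene] = hits
        else:
            excluded.append(gene)
    return assigned, excluded


# Toy data: two centroids and three genes measured at four time points.
centroids = [[1, 2, 3, 4], [1, 3, 2, 4]]
profiles = {
    "A": [1, 2.5, 2.5, 4],  # close to both centroids
    "B": [1, 2, 3, 4],      # matches the first centroid only
    "C": [4, 1, 4, 1],      # resembles neither centroid
}
assigned, excluded = multi_assign(profiles, centroids)
print(assigned)  # {'A': [0, 1], 'B': [0]}
print(excluded)  # ['C']
```

A single-assignment method would force gene A into one cluster and keep gene C somewhere; the threshold rule is one simple way to express both behaviours the abstract mentions.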

Pathology

Exploring dyserythropoiesis in patients with myelodysplastic syndrome by imaging flow cytometry and machine-learning assisted morphometrics.

In Cytometry. Part B, Clinical cytometry

BACKGROUND : The hallmark of myelodysplastic syndrome (MDS) remains dysplasia in the bone marrow (BM). However, diagnosing MDS may be challenging and subject to inter-observer variability. Thus, there is an unmet need for novel objective, standardized and reproducible methods for evaluating dysplasia. Imaging flow cytometry (IFC) offers combined analyses of phenotypic and image-based morphometric parameters, for example, cell size and nuclearity. Hence, we hypothesized IFC to be a useful tool in MDS diagnostics.

METHODS : Using a different-from-normal approach, we investigated dyserythropoiesis by quantifying morphometric features in a median of 5953 erythroblasts (range: 489-68,503) from 14 MDS patients, 11 healthy donors, 6 non-MDS controls with increased erythropoiesis, and 6 patients with cytopenia.

RESULTS : First, we morphometrically confirmed normal erythroid maturation, as immunophenotypically defined erythroid precursors could be sequenced by significantly decreasing cell-, nuclear- and cytoplasm area. In MDS samples, we demonstrated cell size enlargement and increased fractions of macronormoblasts in late-stage erythroblasts (both p < .0001). Interestingly, cytopenic controls with high-risk mutational patterns displayed highly aberrant cell size morphometrics. Furthermore, assisted by machine learning algorithms, we reliably identified and enumerated true binucleated erythroblasts at a significantly higher frequency in two out of three erythroblast maturation stages in MDS patients compared to normal BM (both p = .0001).

CONCLUSION : We demonstrate proof-of-concept results of the applicability of automated IFC-based techniques to study and quantify morphometric changes in dyserythropoietic BM cells. We propose that IFC holds great promise as a powerful and objective tool in the complex setting of MDS diagnostics with the potential for minimizing inter-observer variability.

Rosenberg Carina A, Bill Marie, Rodrigues Matthew A, Hauerslev Mathias, Kerndrup Gitte B, Hokland Peter, Ludvigsen Maja


dyserythropoiesis, high-throughput morphometric quantification, imaging flow cytometry, myelodysplastic syndrome

Radiology

Deep Learning-assisted MRI Prediction of Tumor Response to Chemotherapy in Patients with Colorectal Liver Metastases.

In International journal of cancer ; h5-index 82.0

Accurate evaluation of tumor response to preoperative chemotherapy is crucial for assigning appropriate patients with colorectal liver metastases (CRLM) to surgery or conservative therapy. However, there is no well-recognized method for predicting pathological response before surgery. This study constructed and validated a deep learning algorithm using pre- and post-chemotherapy magnetic resonance imaging (MRI) to predict pathological response in CRLM. CRLM patients from center one who had ≤ 5 lesions and were scheduled to receive preoperative chemotherapy followed by liver resection between January 2013 and November 2016 were included prospectively and chronologically divided into a training cohort (80% of patients) and a testing cohort (20% of patients). Patients from center two were included between January 2017 and December 2018 as an external validation cohort. MRI-based models were constructed to discriminate, according to pathology tumor regression grade (TRG), between the response (TRG1/2) and non-response (TRG3/4/5) groups at the lesion level. From center one, 155 patients (328 lesions) were included; chronologically, 101 (264 lesions) in the training cohort and 54 (64 lesions) in the testing cohort. The model achieved better accuracy (0.875 vs. 0.578) and AUC (0.849 vs. 0.615) than RECIST for discriminating response; it also distinguished the survival outcomes after hepatectomy better than the RECIST criteria. Evaluations of the external validation cohort (25 patients, 61 lesions) also showed good ability with an AUC of 0.833. In conclusion, the MRI-based deep learning model provided accurate prediction of pathological tumor response to preoperative chemotherapy in patients with CRLM and may inform individualized treatment.

Zhu Hai-Bin, Xu Da, Ye Meng, Sun Li, Zhang Xiao-Yan, Li Xiao-Ting, Nie Pei, Xing Bao-Cai, Sun Ying-Shi


colorectal liver metastases, deep learning, magnetic resonance imaging, tumor regression grade

Surgery

The future surgical training paradigm: Virtual reality and machine learning in surgical education.

In Surgery ; h5-index 54.0

Surgical training has undergone substantial change in the last few decades. As technology and patient complexity continue to increase, demands for novel approaches to ensure competency have arisen. Virtual reality systems augmented with machine learning represent one such approach. The ability to offer on-demand training, integrate checklists, and provide personalized, surgeon-specific feedback is paving the way to a new era of surgical training. Machine learning algorithms that improve over time as they acquire more data will continue to refine the education they provide. Further, fully immersive simulated environments coupled with machine learning analytics provide real-world training opportunities in a safe atmosphere, away from the potential to harm patients. Careful implementation of these technologies has the potential to increase access to, and improve the quality of, surgical training and patient care, and is poised to change the landscape of current surgical training. Herein, we describe the current state of virtual reality coupled with machine learning for surgical training, future directions, and existing limitations of this technology.

Rogers Michael P, DeSantis Anthony J, Janjua Haroon, Barry Tara M, Kuo Paul C