General

Predicting the availability of haematopoietic stem cell donors using machine learning.

In Biology of blood and marrow transplantation : journal of the American Society for Blood and Marrow Transplantation

Haematopoietic stem cell transplantation (HSCT) is firmly established as an important curative therapy for patients with hematologic malignancies and other blood disorders. Apart from finding human leukocyte antigen (HLA) matched donors during the HSCT process, donor availability remains a key consideration, as the time taken from diagnosis to transplant is recognised to adversely affect patient outcome. In this study, we aimed to develop and validate a machine learning approach to predict the availability of stem cell donors. We retrospectively collected a dataset containing 10,258 verification typing (VT) requests made during the HSCT process in the British Bone Marrow Registry (BBMR) between 1st January 2013 and 31st December 2018. Three machine learning algorithms were implemented and compared: boosted decision trees (BDT), logistic regression (LR) and support vector machines (SVM). Area under the receiver operating characteristic curve (AUC) was the primary measure used to assess the algorithms. The experimental results showed that BDT performed better than LR and SVM in predicting the availability of BBMR donors. The overall predictive power of the model, using AUC on the test cohort of 2,052 records, was found to be 0.826. Our findings show that machine learning can predict the availability of donors with a high degree of accuracy. We propose the use of the BDT machine learning approach to predict the availability of BBMR donors, and the use of the predictive scores during the HSCT process, to ensure patients with blood cancers or disorders receive a transplant at the optimum time.
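The abstract above assesses its models by AUC. As a minimal sketch of what that metric measures, the snippet below computes AUC as the fraction of (available, unavailable) donor pairs in which the available donor receives the higher score. The labels and scores are made-up illustration data, not BBMR data, and `roc_auc` is a hypothetical helper name rather than anything from the study.

```python
def roc_auc(labels, scores):
    """AUC as the probability that a random positive outranks a random negative.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical data: 1 = donor was available, 0 = donor was not available.
labels = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]  # assumed model scores

auc = roc_auc(labels, scores)
print(f"AUC = {auc:.3f}")
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect ranking, which is the scale on which the reported 0.826 should be read.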

Li Ying, Masiliune Ausra, Winstone David, Gasieniec Leszek, Wong Prudence, Lin Hong, Pawson Rachel, Parkes Guy, Hadley Andrew


Pathology

Genotype-phenotype analysis of LMNA-related diseases predicts phenotype-selective alterations in lamin phosphorylation.

In FASEB journal : official publication of the Federation of American Societies for Experimental Biology

Laminopathies are rare diseases associated with mutations in LMNA, which encodes nuclear lamin A/C. LMNA variants lead to diverse tissue-specific phenotypes including cardiomyopathy, lipodystrophy, myopathy, neuropathy, progeria, bone/skin disorders, and overlap syndromes. The mechanisms underlying these heterogeneous phenotypes remain poorly understood, although post-translational modifications, including phosphorylation, are postulated as regulators of lamin function. We catalogued all known lamin A/C human mutations and their associated phenotypes, and systematically examined the putative role of phosphorylation in laminopathies. In silico prediction of specific LMNA mutant-driven changes to lamin A phosphorylation and protein structure was performed using machine learning methods. Some of the predictions we generated were validated via assessment of ectopically expressed wild-type and mutant LMNA. Our findings indicate phenotype- and mutant-specific alterations in lamin phosphorylation, and that some changes in phosphorylation may occur independently of predicted changes in lamin protein structure. Therefore, therapeutic targeting of phosphorylation in the context of laminopathies will likely require mutant- and kinase-specific approaches.

Lin Eric W, Brady Graham F, Kwan Raymond, Nesvizhskii Alexey I, Omary M Bishr


intermediate filaments, laminopathy, mutation, post-translational modifications

Public Health

Impact of ICD10 and secular changes on electronic medical record rheumatoid arthritis algorithms.

In Rheumatology (Oxford, England)

OBJECTIVE : The objective of this study was to compare the performance of an RA algorithm developed and trained in 2010 utilizing natural language processing and machine learning, using updated data containing ICD10, new RA treatments, and a new electronic medical records (EMR) system.

METHODS : We extracted data from subjects with ≥1 RA International Classification of Diseases (ICD) code from the EMR of two large academic centres to create a data mart. Gold standard RA cases were identified by reviewing a random sample of 200 subjects from the data mart, and a random sample of 100 subjects who had only RA ICD10 codes. We compared the performance of the following algorithms using the original 2010 data with updated data: (i) a published 2010 RA algorithm; (ii) an updated algorithm, incorporating ICD10 RA codes and new DMARDs; and (iii) a published algorithm using ICD codes only (≥3 RA ICD codes).

RESULTS : The gold standard RA cases had mean age 65.5 years, 78.7% female, 74.1% RF or antibodies to cyclic citrullinated peptide (anti-CCP) positive. The positive predictive value (PPV) for ≥3 RA ICD was 54%, compared with 56% in 2010. At a specificity of 95%, the PPV of the 2010 algorithm and the updated version were both 91%, compared with 94% (95% CI: 91, 96%) in 2010. In subjects with ICD10 data only, the PPV for the updated 2010 RA algorithm was 93%.

CONCLUSION : The 2010 RA algorithm, validated with the updated data, showed performance characteristics similar to those obtained with the 2010 data. While the 2010 algorithm continued to perform better than the rule-based approach, the PPV of the latter also remained stable over time.
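The results above report PPV at a fixed specificity of 95%. The sketch below shows how those two quantities relate once a score threshold is chosen. All scores, labels, and the threshold are hypothetical values picked for illustration; none come from the study, and `confusion_counts` is an assumed helper name.

```python
def confusion_counts(labels, scores, threshold):
    """Count TP, FP, TN, FN when scores >= threshold are classified as RA."""
    tp = sum(1 for y, s in zip(labels, scores) if y == 1 and s >= threshold)
    fp = sum(1 for y, s in zip(labels, scores) if y == 0 and s >= threshold)
    tn = sum(1 for y, s in zip(labels, scores) if y == 0 and s < threshold)
    fn = sum(1 for y, s in zip(labels, scores) if y == 1 and s < threshold)
    return tp, fp, tn, fn


# Hypothetical algorithm scores on a 0-100 scale.
non_ra_scores = list(range(0, 100, 5))                  # 20 subjects without RA
ra_scores = [96, 97, 98, 99, 100, 95, 96, 97, 40, 60]   # 10 subjects with RA

labels = [0] * len(non_ra_scores) + [1] * len(ra_scores)
scores = non_ra_scores + ra_scores

# Assumed threshold, chosen so that 19 of the 20 non-RA subjects fall below it.
tp, fp, tn, fn = confusion_counts(labels, scores, threshold=95)

specificity = tn / (tn + fp)  # share of non-RA subjects correctly excluded
ppv = tp / (tp + fp)          # share of algorithm-positive subjects who truly have RA

print(f"specificity = {specificity:.2f}, PPV = {ppv:.2f}")
```

Unlike specificity, PPV depends on how common true cases are in the sampled population, so a change in the underlying cohort can shift PPV even when the algorithm is unchanged.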

Huang Sicong, Huang Jie, Cai Tianrun, Dahal Kumar P, Cagan Andrew, He Zeling, Stratton Jacklyn, Gorelik Isaac, Hong Chuan, Cai Tianxi, Liao Katherine P


electronic medical record, machine learning, rheumatoid arthritis

General

Characterizing Individual Differences in a Dynamic Stabilization Task Using Machine Learning.

In Aerospace medicine and human performance

INTRODUCTION: Being able to identify individual differences in skilled motor learning during disorienting conditions is important for spaceflight, military aviation, and rehabilitation.

METHODS: Blindfolded subjects (N = 34) were strapped into a device that behaved like an inverted pendulum in the horizontal roll plane and were instructed to use a joystick to stabilize themselves across two experimental sessions on consecutive days. Subjects could not use gravitational cues to determine their angular position and many soon became spatially disoriented.

RESULTS: Most demonstrated minimal learning, poor performance, and a characteristic pattern of positional drifting during horizontal roll plane balancing. To understand the wide range of individual differences observed, we used a Bayesian Gaussian Mixture method to cluster subjects into three statistically distinct groups that represent Proficient, Somewhat Proficient, and Not Proficient performance. We found that subjects in the Not Proficient group exhibited a suboptimal strategy of using very stereotyped large magnitude joystick deflections. We also used a Gaussian Naive Bayes method to create predictive classifiers. As early as the second block of experimentation (out of ten), we could predict a subject's final group with 80% accuracy.

DISCUSSION: Our findings indicate that machine learning can help predict individual performance and learning in a disorienting dynamic stabilization task and identify suboptimal strategies in Not Proficient subjects, which could lead to personalized and more effective training programs.

Vimal VP, Zheng H, Hong P, Fakharzadeh LN, Lackner JR, DiZio P. Characterizing individual differences in a dynamic stabilization task using machine learning. Aerosp Med Hum Perform. 2020; 91(6):479-488.
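The abstract above mentions a Gaussian Naive Bayes classifier for predicting a subject's final group from early performance. The following is a minimal from-scratch sketch of that kind of classifier, not the authors' implementation. The two features (imagined here as a mean joystick deflection magnitude and a positional drift measure), their values, and the function names `fit_gnb` and `predict_gnb` are all assumptions made for illustration.

```python
import math
from collections import defaultdict


def fit_gnb(X, y, var_eps=1e-9):
    """Estimate a log prior plus per-feature mean and variance for each class.

    var_eps is a small constant added to every variance to avoid division by zero.
    """
    rows_by_class = defaultdict(list)
    for row, label in zip(X, y):
        rows_by_class[label].append(row)
    model = {}
    for label, rows in rows_by_class.items():
        columns = list(zip(*rows))
        means = [sum(c) / len(c) for c in columns]
        variances = [
            sum((v - m) ** 2 for v in c) / len(c) + var_eps
            for c, m in zip(columns, means)
        ]
        model[label] = (math.log(len(rows) / len(X)), means, variances)
    return model


def predict_gnb(model, row):
    """Return the class with the highest log posterior, treating features as independent."""
    def log_posterior(params):
        log_prior, means, variances = params
        log_likelihood = sum(
            -0.5 * math.log(2 * math.pi * var) - (x - m) ** 2 / (2 * var)
            for x, m, var in zip(row, means, variances)
        )
        return log_prior + log_likelihood
    return max(model, key=lambda label: log_posterior(model[label]))


# Hypothetical early-block features: [deflection magnitude, positional drift].
X = [
    [0.10, 0.20], [0.20, 0.10], [0.15, 0.25],
    [0.50, 0.60], [0.60, 0.50], [0.55, 0.65],
    [0.90, 1.00], [1.00, 0.90], [0.95, 1.05],
]
y = ["Proficient"] * 3 + ["Somewhat Proficient"] * 3 + ["Not Proficient"] * 3

model = fit_gnb(X, y)
prediction = predict_gnb(model, [0.97, 0.98])
print(prediction)
```

In this toy data the three groups are well separated, so the classifier is trivially accurate; real early-block data would overlap far more.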

Vimal Vivekanand Pandey, Zheng Han, Hong Pengyu, Fakharzadeh Lila N, Lackner James R, DiZio Paul


General

Parasitologist-level classification of apicomplexan parasites and host cell with deep cycle transfer learning (DCTL).

In Bioinformatics (Oxford, England)

MOTIVATION : Apicomplexan parasites, including Toxoplasma, Plasmodium and Babesia, are important pathogens that affect billions of humans and animals worldwide. These parasites are usually detected by microscopy, but microscopes are difficult to use and clinicians require training. Finding a cost-effective solution to detect these parasites is of particular interest in developing countries, in which infection is more common.

RESULTS : Here we propose an alternative method, deep cycle transfer learning (DCTL), to detect Apicomplexan parasites by utilizing deep learning-based microscopic image analysis. DCTL is based on observations of parasitologists that Toxoplasma is banana-shaped, Plasmodium is generally ring-shaped, and Babesia is typically pear-shaped. Our approach aims to connect those microscopic objects (Toxoplasma, Plasmodium, Babesia and erythrocyte) with their morphologically similar macro counterparts (banana, ring, pear and apple) through a cycle transfer of knowledge. In the experiments, we conduct DCTL on 24,358 microscopic images of parasites. Results demonstrate high accuracy and effectiveness of DCTL, with an average accuracy of 95.7% and an area under the curve (AUC) of 0.995 for all parasite types. This paper is the first work to apply knowledge from parasitologists to Apicomplexan parasite recognition, and it opens new ground for developing AI-powered microscopy image diagnostic systems.

AVAILABILITY AND IMPLEMENTATION : Code and dataset available at

CONTACT AND SUPPLEMENTARY INFORMATION : Email: Supplementary data are available at Bioinformatics online.

Li Sen, Yang Qi, Jiang Hao, Cortés-Vecino Jesús A, Zhang Yang


Babesia, Deep learning, Knowledge transfer, Microscopic images analysis, Morphology, Plasmodium, Toxoplasma

Internal Medicine

OtoMatch: Content-based eardrum image retrieval using deep learning.

In PloS one ; h5-index 176.0

Acute infections of the middle ear are the most commonly treated childhood diseases. Because complications affect children's language learning and cognitive processes, it is essential to diagnose these diseases in a timely and accurate manner. The prevailing literature suggests that it is difficult to accurately diagnose these infections, even for experienced ear, nose, and throat (ENT) physicians. Advanced care practitioners (e.g., nurse practitioners, physician assistants) serve as first-line providers in many primary care settings and may benefit from additional guidance to appropriately determine the diagnosis and treatment of ear diseases. For this purpose, we designed a content-based image retrieval (CBIR) system (called OtoMatch) for normal, middle ear effusion, and tympanostomy tube conditions, operating on eardrum images captured with a digital otoscope. We present a method that enables the conversion of any convolutional neural network (trained for classification) into an image retrieval model. As a proof of concept, we converted a pre-trained deep learning model into an image retrieval system. We accomplished this by changing the fully connected layers into lookup tables. A database of 454 labeled eardrum images (179 normal, 179 effusion, and 96 tube cases) was used to train and test the system. On a 10-fold cross validation, the proposed method resulted in an average accuracy of 80.58% (SD 5.37%), and maximum F1 score of 0.90 while retrieving the most similar image from the database. These are promising results for the first study to demonstrate the feasibility of developing a CBIR system for eardrum images using the newly proposed methodology.
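To illustrate the general idea of content-based image retrieval described above, the sketch below ranks database images by cosine similarity between feature vectors. This is a generic nearest-neighbour stand-in, not the lookup-table conversion the authors describe. The three-dimensional feature vectors are hypothetical placeholders for what a trained network might produce, and the image IDs and the `retrieve` helper are invented for the example.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two feature vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve(query_features, database, k=1):
    """Return the k database entries most similar to the query, best match first."""
    ranked = sorted(
        database,
        key=lambda entry: cosine_similarity(query_features, entry[2]),
        reverse=True,
    )
    return ranked[:k]


# Hypothetical database entries: (image id, label, feature vector).
database = [
    ("img_normal_01", "normal", [0.9, 0.1, 0.0]),
    ("img_effusion_01", "effusion", [0.1, 0.9, 0.1]),
    ("img_tube_01", "tube", [0.0, 0.2, 0.9]),
    ("img_normal_02", "normal", [0.8, 0.3, 0.1]),
]

query = [0.85, 0.2, 0.05]  # assumed features of a new eardrum image
top_matches = retrieve(query, database, k=2)
for image_id, label, _ in top_matches:
    print(image_id, label)
```

A retrieval system of this kind returns labelled reference images alongside a prediction, so a practitioner can compare the query image with similar known cases.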

Camalan Seda, Niazi Muhammad Khalid Khan, Moberly Aaron C, Teknos Theodoros, Essig Garth, Elmaraghy Charles, Taj-Schaal Nazhat, Gurcan Metin N