General

Precision medicine, AI, and the future of personalized health care.

In Clinical and translational science

The convergence of artificial intelligence (AI) and precision medicine promises to revolutionize health care. Precision medicine methods identify phenotypes of patients with less-common responses to treatment or unique health care needs. AI leverages sophisticated computation and inference to generate insights, enable the system to reason and learn, and empower clinician decision making through augmented intelligence. Recent literature suggests that translational research exploring this convergence will help solve the most difficult challenges facing precision medicine, especially those in which non-genomic and genomic determinants, combined with information from patient symptoms, clinical history, and lifestyles, will facilitate personalized diagnosis and prognostication.

Johnson Kevin B, Wei Wei-Qi, Weeraratne Dilhan, Frisse Mark E, Misulis Karl, Rhee Kyu, Zhao Juan, Snowdon Jane L

2020-Sep-22

Dermatology

Combining Deep Learning With Optical Coherence Tomography Imaging to Determine Scalp Hair and Follicle Counts.

In Lasers in surgery and medicine

BACKGROUND AND OBJECTIVES : One of the challenges in developing effective hair loss therapies is the lack of reliable methods to monitor treatment response or alopecia progression. In this study, we propose the use of optical coherence tomography (OCT) and automated deep learning to non-invasively evaluate hair and follicle counts that may be used to monitor the success of hair growth therapy more accurately and efficiently.

STUDY DESIGN/MATERIALS AND METHODS : We collected 70 OCT scans from 14 patients with alopecia and trained a convolutional neural network (CNN) to automatically count all follicles present in the scans. The model is based on a dual approach of both detecting hair follicles and estimating the local hair density in order to give accurate counts even for cases where two or more adjacent hairs are in close proximity to each other.

RESULTS : We evaluate our system on 70 manually labeled OCT scans taken at different scalp locations from 14 patients, with 20 of those redundantly labeled by two human expert OCT operators. When comparing the individual human predictions and considering the exact locations of hair and follicle predictions, we find that the two human raters disagree with each other on approximately 22% of hairs and follicles. Overall, the deep learning (DL) system predicts the number of follicles with an error rate of 11.8% and the number of hairs with an error rate of 18.7% on average on the 70 scans. The OCT system can capture one scalp location in three seconds, and the DL model can make all predictions in less than a second after processing the scan, which takes half a minute using an unoptimized implementation.
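The abstract does not say how the error rates are computed. As a minimal sketch, assuming the error rate is the mean absolute relative difference between predicted and manually labeled counts per scan, the calculation could look like this in Python. The function name and the sample counts are hypothetical and are not taken from the study.

```python
def mean_relative_count_error(predicted, reference):
    """Mean of |predicted - reference| / reference over all scans.

    Assumption: this is one plausible definition of a counting
    "error rate"; the paper may use a different one.
    """
    if len(predicted) != len(reference):
        raise ValueError("predicted and reference must have the same length")
    if not reference:
        raise ValueError("at least one scan is required")
    if any(r <= 0 for r in reference):
        raise ValueError("reference counts must be positive")
    errors = [abs(p - r) / r for p, r in zip(predicted, reference)]
    return sum(errors) / len(errors)


# Hypothetical follicle counts for three scans (illustrative only).
model_counts = [9, 22, 30]
manual_counts = [10, 20, 30]
error_rate = mean_relative_count_error(model_counts, manual_counts)
print(f"mean relative count error: {error_rate:.1%}")
```

Under this definition, over-counts and under-counts both contribute to the error, so they do not cancel out across scans.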

CONCLUSION : This approach is well-positioned to become the standard for non-invasive evaluation of hair growth treatment progress in patients, saving significant amounts of time and effort compared with manual evaluation. Lasers Surg. Med. © 2020 Wiley Periodicals, Inc.

Urban Gregor, Feil Nate, Csuka Ella, Hashemi Kiana, Ekelem Chloe, Choi Franchesca, Mesinkovska Natasha A, Baldi Pierre

2020-Sep-22

OCT, alopecia, convolutional neural network, deep learning, hair loss, machine learning, optical coherence tomography

General

Text mining for modeling of protein complexes enhanced by machine learning.

In Bioinformatics (Oxford, England)

MOTIVATION : Procedures for structural modeling of protein-protein complexes (protein docking) produce a number of models which need to be further analyzed and scored. Scoring can be based on independently determined constraints on the structure of the complex, such as knowledge of amino acids essential for the protein interaction. Previously, we showed that text mining of residues in freely available PubMed abstracts of papers on studies of protein-protein interactions may generate such constraints. However, absence of post-processing of the spotted residues reduced usability of the constraints, as a significant number of the residues were not relevant for the binding of the specific proteins.

RESULTS : We explored filtering of the irrelevant residues by two machine learning approaches, Deep Recursive Neural Network (DRNN) and Support Vector Machine (SVM) models with different training/testing schemes. The results showed that the DRNN model is superior to the SVM model when training is performed on the PMC-OA full-text articles and applied to classification (interface or non-interface) of the residues spotted in the PubMed abstracts. When both training and testing are performed on full-text articles or on abstracts, the performance of these models is similar. Thus, in such cases, there is no need to use the DRNN approach, which is computationally expensive, especially at the training stage. The reason is that SVM success is often determined by the similarity in data/text patterns in the training and the testing sets, whereas the sentence structures in the abstracts are, in general, different from those in the full-text articles.

AVAILABILITY : The code and the datasets generated in this study are available at https://gitlab.ku.edu/vakser-lab-public/text-mining/-/tree/2020-09-04.

SUPPLEMENTARY INFORMATION : Supplementary data are available at Bioinformatics online.

Badal Varsha D, Kundrotas Petras J, Vakser Ilya A

2020-Sep-22

General

A novel sequence alignment algorithm based on deep learning of the protein folding code.

In Bioinformatics (Oxford, England)

MOTIVATION : From evolutionary inference and function annotation to structural prediction, protein sequence comparison has provided crucial biological insights. While many sequence alignment algorithms have been developed, existing approaches often cannot detect hidden structural relationships in the "twilight zone" of low sequence identity. To address this critical problem, we introduce a computational algorithm that performs protein Sequence Alignments from deep-Learning of Structural Alignments (SAdLSA, silent "d"). The key idea is to implicitly learn the protein folding code from many thousands of structural alignments using experimentally determined protein structures.

RESULTS : To demonstrate that the folding code was learned, we first show that SAdLSA trained on pure α-helical proteins successfully recognizes pairs of structurally related pure β-sheet protein domains. Subsequent training and benchmarking on larger, highly challenging data sets show significant improvement over established approaches. For challenging cases, SAdLSA is ∼150% better than HHsearch for generating pairwise alignments and ∼50% better for identifying the proteins with the best alignments in a sequence library. The time complexity of SAdLSA is O(N) thanks to GPU acceleration.

AVAILABILITY : Data sets and source codes of SAdLSA are available free of charge for academic users at http://pwp.gatech.edu/cssb/sadlsa/.

SUPPLEMENTARY INFORMATION : Supplementary data are available at Bioinformatics online.

Gao Mu, Skolnick Jeffrey

2020-Sep-22

Cardiology

A machine learning algorithm to increase COVID-19 inpatient diagnostic capacity.

In PLoS ONE; h5-index 176.0

Worldwide, testing capacity for SARS-CoV-2 is limited, and bottlenecks exist in the scale-up of polymerase chain reaction (PCR)-based testing. Our aim was to develop and evaluate a machine learning algorithm to diagnose COVID-19 in the inpatient setting. The algorithm was based on basic demographic and laboratory features to serve as a screening tool at hospitals where testing is scarce or unavailable. We used retrospectively collected data from the UCLA Health System in Los Angeles, California. We included all emergency room or inpatient cases receiving SARS-CoV-2 PCR testing who also had a set of ancillary laboratory features (n = 1,455) between 1 March 2020 and 24 May 2020. We tested seven machine learning models and used a combination of those models for the final diagnostic classification. In the test set (n = 392), our combined model had an area under the receiver operating characteristic curve of 0.91 (95% confidence interval [CI] 0.87-0.96). The model achieved a sensitivity of 0.93 (95% CI 0.85-0.98) and a specificity of 0.64 (95% CI 0.58-0.69). We found that our machine learning algorithm had excellent diagnostic metrics compared to SARS-CoV-2 PCR. This ensemble machine learning algorithm to diagnose COVID-19 has the potential to be used as a screening tool in hospital settings where PCR testing is scarce or unavailable.
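The abstract does not state how the seven models were combined. As a rough sketch, assuming the combination is a simple average of per-model probabilities (soft voting) followed by a fixed threshold, sensitivity and specificity against the PCR result could be computed as follows. All function names, probabilities, and labels here are hypothetical illustrations, not data or code from the study.

```python
from statistics import mean


def ensemble_probability(model_probs):
    """Average each patient's probability across models (soft voting).

    model_probs is a list with one inner list per model, each holding
    one probability per patient in the same order.
    """
    return [mean(patient_probs) for patient_probs in zip(*model_probs)]


def sensitivity_specificity(y_true, y_prob, threshold=0.5):
    """Return (sensitivity, specificity) for binary labels (1 = PCR positive).

    Assumes y_true contains at least one positive and one negative label.
    """
    tp = fn = tn = fp = 0
    for truth, prob in zip(y_true, y_prob):
        predicted_positive = prob >= threshold
        if truth == 1:
            if predicted_positive:
                tp += 1
            else:
                fn += 1
        else:
            if predicted_positive:
                fp += 1
            else:
                tn += 1
    return tp / (tp + fn), tn / (tn + fp)


# Three hypothetical models scoring five hypothetical patients.
model_probs = [
    [0.9, 0.2, 0.6, 0.4, 0.7],
    [0.7, 0.4, 0.8, 0.2, 0.6],
    [0.8, 0.3, 0.4, 0.6, 0.8],
]
pcr_labels = [1, 0, 1, 0, 0]

combined = ensemble_probability(model_probs)
sens, spec = sensitivity_specificity(pcr_labels, combined)
print(f"sensitivity={sens:.2f} specificity={spec:.2f}")
```

Lowering the threshold generally raises sensitivity at the cost of specificity, which is the usual trade-off for a screening tool that is meant to miss few true cases.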

Goodman-Meza David, Rudas Akos, Chiang Jeffrey N, Adamson Paul C, Ebinger Joseph, Sun Nancy, Botting Patrick, Fulcher Jennifer A, Saab Faysal G, Brook Rachel, Eskin Eleazar, An Ulzee, Kordi Misagh, Jew Brandon, Balliu Brunilda, Chen Zeyuan, Hill Brian L, Rahmani Elior, Halperin Eran, Manuel Vladimir

2020

General

Alcoholic liver disease: A registry view on comorbidities and disease prediction.

In PLoS computational biology

Alcohol-related liver disease (ALD) is the cause of more than half of all liver-related deaths. Sustained excess drinking causes fatty liver and alcohol-related steatohepatitis, which may progress to alcoholic liver fibrosis (ALF) and eventually to alcohol-related liver cirrhosis (ALC). Unfortunately, it is difficult to identify patients with early-stage ALD, as these are largely asymptomatic. Consequently, the majority of ALD patients are only diagnosed by the time ALD has reached decompensated cirrhosis, a symptomatic phase marked by the development of complications such as bleeding and ascites. The main goal of this study is to discover relevant upstream diagnoses that help explain the development of ALD, and to highlight meaningful downstream diagnoses that represent its progression to liver failure. Here, we use data from the Danish health registries covering the entire population of Denmark over nineteen years (1996-2014) to examine whether it is possible to identify patients likely to develop ALF or ALC based on their past medical history. To this end, we explore a knowledge discovery approach by using high-dimensional statistical and machine learning techniques to extract and analyze data from the Danish National Patient Registry. Consistent with the late diagnoses of ALD, we find that ALC is the most common form of ALD in the registry data and that ALC patients have a strong over-representation of diagnoses associated with liver dysfunction. By contrast, we identify a small number of patients diagnosed with ALF who appear to be much less sick than those with ALC. We perform a matched case-control study using the group of patients with ALC as cases and their matched patients with non-ALD as controls. Machine learning models (SVM, RF, LightGBM and NaiveBayes) trained and tested on the set of ALC patients achieve a high performance for data classification (AUC = 0.89). When testing the same trained models on the small set of ALF patients, their performance unsurprisingly drops substantially (AUC = 0.67 for NaiveBayes). The statistical and machine learning results underscore small groups of upstream and downstream comorbidities that accurately detect ALC patients and show promise in prediction of ALF. Some of these groups are conditions either caused by alcohol or caused by malnutrition associated with alcohol overuse. Others are comorbidities either related to trauma and lifestyle or to complications of cirrhosis, such as oesophageal varices. Our findings highlight the potential of this approach to uncover knowledge in registry data related to ALD.
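For readers unfamiliar with the AUC values quoted above, the metric can be read as the probability that a randomly chosen case receives a higher model score than a randomly chosen control. The following Python sketch computes it directly from that definition. The function name and the scores are hypothetical and are not drawn from the registry data.

```python
def pairwise_auc(case_scores, control_scores):
    """AUC as the fraction of (case, control) pairs ranked correctly.

    Ties count as half a correct pair. This direct pairwise form is
    quadratic in the number of samples and is meant only to illustrate
    the definition.
    """
    if not case_scores or not control_scores:
        raise ValueError("need at least one case and one control")
    correct = 0.0
    for case in case_scores:
        for control in control_scores:
            if case > control:
                correct += 1.0
            elif case == control:
                correct += 0.5
    return correct / (len(case_scores) * len(control_scores))


# Hypothetical classifier scores (illustrative only).
cases = [0.9, 0.8, 0.4]
controls = [0.7, 0.3, 0.2]
# 8 of the 9 case-control pairs are ranked correctly here.
print(round(pairwise_auc(cases, controls), 3))
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation.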

Grissa Dhouha, Nytoft Rasmussen Ditlev, Krag Aleksander, Brunak Søren, Juhl Jensen Lars

2020-Sep-22