
General

Machine learning-based prediction of COVID-19 diagnosis based on symptoms.

In NPJ digital medicine

Effective screening of SARS-CoV-2 enables quick and efficient diagnosis of COVID-19 and can mitigate the burden on healthcare systems. Prediction models that combine several features to estimate the risk of infection have been developed. These aim to assist medical staff worldwide in triaging patients, especially in the context of limited healthcare resources. We established a machine-learning approach that was trained on records from 51,831 tested individuals (of whom 4,769 were confirmed to have COVID-19). The test set contained data from the subsequent week (47,401 tested individuals, of whom 3,624 were confirmed to have COVID-19). Our model predicted COVID-19 test results with high accuracy using only eight binary features: sex, age ≥60 years, known contact with an infected individual, and the appearance of five initial clinical symptoms. Overall, based on the nationwide data publicly reported by the Israeli Ministry of Health, we developed a model that detects COVID-19 cases using simple features obtained by asking basic questions. Our framework can be used, among other considerations, to prioritize testing for COVID-19 when testing resources are limited.
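
The feature setup described in this abstract can be sketched roughly as follows. This is an illustration only, not the authors' code: the field names, the choice to encode sex as a single "male" flag, and the helper functions are all assumptions, and the five symptoms are left as placeholders because the abstract does not name them. The sketch also computes the share of confirmed cases in each set from the counts given above (about 9.2% in training and 7.6% in the test week), which is worth keeping in mind when reading a "high accuracy" claim on imbalanced data.

```python
# Illustrative sketch only. Field names and helpers are hypothetical;
# the abstract does not name the five symptoms, so placeholders are used.
FEATURES = (
    "sex_male",          # assumption: sex encoded as one binary flag
    "age_60_or_older",
    "known_contact",
    "symptom_1",
    "symptom_2",
    "symptom_3",
    "symptom_4",
    "symptom_5",
)


def encode(answers):
    """Map a dict of yes/no answers to a fixed-order list of 0/1 values."""
    return [1 if answers.get(name, False) else 0 for name in FEATURES]


def prevalence(positives, tested):
    """Fraction of tested individuals who were confirmed positive."""
    return positives / tested


# Counts taken from the abstract.
train_prev = prevalence(4769, 51831)
test_prev = prevalence(3624, 47401)

# A made-up questionnaire response, encoded into the eight binary features.
example = encode(
    {"age_60_or_older": True, "known_contact": True, "symptom_2": True}
)

print(example)
print(f"train prevalence: {train_prev:.3f}, test prevalence: {test_prev:.3f}")
```

Any classifier that accepts binary inputs could consume the encoded vector; the abstract does not state which model family was used, so none is assumed here.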

Zoabi Yazeed, Deri-Rozov Shira, Shomron Noam


General

Terahertz pulse shaping using diffractive surfaces.

In Nature communications; h5-index 260.0

Recent advances in deep learning have been providing non-intuitive solutions to various inverse problems in optics. At the intersection of machine learning and optics, diffractive networks merge wave-optics with deep learning to design task-specific elements that all-optically perform various tasks such as object classification and machine vision. Here, we present a diffractive network that shapes an arbitrary broadband pulse into a desired optical waveform, forming a compact and passive pulse engineering system. We demonstrate the synthesis of various pulses by designing diffractive layers that collectively engineer the temporal waveform of an input terahertz pulse. Our results demonstrate direct pulse shaping in the terahertz spectrum, where the amplitude and phase of the input wavelengths are independently controlled through a passive diffractive device, without the need for an external pump. Furthermore, a physical transfer learning approach is presented to illustrate pulse-width tunability by replacing part of an existing network with newly trained diffractive layers, demonstrating its modularity. This learning-based diffractive pulse engineering framework can find broad applications in, e.g., communications, ultra-fast imaging, and spectroscopy.
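
Why independent control of amplitude and phase per wavelength amounts to control of the temporal waveform can be seen in a small numerical sketch. It treats a pulse as a sum of cosine components and shows that changing only the phases changes the field at a given instant. This is a generic superposition picture, not a model of the diffractive layers in the paper; the frequencies, amplitudes, and the `synthesize` helper are hypothetical.

```python
import math


def synthesize(components, t):
    """Field at time t (seconds) from (frequency_hz, amplitude, phase_rad) tuples."""
    return sum(a * math.cos(2 * math.pi * f * t + p) for f, a, p in components)


# Hypothetical spectral components in the terahertz range.
in_phase = [(0.3e12, 1.0, 0.0), (0.4e12, 1.0, 0.0), (0.5e12, 1.0, 0.0)]
dephased = [(0.3e12, 1.0, 0.0), (0.4e12, 1.0, math.pi / 2), (0.5e12, 1.0, math.pi)]

# Same amplitudes, different phases: the field at t = 0 differs.
peak_in_phase = synthesize(in_phase, 0.0)  # components add constructively
peak_dephased = synthesize(dephased, 0.0)  # components largely cancel

print(peak_in_phase, peak_dephased)
```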

Veli Muhammed, Mengu Deniz, Yardimci Nezih T, Luo Yi, Li Jingxi, Rivenson Yair, Jarrahi Mona, Ozcan Aydogan


General

Prediction of Alzheimer's disease-specific phospholipase c gamma-1 SNV by deep learning-based approach for high-throughput screening.

In Proceedings of the National Academy of Sciences of the United States of America

Exon splicing triggered by unpredicted genetic mutation can cause translational variations in neurodegenerative disorders. In this study, we discover Alzheimer's disease (AD)-specific single-nucleotide variants (SNVs) and abnormal exon splicing of the phospholipase c gamma-1 (PLCγ1) gene, using a genome-wide association study (GWAS) and a deep learning-based exon splicing prediction tool. GWAS revealed that the identified single-nucleotide variations were mainly distributed in the H3K27ac-enriched region of the PLCγ1 gene body during brain development in an AD mouse model. A deep learning analysis, trained with human genome sequences, predicted 14 splicing sites in the human PLCγ1 gene, and one of these completely matched an SNV in exon 27 of the PLCγ1 gene in an AD mouse model. In particular, the SNV in exon 27 of the PLCγ1 gene is associated with abnormal splicing during messenger RNA maturation. Taken together, our findings suggest that this approach, which combines in silico and deep learning-based analyses, has potential for identifying the clinical utility of critical SNVs in AD prediction.

Kim Sung-Hyun, Yang Sumin, Lim Key-Hwan, Ko Euiseng, Jang Hyun-Jun, Kang Mingon, Suh Pann-Ghill, Joo Jae-Yeol


Alzheimer’s disease, PLCγ1, deep learning, single-nucleotide variation

Public Health

Predicting risk of early discontinuation of exclusive breastfeeding at a Brazilian referral hospital for high-risk neonates and infants: a decision-tree analysis.

In International breastfeeding journal

BACKGROUND : Determinants at several levels may affect breastfeeding practices. Besides the known historical, socio-economic, cultural, and individual factors, other components also pose major challenges to breastfeeding. Predicting existing patterns and identifying modifiable components are important for achieving optimal results as early as possible, especially in the most vulnerable population. The goal of this study was to build a tree-based analysis to determine the variables that can predict the pattern of breastfeeding at hospital discharge and at 3 and 6 months of age in a referral center for high-risk infants.

METHODS : This prospective, longitudinal study included 1003 infants and was conducted at a high-risk public hospital in the following three phases: hospital admission, first visit after discharge, and monthly telephone interview until the sixth month of the infant's life. Independent variables were sorted into four groups: factors related to the newborn infant, mother, health service, and breastfeeding. The outcome was breastfeeding as per the categories established by the World Health Organization (WHO). For this study, we performed an exploratory analysis at hospital discharge and at 3 and at 6 months of age in two stages, as follows: (i) determining the frequencies of baseline characteristics stratified by breastfeeding indicators in the three mentioned periods and (ii) decision-tree analysis.

RESULTS : The prevalence of exclusive breastfeeding (EBF) was 65.2% at hospital discharge, 51% at 3 months, and 20.6% at 6 months. The length of hospital stay was the most important predictor of feeding practices at hospital discharge and at the sixth month, and it was also relevant at the third month. Besides the mother's and child's characteristics (multiple births, maternal age, and parity), the social context, work, feeding practice during hospitalization, and hospital practices and policies on breastfeeding influenced the breastfeeding rates.

CONCLUSIONS : The combination algorithm of decision trees (a machine learning technique) provides a better understanding of the risk predictors of breastfeeding cessation in a setting with a large variability in exposures. Decision trees may provide a basis for recommendations aimed at this high-risk population, within the Brazilian context, in light of the hospital stay at a neonatal unit and period of continuous feeding practice.
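
How a decision tree singles out a variable such as length of hospital stay can be sketched with a single split search. The sketch below assumes a CART-style criterion (Gini impurity), which the abstract does not specify, and uses made-up toy values that do not reflect the study's data; `gini` and `best_threshold` are hypothetical helper names.

```python
def gini(labels):
    """Gini impurity of a list of class labels (0.0 means perfectly pure)."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = {}
    for y in labels:
        counts[y] = counts.get(y, 0) + 1
    return 1.0 - sum((c / n) ** 2 for c in counts.values())


def best_threshold(values, labels):
    """Return (threshold, weighted_impurity) for the best split `value <= threshold`.

    Returns (None, impurity_of_all_labels) if no split improves on not splitting.
    """
    best = (None, gini(labels))
    for t in sorted(set(values))[:-1]:
        left = [y for v, y in zip(values, labels) if v <= t]
        right = [y for v, y in zip(values, labels) if v > t]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
        if score < best[1]:
            best = (t, score)
    return best


# Made-up toy data: length of stay in days, and whether the infant was
# exclusively breastfed (1) or not (0) at discharge.
stay_days = [2, 3, 4, 5, 10, 14, 21, 30]
ebf = [1, 1, 1, 1, 0, 0, 0, 0]

threshold, impurity = best_threshold(stay_days, ebf)
print(threshold, impurity)
```

A full tree repeats this search over every candidate variable at each node and keeps the split with the lowest weighted impurity, which is one common way a variable comes to rank as the "most important predictor".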

Silva Maíra Domingues Bernardes, de Oliveira Raquel de Vasconcellos Carvalhaes, da Silveira Barroso Alves Davi, Melo Enirtes Caetano Prates


General

Unraveling city-specific signature and identifying sample origin locations for the data from CAMDA MetaSUB challenge.

In Biology direct

BACKGROUND : The composition of microbial communities can be location-specific, and differences in the abundance of taxa between locations could help us to unravel city-specific signatures and predict sample origin locations accurately. In this study, whole genome shotgun (WGS) metagenomics data from samples across 16 cities around the world and samples from another 8 cities were provided as the main and mystery datasets, respectively, as part of the CAMDA 2019 MetaSUB "Forensic Challenge". Feature selection, normalization, three machine learning methods, PCoA (Principal Coordinates Analysis), and ANCOM (Analysis of composition of microbiomes) were applied to both the main and mystery datasets.

RESULTS : Feature selection, combined with the machine learning methods, revealed that the combination of the common features was effective for predicting the origin of the samples. The three machine learning methods yielded average error rates of 11.93% and 30.37% for the main and mystery datasets, respectively. Using the samples from the main dataset to predict the labels of samples from the mystery dataset, nearly 89.98% of the test samples could be correctly labeled as "mystery" samples. PCoA showed that nearly 60% of the total variability of the data could be explained by the first two PCoA axes. Although many cities overlapped, some cities were separated in the PCoA. The results of ANCOM, combined with the importance score from the Random Forest, indicated that the common "family" and "order" of the main dataset and the common "order" of the mystery dataset provided the most efficient information for prediction, respectively.

CONCLUSIONS : The results of the classification suggested that the composition of the microbiomes was distinctive across the cities, which could be used to identify the sample origins. This was also supported by the results from ANCOM and the importance score from the RF. In addition, the accuracy of the prediction could be improved with more samples and greater sequencing depth.
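
PCoA operates on a matrix of pairwise dissimilarities between samples. As a rough sketch of that input step, the code below normalizes raw taxon counts to relative abundances and computes Bray-Curtis dissimilarity, a common choice in microbiome work. The abstract does not state which normalization or dissimilarity measure was used, so both are assumptions, and the sample names and counts are invented.

```python
def relative_abundance(counts):
    """Scale raw taxon counts so that they sum to 1."""
    total = sum(counts)
    if total == 0:
        raise ValueError("sample has no reads")
    return [c / total for c in counts]


def bray_curtis(a, b):
    """Bray-Curtis dissimilarity between two non-negative abundance vectors."""
    num = sum(abs(x - y) for x, y in zip(a, b))
    den = sum(x + y for x, y in zip(a, b))
    return num / den if den else 0.0


# Invented counts for four taxa in three samples.
samples = {
    "city_A_1": [50, 30, 20, 0],
    "city_A_2": [45, 35, 20, 0],
    "city_B_1": [5, 10, 25, 60],
}

profiles = {name: relative_abundance(c) for name, c in samples.items()}
names = sorted(profiles)

# Symmetric dissimilarity matrix, rows and columns ordered as in `names`.
matrix = [[bray_curtis(profiles[r], profiles[c]) for c in names] for r in names]

for name, row in zip(names, matrix):
    print(name, [round(d, 3) for d in row])
```

In this toy example the two samples from the same invented city are much closer to each other than to the third sample, which is the kind of structure a PCoA plot would then display along its leading axes.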

Zhang Runzhi, Walker Alejandro R, Datta Susmita


ANCOM, Linear discriminant analysis, Machine learning, Microbiome, OTU, PCoA, Random Forest, Support vector machine, WGS

General

Artificial Intelligence, Big data and Machine Learning approaches in Precision Medicine & Drug Discovery.

In Current drug targets

Artificial intelligence is revolutionizing the drug development process: it can quickly identify potential biologically active compounds from millions of candidates. The present review gives an overview of applications of machine learning-based tools such as GOLD, DeepPVP, and LIBSVM, and of the algorithms involved, such as support vector machine (SVM), random forest (RF), decision trees, and artificial neural networks (ANN), in the various stages of drug design and development. These techniques can be employed in SNP discovery, drug repurposing, ligand-based drug design (LBDD), ligand-based virtual screening (LBVS), structure-based virtual screening (SBVS), lead identification, quantitative structure-activity relationship (QSAR) modeling, and ADMET analysis. SVM is reported to exhibit better performance, indicating that the classification model will have great applications in human intestinal absorption (HIA) prediction. Successful cases have been reported that demonstrate the efficiency of SVM and RF models in identifying JFD00950 as a novel compound targeting the colon cancer cell line DLD-1 by inhibition of FEN1 cytotoxic and cleavage activity. Furthermore, a QSAR model using ANN was also used to predict flavonoid inhibitory effects on AR activity as a potent treatment for diabetes mellitus (DM). Hence, in the era of big data, ML approaches have evolved into a powerful and efficient way to handle the huge amounts of data generated by modern drug discovery, in order to model small-molecule drugs and gene biomarkers and to identify novel drug targets for various diseases.

Nayarisseri Anuraj, Khandelwal Ravina, Tanwar Poonam, Madhavi Maddala, Sharma Diksha, Thakur Garima, Speck-Planche Alejandro, Singh Sanjeev Kumar


Artificial intelligence, Big Data, Drug Discovery, Machine Learning, Precision Medicine, Virtual Screening