Surgery

Machine Learning Model for Identifying Antioxidant Proteins Using Features Calculated from Primary Sequences.

In Biology

Antioxidant proteins play important roles in many cellular activities. They protect the cell and DNA from oxidative substances (such as peroxide, nitric oxide, and oxygen free radicals), which are known as reactive oxygen species (ROS). Free radical generation and antioxidant defenses are opposing factors in the human body, and the balance between them is necessary to maintain health. An unhealthy lifestyle or age-related degeneration can break this balance, leading to more ROS than antioxidants and causing damage to health. In general, the antioxidant mechanism is the combination of antioxidant molecules and ROS in a one-electron reaction. Computational models that promptly identify antioxidant candidates are essential for supporting antioxidant detection experiments in the laboratory. In this study, we propose a machine learning-based model for this prediction task, built from a benchmark set of sequencing data. The experiments used 10-fold cross-validation during training and were validated on three independent datasets. Different machine learning and deep learning algorithms were evaluated on an optimal set of sequence features. Among them, Random Forest was identified as the best-performing model for identifying antioxidant proteins. Our optimal model achieved an accuracy of 84.6%, with balanced sensitivity (81.5%) and specificity (85.1%), for antioxidant protein identification on the training dataset. The results on the independent datasets also showed that our model compares favorably with previously published work on antioxidant protein identification.
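As an illustration of what a feature calculated from a primary sequence can look like, the sketch below computes amino acid composition, the fraction of each of the 20 standard residues. This is an assumption chosen for illustration: the abstract does not say which sequence features were used, and the function name and toy sequence are hypothetical.

```python
from collections import Counter

# The 20 standard amino acids in one-letter code.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def amino_acid_composition(sequence: str) -> list[float]:
    """Return the fraction of each standard amino acid in `sequence`.

    Illustrative feature only; not necessarily one used in the paper.
    Non-standard characters are ignored.
    """
    counts = Counter(ch for ch in sequence.upper() if ch in AMINO_ACIDS)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence contains no standard amino acids")
    return [counts[aa] / total for aa in AMINO_ACIDS]


# Made-up toy sequence, not taken from any dataset.
features = amino_acid_composition("MKTAYIAKQR")
```

A fixed-length vector like this can be passed to any standard classifier, which is why composition-style features are a common starting point for sequence-based prediction.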

Ho Thanh Lam Luu, Le Ngoc Hoang, Van Tuan Le, Tran Ban Ho, Nguyen Khanh Hung Truong, Nguyen Ngan Thi Kim, Huu Dang Luong, Le Nguyen Quoc Khanh


Random Forest, antioxidant proteins, computational modeling, feature selection, machine learning, protein sequencing

General

Evaluation of a genetic risk score for severity of COVID-19 using human chromosomal-scale length variation.

In Human genomics

INTRODUCTION : The course of COVID-19 varies from asymptomatic to severe in patients. The basis for this range in symptoms is unknown. One possibility is that genetic variation is partly responsible for the highly variable response. We evaluated how well a genetic risk score based on chromosomal-scale length variation and machine learning classification algorithms could predict severity of response to SARS-CoV-2 infection.

METHODS : We compared 981 patients from the UK Biobank dataset who had a severe reaction to SARS-CoV-2 infection before 27 April 2020 to a similar number of age-matched patients drawn from the general UK Biobank population. For each patient, we built a profile of 88 numbers characterizing the chromosomal-scale length variability of their germ line DNA. Each number represented one quarter of one of the 22 autosomes. We used the machine learning algorithm XGBoost to build a classifier that could predict whether a person would have a severe reaction to COVID-19 based only on their 88-number profile.

RESULTS : We found that the XGBoost classifier could differentiate between the two classes at a significant level (p = 2 × 10^-11) as measured against a randomized control and (p = 3 × 10^-14) as measured against the expected value of a random guessing algorithm (AUC = 0.5). However, we found that the AUC of the classifier was only 0.51, too low for a clinically useful test.

CONCLUSION : Genetics play a role in the severity of COVID-19, but we cannot yet develop a useful genetic test to predict severity.
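The AUC can be read as the probability that a randomly chosen severe case receives a higher score than a randomly chosen control, which is why 0.5 corresponds to random guessing. A minimal sketch of that pairwise definition follows; the function name and the toy labels and scores are made up for illustration and are not the study's data or code.

```python
def roc_auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties count as half a correct pair. `labels` holds 1 for positives
    and 0 for negatives.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy example: three of the four positive/negative pairs are ordered correctly.
toy_auc = roc_auc([1, 1, 0, 0], [0.9, 0.4, 0.6, 0.1])  # 0.75

# A classifier that gives everyone the same score cannot rank anyone.
chance_auc = roc_auc([1, 1, 0, 0], [0.5, 0.5, 0.5, 0.5])  # 0.5
```

With enough samples, even a small departure from 0.5 can be statistically significant while remaining far too weak to rank individual patients, which is the situation the RESULTS describe.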

Toh Christopher, Brody James P


COVID-19, Genetic risk score, Machine learning, UK biobank

Cardiology

Integration of novel monitoring devices with machine learning technology for scalable cardiovascular management.

In Nature reviews. Cardiology ; h5-index 74.0

Ambulatory monitoring is increasingly important for cardiovascular care but is often limited by the unpredictability of cardiovascular events, the intermittent nature of ambulatory monitors and the variable clinical significance of recorded data in patients. Technological advances in computing have led to the introduction of novel physiological biosignals that can increase the frequency at which abnormalities in cardiovascular parameters can be detected, making expert-level, automated diagnosis a reality. However, use of these biosignals for diagnosis also raises numerous concerns related to accuracy and actionability within clinical guidelines, in addition to medico-legal and ethical issues. Analytical methods such as machine learning can potentially increase the accuracy and improve the actionability of device-based diagnoses. Coupled with interoperability of data to widen access to all stakeholders, seamless connectivity (an internet of things) and maintenance of anonymity, this approach could ultimately facilitate near-real-time diagnosis and therapy. These tools are increasingly recognized by regulatory agencies and professional medical societies, but several technical and ethical issues remain. In this Review, we describe the current state of cardiovascular monitoring along the continuum from biosignal acquisition to the identification of novel biosensors and the development of analytical techniques and ultimately to regulatory and ethical issues. Furthermore, we outline new paradigms for cardiovascular monitoring.

Krittanawong Chayakrit, Rogers Albert J, Johnson Kipp W, Wang Zhen, Turakhia Mintu P, Halperin Jonathan L, Narayan Sanjiv M


Surgery

A bioinformatic study of antimicrobial peptides identified in the Black Soldier Fly (BSF) Hermetia illucens (Diptera: Stratiomyidae).

In Scientific reports ; h5-index 158.0

Antimicrobial peptides (AMPs) play a key role in innate immunity, the first line of defense against bacteria, fungi, and viruses. AMPs are small molecules, ranging from 10 to 100 amino acid residues, produced by all living organisms. Because of their wide biodiversity, insects are among the richest and most innovative sources of AMPs. In particular, the insect Hermetia illucens (Diptera: Stratiomyidae) shows an extraordinary ability to live in hostile environments, as it feeds on decaying substrates that are rich in microbial colonies, and it is one of the most promising sources of AMPs. The larval and the combined adult male and female H. illucens transcriptomes were examined, and all the sequences putatively encoding AMPs were analysed with different machine learning algorithms available on the CAMP database (Support Vector Machine, Discriminant Analysis, Artificial Neural Network, and Random Forest) in order to predict their antimicrobial activity. Moreover, the iACP tool and the AVPpred and Antifp servers were used to predict anticancer, antiviral, and antifungal activities, respectively. The related physicochemical properties were evaluated with the Antimicrobial Peptide Database Calculator and Predictor. These analyses identified 57 putatively active peptides suitable for subsequent experimental validation studies.

Moretta Antonio, Salvia Rosanna, Scieuzo Carmen, Di Somma Angela, Vogel Heiko, Pucci Pietro, Sgambato Alessandro, Wolff Michael, Falabella Patrizia


General

A comprehensive study on classification of COVID-19 on computed tomography with pretrained convolutional neural networks.

In Scientific reports ; h5-index 158.0

The use of imaging data has been reported to be useful for rapid diagnosis of COVID-19. Although computed tomography (CT) scans show a variety of signs caused by the viral infection, these visual features are difficult to recognize and, given a large number of images, take radiologists a long time to read. Artificial intelligence methods for automated classification of COVID-19 on CT scans have been found to be very promising. However, current investigation of pretrained convolutional neural networks (CNNs) for COVID-19 diagnosis using CT data is limited. This study presents an investigation of 16 pretrained CNNs for classification of COVID-19 using a large public database of CT scans collected from COVID-19 patients and non-COVID-19 subjects. The results show that, using only 6 epochs for training, the CNNs achieved very high performance on the classification task. Among the 16 CNNs, DenseNet-201, which is the deepest net, is the best in terms of accuracy, balance between sensitivity and specificity, F1 score, and area under the curve. Furthermore, transfer learning with direct input of whole image slices and without data augmentation provided better classification rates than transfer learning with data augmentation. This finding removes the need for data augmentation and for manual extraction of regions of interest on CT images, both of which are used in current implementations of deep-learning models for COVID-19 classification.
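Metrics of this kind are all derived from the four confusion-matrix counts. The sketch below shows the standard definitions, with F1 taken as the harmonic mean of precision and sensitivity; the function name and the counts are made up for illustration and are not results from the study.

```python
def classification_metrics(tp: int, fp: int, tn: int, fn: int) -> dict:
    """Compute common binary-classification metrics from confusion counts.

    tp/fn: positive cases classified correctly/incorrectly.
    tn/fp: negative cases classified correctly/incorrectly.
    """
    if tp + fn == 0 or tn + fp == 0 or tp + fp == 0:
        raise ValueError("counts leave a metric undefined (zero denominator)")
    sensitivity = tp / (tp + fn)   # recall on positive cases
    specificity = tn / (tn + fp)   # recall on negative cases
    precision = tp / (tp + fp)
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    # Equivalent to the harmonic mean of precision and sensitivity.
    f1 = 2 * tp / (2 * tp + fp + fn)
    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": precision,
        "f1": f1,
    }


# Made-up counts: 100 positive and 100 negative scans.
metrics = classification_metrics(tp=80, fp=10, tn=90, fn=20)
```

Reporting sensitivity and specificity together, as the abstract does, guards against a model that scores well on accuracy only by favoring the majority class.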

Pham Tuan D


Radiology

Robust deep learning classification of adamantinomatous craniopharyngioma from limited preoperative radiographic images.

In Scientific reports ; h5-index 158.0

Deep learning (DL) is a widely applied mathematical modeling technique. Classically, DL models utilize large volumes of training data, which are not available in many healthcare contexts. For patients with brain tumors, non-invasive diagnosis would represent a substantial clinical advance, potentially sparing patients from the risks associated with surgical intervention on the brain. Such an approach will depend upon highly accurate models built using the limited datasets that are available. Herein, we present a novel genetic algorithm (GA) that identifies optimal architecture parameters using feature embeddings from state-of-the-art image classification networks to identify the pediatric brain tumor, adamantinomatous craniopharyngioma (ACP). We optimized classification models for preoperative Computed Tomography (CT), Magnetic Resonance Imaging (MRI), and combined CT and MRI datasets with demonstrated test accuracies of 85.3%, 83.3%, and 87.8%, respectively. Notably, our GA improved baseline model performance by up to 38%. This work advances DL and its applications within healthcare by identifying optimized networks in small-scale data contexts. The proposed system is easily implementable and scalable for non-invasive computer-aided diagnosis, even for uncommon diseases.
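To give a sense of how a genetic algorithm searches over architecture parameters, here is a generic sketch using elitist selection, uniform crossover, and random mutation. It is not the authors' algorithm: the search space, the parameter names, and the `toy_fitness` function (a stand-in for validation accuracy of a model trained on fixed feature embeddings) are all hypothetical.

```python
import random

# Hypothetical search space; the abstract does not list the tuned parameters.
SEARCH_SPACE = {
    "hidden_units": [16, 32, 64, 128, 256],
    "dropout": [0.0, 0.1, 0.2, 0.3, 0.5],
    "learning_rate": [1e-4, 3e-4, 1e-3, 3e-3, 1e-2],
}
KEYS = list(SEARCH_SPACE)


def toy_fitness(genome):
    """Made-up stand-in for validation accuracy: rewards matching a fixed target."""
    target = {"hidden_units": 64, "dropout": 0.2, "learning_rate": 1e-3}
    return sum(genome[k] == target[k] for k in KEYS) / len(KEYS)


def random_genome(rng):
    return {k: rng.choice(SEARCH_SPACE[k]) for k in KEYS}


def crossover(a, b, rng):
    # Uniform crossover: each parameter comes from one parent at random.
    return {k: (a if rng.random() < 0.5 else b)[k] for k in KEYS}


def mutate(genome, rng, rate=0.2):
    # Resample each parameter with probability `rate`.
    return {
        k: rng.choice(SEARCH_SPACE[k]) if rng.random() < rate else v
        for k, v in genome.items()
    }


def evolve(fitness, generations=30, pop_size=20, elite=4, seed=0):
    rng = random.Random(seed)
    population = [random_genome(rng) for _ in range(pop_size)]
    for _ in range(generations):
        # Keep the top `elite` genomes unchanged and breed the rest from them.
        population.sort(key=fitness, reverse=True)
        parents = population[:elite]
        children = [
            mutate(crossover(rng.choice(parents), rng.choice(parents), rng), rng)
            for _ in range(pop_size - elite)
        ]
        population = parents + children
    return max(population, key=fitness)


best = evolve(toy_fitness)
```

Because the elite genomes are carried over unchanged, the best fitness in the population never decreases from one generation to the next. In a real small-data setting, the fitness function would be a cross-validated score, which costs far more to evaluate than this toy.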

Prince Eric W, Whelan Ros, Mirsky David M, Stence Nicholas, Staulcup Susan, Klimo Paul, Anderson Richard C E, Niazi Toba N, Grant Gerald, Souweidane Mark, Johnston James M, Jackson Eric M, Limbrick David D, Smith Amy, Drapeau Annie, Chern Joshua J, Kilburn Lindsay, Ginn Kevin, Naftel Robert, Dudley Roy, Tyler-Kabara Elizabeth, Jallo George, Handler Michael H, Jones Kenneth, Donson Andrew M, Foreman Nicholas K, Hankinson Todd C