General General

MiPepid: MicroPeptide identification tool using machine learning.

In BMC bioinformatics

BACKGROUND : Micropeptides are small proteins with length <= 100 amino acids. Short open reading frames (ORFs) that could produce micropeptides were traditionally ignored due to technical difficulties, as few small peptides had been experimentally confirmed. In the past decade, a growing number of micropeptides have been shown to play significant roles in vital biological activities. Despite the increased amount of data, we still lack bioinformatics tools for specifically identifying micropeptides from DNA sequences. Indeed, most existing tools for classifying coding and noncoding ORFs were built on datasets in which "normal-sized" proteins were considered to be positives and short ORFs were generally considered to be noncoding. Since the functional and biophysical constraints on small peptides are likely to be different from those on "normal" proteins, methods for predicting short translated ORFs must be trained independently from those for longer proteins.

RESULTS : In this study, we have developed MiPepid, a machine-learning tool specifically for the identification of micropeptides. We trained MiPepid using carefully cleaned data from existing databases and used logistic regression with 4-mer features. With only the sequence information of an ORF, MiPepid is able to predict whether it encodes a micropeptide with 96% accuracy on a blind dataset of high-confidence micropeptides, and to correctly classify newly discovered micropeptides not included in either the training or the blind test data. Compared with state-of-the-art coding potential prediction methods, MiPepid performs exceptionally well, as other methods incorrectly classify most bona fide micropeptides as noncoding. MiPepid is alignment-free and runs sufficiently fast for genome-scale analyses. It is easy to use and is available at
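To make "logistic regression with 4-mer features" concrete, here is a minimal sketch of how an ORF could be turned into a 256-dimensional 4-mer frequency vector and scored with a logistic function. It is not MiPepid's code: the use of overlapping windows, the frequency normalisation, and the names `kmer_frequencies` and `predict_proba` are assumptions made for illustration, and the weights would have to come from a trained model.

```python
from itertools import product
from math import exp

K = 4
ALPHABET = "ACGT"
# All 256 possible 4-mers in a fixed order, so every ORF maps to the same layout.
KMERS = ["".join(p) for p in product(ALPHABET, repeat=K)]
KMER_INDEX = {kmer: i for i, kmer in enumerate(KMERS)}


def kmer_frequencies(orf):
    """Return normalised overlapping 4-mer frequencies for a DNA sequence.

    Assumption: overlapping windows and frequency normalisation; the
    abstract does not say how MiPepid computes its 4-mer features.
    """
    seq = orf.upper()
    counts = [0] * len(KMERS)
    total = 0
    for i in range(len(seq) - K + 1):
        idx = KMER_INDEX.get(seq[i:i + K])
        if idx is not None:  # skip windows containing N or other symbols
            counts[idx] += 1
            total += 1
    if total == 0:
        return [0.0] * len(KMERS)
    return [c / total for c in counts]


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + exp(-z))
    e = exp(z)
    return e / (1.0 + e)


def predict_proba(features, weights, bias):
    """Logistic-regression score; weights and bias are hypothetical here."""
    z = bias + sum(w * x for w, x in zip(weights, features))
    return sigmoid(z)
```

With learned weights, `predict_proba(kmer_frequencies(orf), weights, bias)` would give a coding probability for a single ORF using sequence information only, which is the setting the abstract describes.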

CONCLUSIONS : MiPepid was developed to specifically predict micropeptides, a category of proteins with increasing significance, from DNA sequences. It shows evident advantages over existing coding potential prediction methods on micropeptide identification. It is ready to use and runs fast.

Zhu Mengmeng, Gribskov Michael


Coding, Machine learning, Micropeptide, Noncoding, Small ORF, lncRNA, sORF, smORF

General General

Predicting Disease Related microRNA Based on Similarity and Topology.

In Cells

It is known that many diseases are caused by mutations or abnormalities in microRNA (miRNA). The usual method to predict miRNA-disease relationships is to build a high-quality similarity network of diseases and miRNAs. All unobserved associations are ranked by their similarity scores, such that a higher score indicates a greater probability of a potential connection. However, this approach does not utilize information within the network. Therefore, in this study, we propose a machine learning method, called STIM, which uses network topology information to predict disease-miRNA associations. In contrast to the conventional approach, STIM constructs features according to information on similarity and topology in networks and then uses a machine learning model to predict potential associations. To verify the reliability and accuracy of our method, we compared STIM to other classical algorithms. The results of fivefold cross validation demonstrated that STIM outperforms many existing methods, particularly in terms of the area under the curve. In addition, the top 30 candidate miRNAs recommended by STIM in a case study of lung neoplasm have been confirmed in previous experiments, which proved the validity of the method.
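As an illustration of combining similarity and topology information, the sketch below builds a small feature dictionary for one candidate miRNA-disease pair. The specific features (maximum similarity to known partners, node degrees, and a count of length-3 paths) are assumptions chosen for the example; the abstract does not list the features STIM actually uses, and `pair_features` is a hypothetical name.

```python
from collections import defaultdict


def build_adjacency(associations):
    """Index known (miRNA, disease) pairs in both directions."""
    mirna_to_diseases = defaultdict(set)
    disease_to_mirnas = defaultdict(set)
    for mirna, disease in associations:
        mirna_to_diseases[mirna].add(disease)
        disease_to_mirnas[disease].add(mirna)
    return mirna_to_diseases, disease_to_mirnas


def pair_features(mirna, disease, associations, mirna_sim):
    """Features for one candidate pair (illustrative, not STIM's feature set).

    mirna_sim maps frozenset({a, b}) to a similarity score in [0, 1].
    """
    m2d, d2m = build_adjacency(associations)

    # Similarity feature: how close is this miRNA to miRNAs already
    # linked to the disease?
    known = d2m.get(disease, set()) - {mirna}
    sim_to_known = max(
        (mirna_sim.get(frozenset((mirna, other)), 0.0) for other in known),
        default=0.0,
    )

    # Topology feature: paths mirna - disease' - mirna' - disease
    # in the bipartite association network.
    paths3 = 0
    for other_disease in m2d.get(mirna, set()):
        if other_disease == disease:
            continue
        for other_mirna in d2m[other_disease]:
            if other_mirna != mirna and disease in m2d[other_mirna]:
                paths3 += 1

    return {
        "sim_to_known": sim_to_known,
        "mirna_degree": len(m2d.get(mirna, ())),
        "disease_degree": len(d2m.get(disease, ())),
        "paths3": paths3,
    }
```

Feature dictionaries like this, computed for known and unobserved pairs, could then be fed to any standard classifier and the unobserved pairs ranked by predicted score.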

Chen Zhihua, Wang Xinke, Gao Peng, Liu Hongju, Song Bosheng


heterogeneous network, link prediction, machine learning, miRNA, network embedding, topology information

General General

ARPNet: Antidepressant Response Prediction Network for Major Depressive Disorder.

In Genes

Treating patients with major depressive disorder is challenging because it takes several months for antidepressants prescribed for the patients to take effect. This limitation may result in increased risks and treatment costs. To address this limitation, an accurate antidepressant response prediction model is needed. Recently, several studies have proposed models that extract useful features such as neuroimaging biomarkers and genetic variants from patient data, and use them to predict the antidepressant responses of patients. However, it is impossible to utilize all the different types of predictors when making a clinical decision on what drugs to prescribe for a patient. Although a machine learning-based antidepressant response prediction model has been proposed to overcome this problem, the model cannot find the most effective antidepressant for a patient. Based on a neural network, we propose an Antidepressant Response Prediction Network (ARPNet) model capturing high-dimensional patterns from useful features. Based on a literature survey and data-driven feature selection, we extract useful features from patient data, and use the features as predictors. In ARPNet, the patient representation layer captures patient features and the antidepressant prescription representation layer captures antidepressant features. Utilizing the patient and antidepressant prescription representation vectors, ARPNet predicts the degree of antidepressant response. The experimental evaluation results demonstrate that our proposed ARPNet model outperforms machine learning-based models in predicting antidepressant response. Moreover, we demonstrate the applicability of ARPNet in downstream applications in use case scenarios.
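The two-branch design described above (one representation for the patient, one for the prescription, then a joint prediction) can be sketched as a tiny forward pass. This is only a schematic under stated assumptions: single dense layers with ReLU, concatenation of the two vectors, and a linear output are illustrative choices, and the parameter names in `params` are hypothetical. The abstract does not specify ARPNet's actual layers.

```python
def dense(x, weights, bias):
    """Fully connected layer: one output per row of weights."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, bias)]


def relu(vector):
    return [max(0.0, value) for value in vector]


def predict_response(patient, prescription, params):
    """Predict a response score from patient and prescription features.

    Assumed structure (not taken from the paper): each input gets its own
    dense + ReLU representation layer, the two vectors are concatenated,
    and a linear layer maps the result to a single number.
    """
    patient_vec = relu(dense(patient, params["W_patient"], params["b_patient"]))
    drug_vec = relu(dense(prescription, params["W_drug"], params["b_drug"]))
    joint = patient_vec + drug_vec  # list concatenation
    return dense(joint, params["W_out"], params["b_out"])[0]
```

Because the prescription is an explicit input, a model of this shape can be queried with the same patient vector and different candidate prescriptions, and the predicted response scores compared.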

Chang Buru, Choi Yonghwa, Jeon Minji, Lee Junhyun, Han Kyu-Man, Kim Aram, Ham Byung-Joo, Kang Jaewoo


antidepressant response prediction, major depressive disorder, neural network, patient representation

General General

In-Silico Molecular Binding Prediction for Human Drug Targets Using Deep Neural Multi-Task Learning.

In Genes

In in-silico prediction of molecular binding for human genomes, deep neural multi-task learning has shown promising results owing to its strength in training tasks with imbalanced data and its ability to avoid over-fitting. Although the interrelation between tasks is known to be important for successful multi-task learning, its adverse effect has been underestimated. In this study, we used molecular interaction data of human targets from ChEMBL to train and test various multi-task and single-task networks and examined the effectiveness of multi-task learning for different compositions of targets. Targets were clustered based on sequence similarity in their binding domains and various target sets from clusters were chosen. By comparing the performance of deep neural architectures for each target set, we found that similarity within a target set is highly important for reliable multi-task learning. For a diverse target set or for human targets overall, multi-task learning performed worse than single-task learning, but it outperformed single-task learning for the target set containing similar targets. From this insight, we developed Multiple Partial Multi-Task learning, which is suitable for binding prediction for human drug targets.
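The idea of training multi-task models only over groups of similar targets can be sketched by first partitioning targets with a similarity threshold. The example below uses single-linkage grouping via union-find, which is an assumption made for illustration; the abstract does not state which clustering algorithm or threshold the authors used, and `cluster_targets` is a hypothetical helper.

```python
def cluster_targets(targets, similarity, threshold):
    """Group targets whose pairwise similarity reaches the threshold.

    similarity maps (target_a, target_b) tuples to a score; pairs that are
    missing are treated as dissimilar. Grouping is single-linkage: two
    targets share a cluster if a chain of similar pairs connects them.
    """
    parent = {t: t for t in targets}

    def find(t):
        # Follow parent links to the cluster root, compressing the path.
        while parent[t] != t:
            parent[t] = parent[parent[t]]
            t = parent[t]
        return t

    for (a, b), score in similarity.items():
        if score >= threshold:
            root_a, root_b = find(a), find(b)
            if root_a != root_b:
                parent[root_b] = root_a

    groups = {}
    for t in targets:
        groups.setdefault(find(t), []).append(t)
    return sorted(sorted(group) for group in groups.values())
```

Each returned group would then get its own multi-task network, while singleton groups fall back to single-task training, in line with the finding that multi-task learning helps mainly when the targets in a set are similar.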

Lee Kyoungyeul, Kim Dongsup


deep learning, in-silico bioactivity prediction, multi-task learning, virtual screening

General General

Prediction of Indoor Air Temperature Using Weather Data and Simple Building Descriptors.

In International journal of environmental research and public health

Non-optimal air temperatures can have serious consequences for human health and productivity. As the climate changes, heatwaves and cold streaks have become more frequent and intense. The ClimApp project aims to develop a smartphone App that provides individualised advice to cope with thermal stress outdoors and indoors. This paper presents a method to predict indoor air temperature to evaluate thermal indoor environments. Two types of input data were used to set up a predictive model: weather data obtained from online weather services and general building attributes to be provided by App users. The method provides discrete predictions of temperature through a decision tree classification algorithm. The data used to train and test the algorithm was obtained from field measurements in seven Danish households and from building simulations considering three different climate regions, ranging from temperate to hot and humid. The results show that the method had an accuracy of 92% (F1-score) when predicting temperatures under previously known conditions (e.g., same household, occupants and climate). However, the performance decreased to 30% under different climate conditions. The approach had the highest performance when predicting the most commonly observed indoor temperatures. The findings suggest that it is possible to develop a straightforward and fairly accurate method for indoor temperature estimation grounded on weather data and simple building attributes.
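Since the method predicts discrete temperature classes and is scored with F1, a small sketch of both steps may help. The 1 °C bin width and the macro-averaging of F1 across classes are assumptions for illustration; the abstract does not state the bin width or the averaging scheme, and `to_class` and `macro_f1` are hypothetical names.

```python
from math import floor


def to_class(temp_c, width=1.0):
    """Map a temperature in degrees Celsius to a discrete bin index.

    Assumption: equal-width bins; with width=1.0, 21.7 falls in class 21.
    """
    return floor(temp_c / width)


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 scores."""
    labels = sorted(set(y_true) | set(y_pred))
    if not labels:
        return 0.0
    scores = []
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN), defined as 0 when the class never occurs.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)
```

For example, measured temperatures can be binned with `to_class` to obtain the true labels, and `macro_f1` then compares them with the classes predicted by the decision tree.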

Aguilera José Joaquín, Andersen Rune Korsholm, Toftum Jørn


indoor temperature, machine learning, thermal comfort, user feedback

Public Health Public Health

Predictors of adherence to nicotine replacement therapy: Machine learning evidence that perceived need predicts medication use.

In Drug and alcohol dependence

BACKGROUND : Nonadherence to smoking cessation medication is a frequent problem. Identifying pre-quit predictors of nonadherence may help explain nonadherence and suggest tailored interventions to address it.

AIMS : Identify and characterize subgroups of smokers based on adherence to nicotine replacement therapy (NRT).

METHOD : Secondary classification tree analyses of data from a 2-arm randomized controlled trial of Recommended Usual Care (R-UC, n = 315) versus Abstinence-Optimized Treatment (A-OT, n = 308) were conducted. R-UC comprised 8 weeks of nicotine patch plus brief counseling whereas A-OT comprised 3 weeks of pre-quit mini-lozenges, 26 weeks of nicotine patch plus mini-lozenges, 11 counseling contacts, and 7-11 automated reminders to use medication. Analyses identified subgroups of smokers highly adherent to nicotine patch use in both treatment conditions, and identified subgroups of A-OT participants highly adherent to mini-lozenges.

RESULTS : Varied facets of nicotine dependence predicted adherence across treatment conditions 4 weeks post-quit and between 4 and 16 weeks post-quit in A-OT, with greater baseline dependence and greater smoking trigger exposure and reactivity predicting greater medication use. Greater quitting motivation and confidence, and believing that stop-smoking medication was safe and easy to use, were associated with greater adherence.

CONCLUSION : Adherence was especially high in those who were more dependent and more exposed to smoking triggers. Quitting motivation and confidence predicted greater adherence, while negative beliefs about medication safety and acceptability predicted worse adherence. Results suggest that adherent use of medication may reflect a rational appraisal of the likelihood that one will need medication and will benefit from it.
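For readers unfamiliar with classification trees, the sketch below shows how a single split on one baseline predictor could separate adherent from nonadherent participants. It uses a generic CART-style Gini criterion, which is a stand-in technique: the abstract does not say which tree algorithm or split criterion the authors used, and the function names and any data passed in are hypothetical.

```python
def gini(labels):
    """Gini impurity for binary labels (1 = adherent, 0 = nonadherent)."""
    n = len(labels)
    if n == 0:
        return 0.0
    p = sum(labels) / n
    return 2.0 * p * (1.0 - p)


def best_split(values, labels):
    """Find the threshold on one predictor that minimises weighted impurity.

    Returns (threshold, impurity); threshold is None if no split improves
    on the unsplit node. Rows with value <= threshold go to the left child.
    """
    n = len(values)
    best_threshold, best_score = None, gini(labels)
    # The largest value is excluded because it would leave the right child empty.
    for threshold in sorted(set(values))[:-1]:
        left = [y for x, y in zip(values, labels) if x <= threshold]
        right = [y for x, y in zip(values, labels) if x > threshold]
        score = (len(left) * gini(left) + len(right) * gini(right)) / n
        if score < best_score:
            best_threshold, best_score = threshold, score
    return best_threshold, best_score
```

A full tree repeats this search over all predictors and then recurses into each child, which is how subgroups such as "more dependent and more exposed to smoking triggers" can emerge from the data.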

Kim Nayoung, McCarthy Danielle E, Loh Wei-Yin, Cook Jessica W, Piper Megan E, Schlam Tanya R, Baker Timothy B


Adherence, Classification tree, Nicotine dependence, Nicotine replacement therapy, Smoking cessation