General

Towards a symbiotic relationship between big data, artificial intelligence, and hospital pharmacy.

In Journal of pharmaceutical policy and practice

The digitalization of health and medicine and the growing availability of electronic health records (EHRs) have encouraged healthcare professionals and clinical researchers to adopt cutting-edge methodologies in artificial intelligence (AI) and big data analytics to exploit existing large medical databases. In hospital and health-system pharmacies, the application of natural language processing (NLP) and machine learning to access and analyze the unstructured, free-text information captured in millions of EHRs (e.g., medication safety, patients' medication history, adverse drug reactions, interactions, medication errors, therapeutic outcomes, and pharmacokinetic consultations) may become an essential tool to improve patient care and perform real-time evaluations of the efficacy, safety, and comparative effectiveness of available drugs. This approach has enormous potential to support risk-sharing agreements and guide decision-making in pharmacy and therapeutics (P&T) committees.

Del Rio-Bermudez Carlos, Medrano Ignacio H, Yebes Laura, Poveda Jose Luis


Electronic health records, Machine learning, Natural language processing, Pharmacovigilance

General

ReactionCode: format for reaction searching, analysis, classification, transform, and encoding/decoding.

In Journal of cheminformatics

Over the past two decades, many different formats for molecules and reactions have been created. These formats were mostly developed for identifiers, representation, classification, analysis and data exchange. Much effort has gone into molecule formats, but far less into reaction formats, where most of the work has been done by companies and has led to proprietary formats. Here, we present ReactionCode: a new open-source format that allows one to encode and decode a reaction into a multi-layer, machine-readable code, which aggregates reactants and products into a condensed graph of reaction (CGR). This format is flexible and can be used in the context of reaction similarity searching and classification. It is also designed for database organization, machine learning applications and as a new transform reaction language.
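The idea of aggregating reactants and products into a single condensed graph can be illustrated with a minimal Python sketch. This is not the ReactionCode format or its encoder: the function names, the atom-map numbers, and the dictionary layout are all assumptions made for illustration. Each bond is labelled with its order before and after the reaction, so the bonds that change mark the reaction center.

```python
# Toy condensed graph of reaction (CGR). This is NOT the ReactionCode format;
# all names and the data layout are hypothetical.
# Atoms are identified by atom-map numbers shared by both sides of the reaction.

def condensed_graph(reactant_bonds, product_bonds):
    """Merge two bond tables into one graph labelled (order_before, order_after).

    Each table maps a frozenset of two atom-map numbers to a bond order.
    An order of 0 means the bond is absent on that side.
    """
    cgr = {}
    for pair in set(reactant_bonds) | set(product_bonds):
        cgr[pair] = (reactant_bonds.get(pair, 0), product_bonds.get(pair, 0))
    return cgr


def changed_bonds(cgr):
    """Keep only the edges whose order differs between reactants and products."""
    return {pair: orders for pair, orders in cgr.items() if orders[0] != orders[1]}


# Example: CH3-Br + OH- -> CH3-OH + Br-, heavy atoms only (C=1, Br=2, O=3).
reactants = {frozenset({1, 2}): 1}   # C-Br single bond
products = {frozenset({1, 3}): 1}    # C-O single bond
cgr = condensed_graph(reactants, products)
center = changed_bonds(cgr)          # both bonds change, so both are kept
```

In this toy layout, two reactions with the same set of changed edges could be grouped together, which is one simple way to think about reaction classification on a CGR.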

Delannée Victorien, Nicklaus Marc C


Classification, Decoding, Encoding, Reaction, ReactionCode, Searching

Public Health

Random forest algorithm to identify factors associated with sports-related dental injuries in 6 to 13-year-old athlete children in Hamadan, Iran, 2018: a cross-sectional study.

In BMC sports science, medicine & rehabilitation

BACKGROUND : Traumatic dental injuries are an important problem with major physical, aesthetic, psychological, social, functional and therapeutic consequences that adversely affect the quality of life of children and adolescents. Recently, methods based on machine learning algorithms have given researchers more powerful tools for making more accurate predictions in different domains and for evaluating the factors affecting different phenomena more reliably than traditional regression models. This study investigates the performance of random forest (RF) in identifying factors associated with sports-related dental injuries. The accuracy of the RF model for predicting sports-related dental injuries was also compared with that of a logistic regression model as the traditional competitor.

METHODS : This cross-sectional study included 356 child athletes aged 6 to 13 years in Hamadan, Iran. Random forest and logistic regression models were constructed using sports-related dental injury as the response variable and age, sex, parental education, child's birth order, type of sports activity, duration of sports activity, awareness of mouthguards, and mouthguard use as inputs. A self-reported questionnaire was used to obtain the information.

RESULTS : Fifty-five (15.4%) subjects had experienced a sports-related dental injury. The mean age of children with sports injuries was significantly higher than that of children without injury (p = 0.006). The prevalence of injury was significantly higher in boys (p = 0.008). Children with illiterate mothers were more likely to be injured than children with educated mothers (p = 0.045). Awareness of mouthguards and their use during exercise had a significant effect on reducing the prevalence of injury among users (p < 0.001). The random forest model had a higher accuracy (89.3%) for predicting sports-related dental injuries than logistic regression (84.2%). The RF variable-importance results showed that mouthguard use and mouthguard awareness contributed most to the prediction of sports-related dental injuries, followed by sex and age.
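Because only 55 of the 356 children were injured, accuracy figures are easier to read next to the score of a trivial rule that always predicts "no injury". The short Python sketch below works through that arithmetic using only the counts and accuracies reported in the abstract. The baseline is a reference point added here, not something the authors report, and the abstract does not say whether the accuracies were measured on held-out data.

```python
# Counts and accuracies taken from the abstract; the baseline is an added reference.
n_total = 356     # children in the study
n_injured = 55    # children reporting a sports-related dental injury

prevalence = n_injured / n_total
# Accuracy of a rule that always predicts "no injury" (the majority class).
majority_baseline = (n_total - n_injured) / n_total

reported = {"random forest": 0.893, "logistic regression": 0.842}

print(f"prevalence: {prevalence:.1%}")
print(f"majority-class baseline accuracy: {majority_baseline:.1%}")
for name, accuracy in reported.items():
    difference = accuracy - majority_baseline
    print(f"{name}: {accuracy:.1%} ({difference:+.1%} relative to baseline)")
```

With classes this imbalanced, measures such as sensitivity or the area under the ROC curve would usually be read alongside accuracy.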

CONCLUSIONS : With predictive models such as RF, inaccurate predictions caused by high complexity and interactions between variables can be minimized. This helps to identify the factors involved in sports-related dental injury more accurately among the general population of children.

Farhadian Maryam, Torkaman Sima, Mojarad Farzad


Athlete, Logistic regression, Mouthguard, Random Forest, Sports-related dental injuries

General

ISLAND: in-silico proteins binding affinity prediction using sequence information.

In BioData mining

BACKGROUND : Determining binding affinity in protein-protein interactions is important in the discovery and design of novel therapeutics and in mutagenesis studies. Determining the binding affinity of proteins in the formation of protein complexes requires sophisticated, expensive and time-consuming experimentation, which can be replaced with computational methods. Most computational prediction techniques require protein structures, which limits their applicability to protein complexes with known structures. In this work, we explore sequence-based protein binding affinity prediction using machine learning.

METHOD : We have used protein sequence information instead of protein structures along with machine learning techniques to accurately predict the protein binding affinity.
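To make "sequence information instead of protein structures" concrete, the following Python sketch turns a pair of sequences into a fixed-length numeric vector that a regression model could consume. Amino-acid composition is used here only as a simple stand-in: the abstract does not say which sequence descriptors ISLAND uses, and the function names and example sequences are made up.

```python
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def composition(sequence):
    """Fraction of each standard amino acid in a sequence (20 values)."""
    counts = Counter(sequence.upper())
    total = sum(counts[residue] for residue in AMINO_ACIDS)
    if total == 0:
        raise ValueError("sequence contains no standard amino acids")
    return [counts[residue] / total for residue in AMINO_ACIDS]


def pair_features(seq_a, seq_b):
    """Concatenate both compositions into one 40-value vector for a complex."""
    return composition(seq_a) + composition(seq_b)


# Made-up sequences, for illustration only.
features = pair_features("MKTAYIAKQR", "GSHMLEDPVA")
```

A vector like this needs no 3D structure, which is the practical appeal of sequence-only predictors. The cost is that composition discards residue order, so richer descriptors are normally needed.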

RESULTS : We find that the true generalization performance of even the state-of-the-art sequence-only predictor is far from satisfactory and that the development of machine learning methods for binding affinity prediction with improved generalization performance is still an open problem. We also propose a novel sequence-based protein binding affinity predictor called ISLAND, which gives better accuracy than existing methods on the same validation set as well as on an external independent test dataset. A cloud-based webserver implementation of ISLAND and its Python code are available at .

CONCLUSION : This paper highlights the fact that the true generalization performance of even the state-of-the-art sequence-only predictor of binding affinity is far from satisfactory and that the development of effective and practical methods in this domain is still an open problem.

Abbasi Wajid Arshad, Yaseen Adiba, Hassan Fahad Ul, Andleeb Saiqa, Minhas Fayyaz Ul Amir Afsar


Binding affinity, Protein sequence analysis, Protein-protein interaction, Support vector machines, Web services

General

A simple, cost-effective high-throughput image analysis pipeline improves genomic prediction accuracy for days to maturity in wheat.

In Plant methods

BACKGROUND : High-throughput phenotyping and genomic selection accelerate genetic gain in breeding programs through advances in phenotyping and genotyping methods. This study developed a simple, cost-effective high-throughput image analysis pipeline to quantify digital images taken in a panel of 286 Iranian bread wheat accessions under terminal drought stress and well-watered conditions. The color proportion of green to yellow (tolerance ratio) and the color proportion of yellow to green (stress ratio) were assessed for each canopy using the pipeline. The estimated tolerance and stress ratios were used as covariates in the genomic prediction models to evaluate the effect of change in canopy color on the improvement of the genomic prediction accuracy of different agronomic traits in wheat.
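The two canopy ratios can be sketched in a few lines of Python. The abstract defines them only as color proportions, so the details below are assumptions: pixels are classified by hue using the standard-library `colorsys` module, and the hue bands and saturation cutoff are hypothetical values chosen for illustration, not the thresholds used in the study.

```python
import colorsys

# Hypothetical hue bands in degrees; the study's actual thresholds are not given.
YELLOW_HUE = (40.0, 75.0)
GREEN_HUE = (75.0, 165.0)


def classify(pixel, min_saturation=0.2):
    """Label an (R, G, B) pixel with 0-255 channels as 'green', 'yellow' or None."""
    r, g, b = (channel / 255.0 for channel in pixel)
    h, s, _ = colorsys.rgb_to_hsv(r, g, b)
    hue = h * 360.0
    if s < min_saturation:          # grey, white and black pixels are ignored
        return None
    if YELLOW_HUE[0] <= hue < YELLOW_HUE[1]:
        return "yellow"
    if GREEN_HUE[0] <= hue < GREEN_HUE[1]:
        return "green"
    return None


def canopy_ratios(pixels):
    """Return (tolerance ratio, stress ratio) = (green/yellow, yellow/green)."""
    labels = [classify(pixel) for pixel in pixels]
    green = labels.count("green")
    yellow = labels.count("yellow")
    tolerance = green / yellow if yellow else float("inf")
    stress = yellow / green if green else float("inf")
    return tolerance, stress


# Three green pixels, one yellow pixel and one white background pixel.
sample = [(0, 200, 0)] * 3 + [(220, 220, 0), (255, 255, 255)]
tolerance_ratio, stress_ratio = canopy_ratios(sample)
```

By construction the two ratios are reciprocals of each other whenever both colors are present, so they carry the same information on different scales.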

RESULTS : The reliability of the high-throughput image analysis pipeline was demonstrated by a three- to four-fold improvement in the accuracy of genomic predictions for days to maturity when tolerance and stress ratios were used as covariates in the univariate genomic selection models. Higher prediction accuracies were attained for days to maturity when both tolerance and stress ratios were used as fixed effects in the univariate models. The results indicated that Bayesian ridge regression and ridge regression-best linear unbiased prediction were superior to the other genomic prediction methods used in this study under terminal drought stress and well-watered conditions, respectively.

CONCLUSIONS : This study provided a robust, quick, and cost-effective machine learning-enabled image-phenotyping pipeline to improve the genomic prediction accuracy for days to maturity in wheat. The results encouraged the integration of phenomics and genomics in breeding programs.

Shabannejad Morteza, Bihamta Mohammad-Reza, Majidi-Hervan Eslam, Alipour Hadi, Ebrahimi Asa


Days to maturity, Genomic prediction, High-throughput phenotyping, Image analysis, Pipeline, Wheat

Internal Medicine

Proposal of a new equation for estimating resting energy expenditure of acute kidney injury patients on dialysis: a machine learning approach.

In Nutrition & metabolism

BACKGROUND : The objective of this study was to develop a new predictive equation for the resting energy expenditure (REE) of patients with acute kidney injury (AKI) on dialysis.

MATERIALS AND METHODS : A cross-sectional descriptive study was carried out in 114 consecutively selected AKI patients on dialysis and mechanical ventilation, aged between 19 and 95 years. To construct the predictive model, 80% of cases were randomly assigned to training and the remaining 20% were held out for validation. Several machine learning models were tested on the training data: stepwise linear regression, recursive partitioning (rpart), support vector machine with radial kernel, generalised boosting machine and random forest. The models were selected by ten-fold cross-validation and their performance was evaluated by the root mean square error.
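The evaluation protocol described above (an 80/20 split, ten-fold cross-validation on the training part, and selection by root mean square error) can be sketched in plain Python. Everything specific in the sketch is an assumption: the data are synthetic, the single predictor is a hypothetical stand-in, and the two toy candidate models replace the five learners named in the abstract.

```python
import math
import random


def rmse(y_true, y_pred):
    """Root mean square error between two equal-length sequences."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def fit_mean(xs, ys):
    """Toy candidate 1: always predict the training mean."""
    mean_y = sum(ys) / len(ys)
    return lambda x: mean_y


def fit_line(xs, ys):
    """Toy candidate 2: ordinary least squares with one predictor."""
    mean_x, mean_y = sum(xs) / len(xs), sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return lambda x: mean_y + slope * (x - mean_x)


def cross_val_rmse(fit, xs, ys, k=10):
    """Mean RMSE over k folds; assumes the data are already shuffled."""
    scores = []
    for fold in range(k):
        held_out = set(range(fold, len(xs), k))
        train = [i for i in range(len(xs)) if i not in held_out]
        model = fit([xs[i] for i in train], [ys[i] for i in train])
        scores.append(rmse([ys[i] for i in held_out],
                           [model(xs[i]) for i in held_out]))
    return sum(scores) / k


# Synthetic stand-in data (NOT the study data): one predictor, noisy linear outcome.
rng = random.Random(0)
xs = [rng.uniform(18.0, 40.0) for _ in range(100)]
ys = [1000.0 + 40.0 * x + rng.gauss(0.0, 150.0) for x in xs]

# 80% of cases for training, the remaining 20% held out for validation.
indices = list(range(len(xs)))
rng.shuffle(indices)
cut = int(0.8 * len(indices))
train_idx, valid_idx = indices[:cut], indices[cut:]
train_x, train_y = [xs[i] for i in train_idx], [ys[i] for i in train_idx]
valid_x, valid_y = [xs[i] for i in valid_idx], [ys[i] for i in valid_idx]

# Select the candidate with the lowest cross-validated RMSE, then check it once
# on the held-out validation set.
candidates = {"mean": fit_mean, "line": fit_line}
cv_scores = {name: cross_val_rmse(fit, train_x, train_y) for name, fit in candidates.items()}
best = min(cv_scores, key=cv_scores.get)
final_model = candidates[best](train_x, train_y)
validation_rmse = rmse(valid_y, [final_model(x) for x in valid_x])
```

Keeping the validation cases out of model selection is what makes the final validation score an honest estimate. In the sketch, the held-out 20% is touched only once, after the cross-validated choice has been made.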

RESULTS : There were 364 indirect calorimetry measurements in 114 patients, with a mean age of 60.65 ± 16.9 years; 68.4% were male. The average REE was 2081 ± 645 kcal. REE was positively correlated with C-reactive protein, minute volume (MV), expiratory positive airway pressure, serum urea and body mass index, and inversely correlated with age. The principal variables included in the selected model were age, body mass index, use of vasopressors, expiratory positive airway pressure, MV, C-reactive protein, temperature and serum urea. The final r-value in the validation set was 0.69.

CONCLUSION : We propose a new predictive equation for estimating the REE of AKI patients on dialysis that uses a non-linear approach and performs better than current models.

Ponce Daniela, de Goes Cassiana Regina, de Andrade Luis Gustavo Modelli


Acute kidney injury, Dialysis, Energy metabolism, Machine learning, Resting energy expenditure, Sepsis