General General

Predicting liver cytosol stability of small molecules.

In Journal of cheminformatics

Over the last few decades, chemists have become skilled at designing compounds that avoid cytochrome P450 (CYP)-mediated metabolism. Typical screening assays are performed in liver microsomal fractions, so the contribution of cytosolic enzymes can be overlooked until much later in the drug discovery process. Few data exist on cytosolic enzyme-mediated metabolism and no reliable tools are available to chemists to help design away from such liabilities. In this study, we screened 1450 compounds for liver cytosol-mediated metabolic stability and extracted transformation rules that might help medicinal chemists in optimizing compounds with these liabilities. In vitro half-life data were collected by performing in-house experiments in mouse (CD-1 male) and human (mixed gender) cytosol fractions. Matched molecular pairs analysis was performed in conjunction with quantitative structure-activity relationship modeling to identify chemical structure transformations affecting cytosolic stability. The transformation rules were prospectively validated on the test set. In addition, selected rules were validated on a diverse chemical library and the resulting pairs were experimentally tested to confirm whether the identified transformations could be generalized. The validation results, comprising nearly 250 library compounds and corresponding half-life data, are made publicly available. The datasets were also used to generate in silico classification models, based on different molecular descriptors and machine learning methods, to predict cytosol-mediated liabilities. To the best of our knowledge, this is the first systematic in silico effort to address cytosolic enzyme-mediated liabilities.

Shah Pranav, Siramshetty Vishal B, Zakharov Alexey V, Southall Noel T, Xu Xin, Nguyen Dac-Trung


Cytosol stability, Machine learning, Matched molecular pairs, Quantitative structure-activity relationship, Xenobiotic metabolism
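The abstract above describes extracting transformation rules from matched molecular pairs and their half-life data. As a rough illustration of that idea only, the sketch below groups pairs by transformation and averages the log2 ratio of half-lives. The function name, the `min_pairs` cutoff, the transformation labels, and all numbers are hypothetical; the abstract does not describe the authors' actual procedure at this level of detail.

```python
import math
from collections import defaultdict


def summarize_transformations(pairs, min_pairs=2):
    """Average the log2 half-life ratio for each transformation.

    pairs: iterable of (transformation, t_half_before, t_half_after).
    Transformations seen in fewer than min_pairs pairs are dropped
    (the cutoff is an illustrative assumption).
    Returns {transformation: (mean_log2_ratio, n_pairs)}.
    """
    shifts_by_rule = defaultdict(list)
    for rule, t_before, t_after in pairs:
        shifts_by_rule[rule].append(math.log2(t_after / t_before))

    summary = {}
    for rule, shifts in shifts_by_rule.items():
        if len(shifts) >= min_pairs:
            summary[rule] = (sum(shifts) / len(shifts), len(shifts))
    return summary


# Hypothetical toy data: half-lives in minutes, not taken from the study.
pairs = [
    ("H>>F", 10.0, 20.0),
    ("H>>F", 15.0, 60.0),
    ("CH3>>CF3", 30.0, 30.0),
]
summary = summarize_transformations(pairs)
```

A positive mean log2 ratio suggests the transformation tends to lengthen the half-life in this toy data. In practice the pairs themselves would be generated by a cheminformatics toolkit, which is outside the scope of this sketch.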

Quantitative prediction of selectivity between the A1 and A2A adenosine receptors.

In Journal of cheminformatics

The development of drugs is often hampered by off-target interactions that lead to adverse effects. Therefore, computational methods to assess the selectivity of ligands are of high interest. Currently, selectivity is often deduced from bioactivity predictions of a ligand for multiple targets (individual machine learning models). Here we show that modeling selectivity directly, by using the affinity difference between two drug targets as the output value, leads to more accurate selectivity predictions. We test multiple approaches, including classification and regression, on a dataset of ligands for the A1 and A2A adenosine receptors, and we define different selectivity classes. Finally, we present a regression model that predicts selectivity between these two drug targets by directly training on the difference in bioactivity, modeling the selectivity-window. The quality of this model was good, as shown by its performance in fivefold cross-validation: ROC A1AR-selective 0.88 ± 0.04 and ROC A2AAR-selective 0.80 ± 0.07. To increase the accuracy of this selectivity model even further, inactive compounds were identified and removed prior to selectivity prediction by a combination of statistical models and structure-based docking. As a result, selectivity between the A1 and A2A adenosine receptors was predicted effectively using the selectivity-window model. The approach presented here can be readily applied to other selectivity cases.

Burggraaff Lindsey, van Vlijmen Herman W T, IJzerman Adriaan P, van Westen Gerard J P


A1 adenosine receptor, A2A adenosine receptor, GPCR, Modeling, QSAR, Selectivity, Selectivity window
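The selectivity-window idea in the abstract above (training on the difference in bioactivity between two targets) can be sketched as follows. This shows only how such a training target and class labels might be derived from a pair of affinity values. The function names, the use of pKi-like values, and the threshold of 1.0 log unit are assumptions made for illustration, not values stated in the abstract.

```python
def selectivity_window(affinity_a1, affinity_a2a):
    """Affinity difference between the two receptors (log units).

    Positive values favor A1, negative values favor A2A.
    """
    return affinity_a1 - affinity_a2a


def selectivity_class(window, threshold=1.0):
    """Map a selectivity window onto a class label.

    The threshold is an illustrative assumption.
    """
    if window >= threshold:
        return "A1-selective"
    if window <= -threshold:
        return "A2A-selective"
    return "non-selective"


# Hypothetical ligands: (affinity for A1, affinity for A2A), pKi-like values.
ligands = [(8.5, 6.0), (6.0, 7.5), (7.0, 6.8)]
windows = [selectivity_window(a1, a2a) for a1, a2a in ligands]
classes = [selectivity_class(w) for w in windows]
```

A regression model would then be fitted with `windows` as the output value, rather than fitting one model per receptor and subtracting the two predictions afterwards.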

KnowTox: pipeline and case study for confident prediction of potential toxic effects of compounds in early phases of development.

In Journal of cheminformatics

Risk assessment of newly synthesised chemicals is a prerequisite for regulatory approval. In this context, in silico methods have great potential to reduce time, cost, and ultimately animal testing as they make use of the ever-growing amount of available toxicity data. Here, KnowTox is presented, a novel pipeline that combines three different in silico toxicology approaches to allow for confident prediction of potentially toxic effects of query compounds: machine learning models for 88 endpoints, alerts for 919 toxic substructures, and computational support for read-across. It is mainly based on the ToxCast dataset, which after preprocessing contains a sparse matrix of 7912 compounds tested against 985 endpoints. When applying machine learning models, applicability and reliability of predictions for new chemicals are of utmost importance. Therefore, first, the conformal prediction technique was deployed, which comprises an additional calibration step and by definition creates internally valid predictors at a given significance level. Second, to further improve validity and information efficiency, two adaptations are suggested, exemplified on the androgen receptor antagonism endpoint. An absolute increase in validity of 23% on the in-house dataset of 534 compounds could be achieved by introducing KNNRegressor normalisation. This increase in validity comes at the cost of efficiency, which could again be improved by 20% for the initial ToxCast model by balancing the dataset during model training. Finally, the value of the developed pipeline for risk assessment is discussed using two in-house triazole molecules. Compared to a single toxicity prediction method, complementing the outputs of different approaches can have a higher impact on guiding toxicity testing and de-selecting most likely harmful development-candidate compounds early in the development process.

Morger Andrea, Mathea Miriam, Achenbach Janosch H, Wolf Antje, Buesen Roland, Schleifer Klaus-Juergen, Landsiedel Robert, Volkamer Andrea


Androgen receptor, Applicability domain, Case study, Confidence estimation, Conformal prediction, Random forest, Read-across, ToxCast, Toxicity prediction, Triazoles
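The calibration step that the abstract above attributes to conformal prediction can be illustrated with a minimal class-conditional (Mondrian) inductive conformal classifier. This is a generic textbook-style sketch, not the KnowTox implementation: the nonconformity score (one minus the predicted class probability), the function names, and all numbers are assumptions, and the KNNRegressor normalisation mentioned in the abstract is not included.

```python
def p_value(calibration_scores, test_score):
    """Share of calibration scores at least as nonconforming as the test score.

    The +1 terms count the test compound itself.
    """
    n_greater_equal = sum(1 for s in calibration_scores if s >= test_score)
    return (n_greater_equal + 1) / (len(calibration_scores) + 1)


def prediction_set(class_probs, calibration, significance):
    """Return every label whose p-value exceeds the significance level.

    class_probs: {label: predicted probability for the query compound}
    calibration: {label: nonconformity scores of calibration compounds
                  whose true class is that label}
    """
    labels = set()
    for label, prob in class_probs.items():
        nonconformity = 1.0 - prob  # assumed nonconformity measure
        if p_value(calibration[label], nonconformity) > significance:
            labels.add(label)
    return labels


# Hypothetical calibration scores, one list per true class.
calibration = {
    "active": [0.1, 0.2, 0.3, 0.6],
    "inactive": [0.05, 0.1, 0.2, 0.7],
}
confident = prediction_set({"active": 0.85, "inactive": 0.15}, calibration, 0.25)
uncertain = prediction_set({"active": 0.45, "inactive": 0.55}, calibration, 0.25)
```

The first query yields a single-label set, while the second yields both labels, which signals that the model cannot separate the classes at the chosen significance level. Efficiency in this setting refers to how often the sets contain exactly one label.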

Mol-CycleGAN: a generative model for molecular optimization.

In Journal of cheminformatics

Designing a molecule with desired properties is one of the biggest challenges in drug development, as it requires optimization of chemical compound structures with respect to many complex properties. To improve the compound design process, we introduce Mol-CycleGAN, a CycleGAN-based model that generates optimized compounds with high structural similarity to the original ones. Namely, given a molecule, our model generates a structurally similar one with an optimized value of the considered property. We evaluate the performance of the model on selected optimization objectives related to structural properties (presence of halogen groups, number of aromatic rings) and to a physicochemical property (penalized logP). In the task of optimization of penalized logP of drug-like molecules our model significantly outperforms previous results.

Maziarka Łukasz, Pocha Agnieszka, Kaczmarczyk Jan, Rataj Krzysztof, Danel Tomasz, Warchoł Michał


Deep learning, Drug design, Generative models, Molecular optimization

Comparison and improvement of the predictability and interpretability with ensemble learning models in QSPR applications.

In Journal of cheminformatics

Ensemble learning improves machine learning results by combining several models, giving better predictive performance than a single model. It also benefits and accelerates research in quantitative structure-activity relationship (QSAR) and quantitative structure-property relationship (QSPR) modeling. With the growing number of ensemble learning models such as random forest, the effectiveness of QSAR/QSPR will be limited by the machine's inability to explain its predictions to researchers. In fact, many implementations of ensemble learning models are able to quantify the overall magnitude of each feature. For example, feature importance allows us to assess the relative importance of features and to interpret the predictions. However, different ensemble learning methods or implementations may lead to different feature selections for interpretation. In this paper, we compared the predictability and interpretability of four well-established ensemble learning models (random forest, extremely randomized trees, adaptive boosting and gradient boosting) for regression and binary classification modeling tasks. Then, blending methods were built by combining the four ensemble learning methods. The blending method led to better performance and a unified interpretation by summarizing individual predictions from the different learning models. The important features in two case studies, which gave valuable information about compound properties, are discussed in detail. QSPR modeling with interpretable machine learning techniques can help chemical design work more efficiently, confirm hypotheses and establish knowledge for better results.

Chen Chia-Hsiu, Tanaka Kenichi, Kotera Masaaki, Funatsu Kimito


Blending, Decision tree, Ensemble learning, Extremely randomized trees, Fluorescence, Liquid crystal, QSPR, Quantitative structure–property, Random forest
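The abstract above describes blending four ensemble models to obtain a unified interpretation. One simple way to picture this is to normalize each model's feature importances and average them, as sketched below. The unweighted average, the descriptor names, and the importance values are all hypothetical; the abstract does not specify how the authors combined the models.

```python
def normalize(importances):
    """Scale importances so they sum to 1, making models comparable."""
    total = sum(importances)
    return [value / total for value in importances]


def blend(vectors):
    """Element-wise unweighted mean of equal-length vectors."""
    return [sum(column) / len(vectors) for column in zip(*vectors)]


# Hypothetical descriptors and per-model importances (not from the paper).
descriptors = ["logP", "ring_count", "polar_surface_area"]
importances_by_model = {
    "random_forest": [5.0, 3.0, 2.0],
    "extra_trees": [0.4, 0.4, 0.2],
    "adaboost": [0.6, 0.2, 0.2],
    "gradient_boosting": [0.5, 0.3, 0.2],
}

blended = blend([normalize(v) for v in importances_by_model.values()])
ranking = [name for _, name in sorted(zip(blended, descriptors), reverse=True)]
```

Predictions from the four models can be averaged with the same `blend` helper. A weighted scheme or a second-level learner would be an equally plausible reading of "blending".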

Multiobjective de novo drug design with recurrent neural networks and nondominated sorting.

In Journal of cheminformatics

Research productivity in the pharmaceutical industry has declined significantly in recent decades, with higher costs, longer timelines, and lower success rates of drug candidates in clinical trials. This has prioritized the scalability and multiobjectivity of drug discovery and design. De novo drug design has emerged as a promising approach; molecules are generated from scratch, thus reducing the reliance on trial and error and premade molecular repositories. However, optimizing for molecular traits remains challenging, impeding the implementation of de novo methods. In this work, we propose a de novo approach capable of optimizing multiple traits collectively. A recurrent neural network was used to generate molecules which were then ranked based on multiple properties by a nondominated sorting algorithm. The best of the molecules generated were selected and used to fine-tune the recurrent neural network through transfer learning, creating a cycle that mimics the traditional design-synthesis-test cycle. We demonstrate the efficacy of this approach through a proof of concept, optimizing for constraints on molecular weight, octanol-water partition coefficient, the number of rotatable bonds, hydrogen bond donors, and hydrogen bond acceptors simultaneously. Analysis of the molecules generated after five iterations of the cycle revealed a 14-fold improvement in the quality of generated molecules, along with improvements to the accuracy of the recurrent neural network and the structural diversity of the molecules generated. This cycle notably does not require large amounts of training data nor any handwritten scoring functions. Altogether, this approach uniquely combines scalable generation with multiobjective optimization of molecules.

Yasonik Jacob


De novo drug design, Deep learning, Multiobjective optimization
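The ranking step in the abstract above relies on nondominated sorting, which orders candidates into Pareto fronts without a handwritten scoring function. Below is a minimal, naive sketch of that technique. It assumes every objective is to be minimized (for example, the distance of a property from its allowed range), and the objective vectors are hypothetical. The paper may use a faster variant of the algorithm.

```python
def dominates(a, b):
    """True if a is no worse than b in every objective and better in one.

    All objectives are assumed to be minimized.
    """
    no_worse = all(x <= y for x, y in zip(a, b))
    strictly_better = any(x < y for x, y in zip(a, b))
    return no_worse and strictly_better


def nondominated_sort(points):
    """Split point indices into successive Pareto fronts (best first)."""
    remaining = set(range(len(points)))
    fronts = []
    while remaining:
        # A point belongs to the current front if no other remaining
        # point dominates it.
        front = sorted(
            i for i in remaining
            if not any(dominates(points[j], points[i])
                       for j in remaining if j != i)
        )
        fronts.append(front)
        remaining -= set(front)
    return fronts


# Hypothetical molecules scored on two objectives (lower is better).
scores = [(0, 0), (1, 2), (2, 1), (3, 3), (0, 5)]
fronts = nondominated_sort(scores)
```

Molecules in the earliest fronts would be the ones selected to fine-tune the generator in the next iteration of the cycle.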