
General

Ada-WHIPS: explaining AdaBoost classification with applications in the health sciences.

In BMC medical informatics and decision making ; h5-index 38.0

BACKGROUND : Computer Aided Diagnostics (CAD) can support medical practitioners in making critical decisions about their patients' disease conditions. Practitioners require access to the chain of reasoning behind CAD to build trust in the CAD advice and to supplement their own expertise. Yet CAD systems may be based on black box machine learning models and high-dimensional data sources such as electronic health records, magnetic resonance imaging scans and cardiotocograms. These foundations make interpretation and explanation of the CAD advice very challenging. This challenge is recognised throughout the machine learning research community. eXplainable Artificial Intelligence (XAI) is emerging as one of the most important research areas of recent years because it addresses the interpretability and trust concerns of critical decision makers, including those in clinical and medical practice.

METHODS : In this work, we focus on AdaBoost, a black box model that has been widely adopted in the CAD literature. We address the challenge of explaining AdaBoost classification with a novel algorithm that extracts simple, logical rules from AdaBoost models. Our algorithm, Adaptive-Weighted High Importance Path Snippets (Ada-WHIPS), makes use of AdaBoost's adaptive classifier weights. Using a novel formulation, Ada-WHIPS uniquely redistributes the weights among individual decision nodes of the internal decision trees of the AdaBoost model. Then, a simple heuristic search of the weighted nodes finds a single rule that dominates the model's decision. We compare the explanations generated by our novel approach with the state of the art in an experimental study. We evaluate the derived explanations with simple statistical tests of two well-known quality measures, precision and coverage, and of a novel measure, stability, that is better suited to the XAI setting.
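
Ada-WHIPS builds on the weights that AdaBoost assigns to each base learner. For reference, the sketch below shows how such a weight arises in one round of textbook two-class (discrete) AdaBoost; it is not the Ada-WHIPS node-redistribution step, and the function name `boosting_round` is hypothetical. Multiclass variants such as SAMME use a slightly different weight formula.

```python
import numpy as np

def boosting_round(sample_weights, y_true, y_pred):
    """One round of two-class discrete AdaBoost (labels in {-1, +1}).

    Returns the base learner's weight (alpha) and the renormalised sample weights.
    """
    sample_weights = np.asarray(sample_weights, dtype=float)
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    # Weighted error of this base learner; assumed to lie strictly between 0 and 0.5.
    err = sample_weights[y_true != y_pred].sum() / sample_weights.sum()
    alpha = 0.5 * np.log((1.0 - err) / err)
    # Up-weight misclassified samples (y_true * y_pred = -1), down-weight the rest.
    new_weights = sample_weights * np.exp(-alpha * y_true * y_pred)
    return alpha, new_weights / new_weights.sum()

# Hypothetical toy round: uniform weights, the base learner gets the last sample wrong.
alpha, w = boosting_round([0.25] * 4, [1, 1, -1, -1], [1, 1, -1, 1])
print(round(alpha, 4))  # 0.5493, i.e. 0.5 * ln(3)
print(w)                # the misclassified sample now carries half of the total weight
```

A learner with lower weighted error receives a larger alpha, and after reweighting the samples it got wrong always carry exactly half of the total weight, which forces the next learner to focus on them.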

RESULTS : Experiments on 9 CAD-related data sets showed that Ada-WHIPS explanations consistently generalise better (mean coverage 15%-68%) than the state of the art while remaining competitive in specificity (mean precision 80%-99%). A very small trade-off in specificity is shown to guard against over-fitting, a known problem in state-of-the-art methods.
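
Coverage and precision are standard rule-quality measures. The sketch below assumes the usual definitions: coverage is the fraction of instances that satisfy the rule's conditions, and precision is the fraction of covered instances that belong to the rule's target class. The rule representation (a list of `(feature_index, operator, threshold)` conditions joined by AND) and the toy data are hypothetical; stability is not reproduced because the abstract does not define it.

```python
import operator
import numpy as np

OPS = {"<=": operator.le, ">": operator.gt}

def rule_mask(X, rule):
    """Boolean mask of rows in X that satisfy every condition of a conjunctive rule."""
    mask = np.ones(len(X), dtype=bool)
    for feature, op, threshold in rule:
        mask &= OPS[op](X[:, feature], threshold)
    return mask

def precision_coverage(X, y, rule, target):
    covered = rule_mask(X, rule)
    coverage = float(covered.mean())
    # Precision is undefined when the rule covers nothing; report 0.0 in that case.
    precision = float((y[covered] == target).mean()) if covered.any() else 0.0
    return precision, coverage

# Hypothetical toy data: 5 instances, 2 features, binary class labels.
X = np.array([[1.0, 5.0], [2.0, 3.0], [3.0, 8.0], [4.0, 1.0], [5.0, 7.0]])
y = np.array([1, 1, 0, 0, 1])
rule = [(0, "<=", 3.0), (1, ">", 2.0)]  # x0 <= 3 AND x1 > 2
print(precision_coverage(X, y, rule, target=1))
```

Here the rule covers the first three rows (coverage 0.6), two of which belong to class 1 (precision 2/3). Adding conditions can only shrink coverage, which is the tension between generality and specificity the results describe.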

CONCLUSIONS : The experimental results demonstrate the benefits of using our novel algorithm for explaining CAD AdaBoost classifiers widely found in the literature. Our tightly coupled, AdaBoost-specific approach outperforms model-agnostic explanation methods and should be considered by practitioners looking for an XAI solution for this class of models.

Hatwell Julian, Gaber Mohamed Medhat, Atif Azad R Muhammad


AdaBoost, Black box problem, Computer aided diagnostics, Explainable AI, Interpretability

Public Health

Using machine learning methods to predict in-hospital mortality of sepsis patients in the ICU.

In BMC medical informatics and decision making ; h5-index 38.0

BACKGROUND : Early and accurate identification of sepsis patients with high risk of in-hospital death can help physicians in intensive care units (ICUs) make optimal clinical decisions. This study aimed to develop machine learning-based tools to predict the risk of hospital death of patients with sepsis in ICUs.

METHODS : The source database used for model development and validation is the Medical Information Mart for Intensive Care (MIMIC) III. We identified adult sepsis patients using the new sepsis definition, Sepsis-3. A total of 86 predictor variables consisting of demographics, laboratory tests and comorbidities were used. We employed the least absolute shrinkage and selection operator (LASSO), random forest (RF), gradient boosting machine (GBM) and the traditional logistic regression (LR) method to develop prediction models. In addition, the prediction performance of the four developed models was evaluated and compared with that of an existing scoring tool, the Simplified Acute Physiology Score (SAPS) II, using five performance measures: the area under the receiver operating characteristic curve (AUROC), Brier score, sensitivity, specificity and calibration plot.
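
Two of these measures have compact definitions. With the outcome coded 1 for in-hospital death and 0 otherwise, the Brier score is the mean squared difference between predicted probability and outcome (lower is better), and the AUROC equals the probability that a randomly chosen patient who died receives a higher predicted risk than a randomly chosen survivor, with ties counting one half. A minimal NumPy sketch on made-up predictions (not study data):

```python
import numpy as np

def brier_score(y_true, p_pred):
    """Mean squared error between predicted probabilities and 0/1 outcomes."""
    y_true = np.asarray(y_true, dtype=float)
    p_pred = np.asarray(p_pred, dtype=float)
    return np.mean((p_pred - y_true) ** 2)

def auroc(y_true, scores):
    """AUROC as the fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    y_true = np.asarray(y_true, dtype=bool)
    scores = np.asarray(scores, dtype=float)
    pos, neg = scores[y_true], scores[~y_true]
    diff = pos[:, None] - neg[None, :]  # every positive-negative pair
    return ((diff > 0).sum() + 0.5 * (diff == 0).sum()) / diff.size

# Hypothetical outcomes (1 = in-hospital death) and predicted risks.
y = [0, 0, 1, 1, 0, 1]
p = [0.1, 0.4, 0.35, 0.8, 0.2, 0.9]
print(brier_score(y, p), auroc(y, p))
```

The pairwise form makes the ranking interpretation explicit but needs memory proportional to (deaths × survivors); for a cohort the size of this one, the rank-based `roc_auc_score` and `brier_score_loss` in `sklearn.metrics` are the practical choice.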

RESULTS : The records of 16,688 sepsis patients in MIMIC III were used for model training and testing. Of these, 2949 (17.7%) patients died in hospital. The average AUROCs of the LASSO, RF, GBM, LR and SAPS II models were 0.829, 0.829, 0.845, 0.833 and 0.77, respectively. The Brier scores of the LASSO, RF, GBM, LR and SAPS II models were 0.108, 0.109, 0.104, 0.107 and 0.146, respectively. The calibration plots showed that the GBM, LASSO and LR models had good calibration, the RF model underestimated the risk of high-risk patients, and SAPS II had the poorest calibration.
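
A calibration plot compares mean predicted risk with the observed death rate within groups of patients; a well-calibrated model lies close to the diagonal. The abstract does not say how the groups were formed, so the sketch below assumes equal-width probability bins (the bin count is a free parameter) and checks itself on synthetic, perfectly calibrated risks.

```python
import numpy as np

def calibration_points(y_true, p_pred, n_bins=10):
    """Return (mean predicted risk, observed event rate, count) for each non-empty bin."""
    y_true = np.asarray(y_true, dtype=float)
    p_pred = np.asarray(p_pred, dtype=float)
    # Assign each prediction to one of n_bins equal-width bins on [0, 1].
    bins = np.minimum((p_pred * n_bins).astype(int), n_bins - 1)
    points = []
    for b in range(n_bins):
        in_bin = bins == b
        if in_bin.any():
            points.append((p_pred[in_bin].mean(), y_true[in_bin].mean(), int(in_bin.sum())))
    return points

# Synthetic, perfectly calibrated risks: each outcome occurs with its predicted probability.
rng = np.random.default_rng(0)
p = rng.uniform(0, 1, 20000)
y = rng.uniform(0, 1, 20000) < p
for mean_pred, observed, n in calibration_points(y, p, n_bins=5):
    print(f"{mean_pred:.2f}  {observed:.2f}  (n={n})")
```

A model that underestimates high-risk patients, as reported for RF, would show observed rates above the mean predicted risk in the upper bins.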

CONCLUSION : The machine learning-based models developed in this study had good prediction performance. Among them, the GBM model showed the best performance in predicting the risk of in-hospital death. It has the potential to assist ICU physicians in performing appropriate clinical interventions for critically ill sepsis patients and thus may help improve their prognoses.

Kong Guilan, Lin Ke, Hu Yonghua


In-hospital mortality, Intensive care unit, Machine learning, Prediction model, Sepsis

General

A workflow for exploring ligand dissociation from a macromolecule: Efficient random acceleration molecular dynamics simulation and interaction fingerprint analysis of ligand trajectories.

In The Journal of chemical physics

The dissociation of ligands from proteins and other biomacromolecules occurs over a wide range of timescales. For most pharmaceutically relevant inhibitors, these timescales are far beyond those that are accessible by conventional molecular dynamics (MD) simulation. Consequently, to explore ligand egress mechanisms and compute dissociation rates, it is necessary to enhance the sampling of ligand unbinding. Random Acceleration MD (RAMD) is a simple method to enhance ligand egress from a macromolecular binding site, which enables the exploration of ligand egress routes without prior knowledge of the reaction coordinates. Furthermore, the τRAMD procedure can be used to compute the relative residence times of ligands. When combined with a machine-learning analysis of protein-ligand interaction fingerprints (IFPs), molecular features that affect ligand unbinding kinetics can be identified. Here, we describe the implementation of RAMD in GROMACS 2020, which provides significantly improved computational performance, with scaling to large molecular systems. For the automated analysis of RAMD results, we developed MD-IFP, a set of tools for the generation of IFPs along unbinding trajectories and for their use in the exploration of ligand dynamics. We demonstrate that the analysis of ligand dissociation trajectories by mapping them onto the IFP space enables the characterization of ligand dissociation routes and metastable states. The combined implementation of RAMD and MD-IFP provides a computationally efficient and freely available workflow that can be applied to hundreds of compounds in a reasonable computational time and will facilitate the use of τRAMD in drug design.
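
The direction-update rule at the heart of RAMD can be illustrated on a toy pocket. In the hypothetical sketch below (not the GROMACS implementation), a "ligand" point is pushed by a constant-magnitude force in a random direction inside a spherical pocket whose only opening is a cone around +z; elsewhere the wall blocks it, so it slides along the wall and stalls. Every `check_every` steps, if the ligand has moved less than `r_min` since the last check, a new random direction is drawn, as in RAMD. Thermal motion, the protein and the τRAMD residence-time analysis are left out, and all names and parameter values are assumptions.

```python
import numpy as np

def random_direction(rng):
    """Uniformly distributed unit vector in 3D."""
    v = rng.normal(size=3)
    return v / np.linalg.norm(v)

def ramd_toy(force=1.0, dt=0.01, check_every=50, r_min=0.025, pocket_radius=1.0,
             cone_half_angle=np.pi / 4, exit_dist=2.0, max_steps=100_000, seed=0):
    """Toy RAMD run: returns (exit step or None, number of force directions drawn)."""
    rng = np.random.default_rng(seed)
    cos_cone = np.cos(cone_half_angle)
    x = np.zeros(3)                  # ligand starts at the pocket centre
    direction = random_direction(rng)
    n_directions = 1
    inside = True
    last_check = x.copy()
    for step in range(1, max_steps + 1):
        # Friction-dominated toy motion: displacement proportional to the applied force.
        x_new = x + dt * force * direction
        r_new = np.linalg.norm(x_new)
        if inside and r_new > pocket_radius:
            if x_new[2] / r_new > cos_cone:
                inside = False       # left through the opening around +z
            else:
                x_new *= pocket_radius / r_new  # blocked: slide along the pocket wall
        x = x_new
        if np.linalg.norm(x) > exit_dist:
            return step, n_directions
        if step % check_every == 0:
            # RAMD criterion: if the ligand moved less than r_min since the last check,
            # treat the current direction as blocked and draw a new random one.
            if np.linalg.norm(x - last_check) < r_min:
                direction = random_direction(rng)
                n_directions += 1
            last_check = x.copy()
    return None, n_directions

print(ramd_toy())                                     # (exit step, directions drawn)
print(ramd_toy(cone_half_angle=0.0, max_steps=5000))  # closed pocket: never exits
```

With `cone_half_angle=0.0` the pocket has no opening, so the run never exits and only keeps redrawing directions; with an opening, the same rule eventually finds the exit without any predefined reaction coordinate.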

Kokh Daria B, Doser Bernd, Richter Stefan, Ormersbach Fabian, Cheng Xingyi, Wade Rebecca C


General

A combination of machine learning and infrequent metadynamics to efficiently predict kinetic rates, transition states, and molecular determinants of drug dissociation from G protein-coupled receptors.

In The Journal of chemical physics

Determining the drug-target residence time (RT) is of major interest in drug discovery given that this kinetic parameter often represents a better indicator of in vivo drug efficacy than binding affinity. However, obtaining drug-target unbinding rates poses significant challenges, both computationally and experimentally. This is particularly evident for complex systems such as G protein-coupled receptors (GPCRs), whose ligand unbinding typically occurs on timescales that are often inaccessible to standard molecular dynamics simulations. Enhanced sampling methods offer a useful alternative, and their efficiency can be further improved by using machine learning tools to identify optimal reaction coordinates. Here, we test the combination of two machine learning techniques, automatic mutual information noise omission and reweighted autoencoded variational Bayes for enhanced sampling, with infrequent metadynamics to efficiently study the unbinding kinetics of two classical drugs with different RTs in a prototypic GPCR, the μ-opioid receptor. Dissociation rates derived from these computations are within one order of magnitude of experimental values. We also use the simulation data to uncover the dissociation mechanisms of these drugs, shedding light on the structures of rate-limiting transition states, which, alongside metastable poses, are difficult to obtain experimentally but important to visualize when designing drugs with a desired kinetic profile.
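
Infrequent metadynamics recovers unbiased escape times by rescaling the simulated time with the bias acting on the system. The sketch below assumes the usual rescaling, t* = Σ_i Δt · exp(V(s(t_i), t_i) / kT), summed over the frames of a run that ends at the unbinding event, and a simple rate estimate k_off = 1 / ⟨t*⟩ over independent runs; in practice the escape times are usually also fitted to an exponential distribution and checked with a Kolmogorov-Smirnov test. Function names, units and the synthetic bias traces are assumptions, not the study's data or analysis code.

```python
import numpy as np

def rescaled_time(bias_trace, dt, kT):
    """Unbiased time estimate for one run: sum over frames of dt * exp(V / kT).

    bias_trace: bias V(s(t_i), t_i) felt at each frame, in the same energy units as kT.
    """
    bias_trace = np.asarray(bias_trace, dtype=float)
    return dt * np.exp(bias_trace / kT).sum()

def koff_from_runs(bias_traces, dt, kT):
    """Rate estimate as the inverse of the mean rescaled escape time over independent runs."""
    times = np.array([rescaled_time(v, dt, kT) for v in bias_traces])
    return 1.0 / times.mean(), times

# Two synthetic runs (hypothetical numbers) with a slowly growing bias, as in infrequent metaD.
kT = 2.494  # kJ/mol at 300 K
runs = [np.linspace(0.0, 20.0, 1000), np.linspace(0.0, 25.0, 1500)]
k, times = koff_from_runs(runs, dt=1e-3, kT=kT)  # dt assumed to be in ns per frame
print(times, k)
```

Depositing bias infrequently keeps the transition-state region essentially unbiased, which is what justifies this rescaling; with zero bias, t* reduces to the plain simulation time.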

Lamim Ribeiro João Marcelo, Provasi Davide, Filizola Marta


General

Recursive evaluation and iterative contraction of N-body equivariant features.

In The Journal of chemical physics

Mapping an atomistic configuration to a symmetrized N-point correlation of a field associated with the atomic positions (e.g., an atomic density) has emerged as an elegant and effective solution to represent structures as the input of machine-learning algorithms. While it has become clear that low-order density correlations do not provide a complete representation of an atomic environment, the exponential increase in the number of possible N-body invariants makes it difficult to design a concise and effective representation. We discuss how to exploit recursion relations between equivariant features of different order (generalizations of N-body invariants that provide a complete representation of the symmetries of improper rotations) to compute high-order terms efficiently. In combination with the automatic selection of the most expressive combination of features at each order, this approach provides a conceptual and practical framework to generate systematically improvable, symmetry adapted representations for atomistic machine learning.
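
The recursion can be illustrated in its simplest case, l = 1 features, where Clebsch-Gordan coupling of two vectors reduces (up to normalisation and a change from spherical to Cartesian basis) to the dot product (λ = 0) and the cross product (λ = 1). The sketch below is a hypothetical Cartesian analogue, not the authors' implementation: order-1 vectors are Gaussian-weighted sums of neighbour positions (the weights are an assumption), each iteration couples the current order-ν vectors with the order-1 vectors, and the selection of the most expressive combinations between orders is omitted. The order-3 scalars are triple products, which stay invariant under proper rotations but flip sign under inversion, which is why improper rotations must be tracked.

```python
import numpy as np

def first_order(rel_pos, widths=(0.5, 1.0, 1.5)):
    """Order-1, lambda = 1 features: Gaussian-weighted sums of neighbour vectors (one per width)."""
    r = np.linalg.norm(rel_pos, axis=1)
    w = np.exp(-(r[None, :] / np.asarray(widths)[:, None]) ** 2)  # (n_widths, n_neighbours)
    return w @ rel_pos                                               # (n_widths, 3)

def iterate(prev_vecs, first_vecs):
    """One recursion step: couple each order-nu vector with each order-1 vector.

    Returns lambda = 0 features (dot products) and lambda = 1 features (cross products)
    of order nu + 1; the lambda = 2 channel is omitted in this sketch.
    """
    scalars = np.einsum("ia,ja->ij", prev_vecs, first_vecs).ravel()
    vectors = np.cross(prev_vecs[:, None, :], first_vecs[None, :, :]).reshape(-1, 3)
    return scalars, vectors

def features(rel_pos):
    v1 = first_order(rel_pos)
    s2, v2 = iterate(v1, v1)  # order 2: 9 scalars, 9 vectors
    s3, _ = iterate(v2, v1)   # order 3: 27 scalars, i.e. triple products
    return s2, s3

# Check the symmetry behaviour on a random toy environment of 8 neighbours.
rng = np.random.default_rng(1)
rel_pos = rng.normal(size=(8, 3))
q, r = np.linalg.qr(rng.normal(size=(3, 3)))
rot = q * np.sign(np.diag(r))
if np.linalg.det(rot) < 0:
    rot[:, 0] *= -1  # make it a proper rotation (det = +1)

s2, s3 = features(rel_pos)
s2_rot, s3_rot = features(rel_pos @ rot.T)  # proper rotation
s2_inv, s3_inv = features(-rel_pos)         # inversion (improper)
print(np.allclose(s2, s2_rot), np.allclose(s3, s3_rot))   # True True
print(np.allclose(s2, s2_inv), np.allclose(s3, -s3_inv))  # True True
```

Even in this tiny example the feature count grows as 3^ν, which shows why a selection step between iterations is needed to keep high-order terms affordable.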

Nigam Jigyasa, Pozdnyakov Sergey, Ceriotti Michele


Internal Medicine

Application of a Convolutional Neural Network in the Diagnosis of Gastric Mesenchymal Tumors on Endoscopic Ultrasonography Images.

In Journal of clinical medicine

BACKGROUND AND AIMS : Endoscopic ultrasonography (EUS) is a useful diagnostic modality for evaluating gastric mesenchymal tumors; however, differentiating gastrointestinal stromal tumors (GISTs) from benign mesenchymal tumors such as leiomyomas and schwannomas remains challenging. For this reason, we developed a convolutional neural network computer-aided diagnosis (CNN-CAD) system that can analyze gastric mesenchymal tumors on EUS images.

METHODS : A total of 905 EUS images of gastric mesenchymal tumors (pathologically confirmed GIST, leiomyoma, and schwannoma) were used as a training dataset. Validation was performed using 212 EUS images of gastric mesenchymal tumors. This test dataset was interpreted by three experienced and three junior endoscopists.

RESULTS : The sensitivity, specificity, and accuracy of the CNN-CAD system for differentiating GISTs from non-GIST tumors were 83.0%, 75.5%, and 79.2%, respectively. Its diagnostic specificity and accuracy were significantly higher than those of two of the experienced endoscopists and one of the junior endoscopists. In a further sequential analysis to differentiate leiomyoma from schwannoma among non-GIST tumors, the final diagnostic accuracy of the CNN-CAD system was 72.5%, which was significantly higher than that of two of the experienced endoscopists and one of the junior endoscopists.
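
For the first-stage task (GIST versus non-GIST), sensitivity, specificity and accuracy follow directly from the binary confusion matrix, taking GIST as the positive class. A minimal sketch on made-up labels (not the study's data):

```python
import numpy as np

def binary_metrics(y_true, y_pred):
    """Sensitivity, specificity and accuracy with 1 = positive class (here: GIST)."""
    y_true = np.asarray(y_true, dtype=bool)
    y_pred = np.asarray(y_pred, dtype=bool)
    tp = np.sum(y_true & y_pred)
    tn = np.sum(~y_true & ~y_pred)
    fp = np.sum(~y_true & y_pred)
    fn = np.sum(y_true & ~y_pred)
    sensitivity = tp / (tp + fn)  # fraction of GISTs correctly identified
    specificity = tn / (tn + fp)  # fraction of non-GIST tumors correctly identified
    accuracy = (tp + tn) / len(y_true)
    return sensitivity, specificity, accuracy

# Hypothetical labels: 1 = GIST, 0 = non-GIST (leiomyoma or schwannoma).
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 1]
y_pred = [1, 1, 1, 0, 0, 0, 1, 1, 0, 1]
print(binary_metrics(y_true, y_pred))
```

Accuracy is a prevalence-weighted average of the other two, accuracy = (P · sensitivity + N · specificity) / (P + N) for P positive and N negative cases, so it depends on the class mix of the test set.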

CONCLUSIONS : Our CNN-CAD system showed high accuracy in diagnosing gastric mesenchymal tumors on EUS images. It may complement the current clinical practices in the EUS diagnosis of gastric mesenchymal tumors.

Kim Yoon Ho, Kim Gwang Ha, Kim Kwang Baek, Lee Moon Won, Lee Bong Eun, Baek Dong Hoon, Kim Do Hoon, Park Jun Chul


artificial intelligence, endoscopic ultrasonography, gastrointestinal stromal tumor, mesenchymal tumor, stomach