
General

Adverse Drug Reaction Discovery from Electronic Health Records with Deep Neural Networks.

In Proceedings of the ACM Conference on Health, Inference, and Learning

Adverse drug reactions (ADRs) are detrimental and unexpected clinical incidents caused by drug intake. The increasing availability of massive quantities of longitudinal event data such as electronic health records (EHRs) has redefined ADR discovery as a big data analytics problem, where data-hungry deep neural networks are especially suitable because of the abundance of the data. To this end, we introduce neural self-controlled case series (NSCCS), a deep learning framework for ADR discovery from EHRs. NSCCS rigorously follows a self-controlled case series design to adjust implicitly and efficiently for individual heterogeneity. In this way, NSCCS is robust to time-invariant confounding issues and thus more capable of identifying associations that reflect the underlying mechanism between various types of drugs and adverse conditions. We apply NSCCS to a large-scale, real-world EHR dataset and empirically demonstrate its superior performance with comprehensive experiments on a benchmark ADR discovery task.

Zhang Wei, Peissig Peggy, Kuang Zhaobin, Page David


Adverse Drug Reaction Discovery, Deep Neural Networks, Electronic Health Records, Self-Controlled Case Series

General

Channel Embedding for Informative Protein Identification from Highly Multiplexed Images.

In Medical image computing and computer-assisted intervention : MICCAI ... International Conference on Medical Image Computing and Computer-Assisted Intervention

Interest is growing rapidly in using deep learning to classify biomedical images, and interpreting these deep-learned models is necessary for life-critical decisions and scientific discovery. Effective interpretation techniques accelerate biomarker discovery and provide new insights into the etiology, diagnosis, and treatment of disease. Most interpretation techniques aim to discover spatially-salient regions within images, but few techniques consider imagery with multiple channels of information. For instance, highly multiplexed tumor and tissue images have 30-100 channels and require interpretation methods that work across many channels to provide deep molecular insights. We propose a novel channel embedding method that extracts features from each channel. We then use these features to train a classifier for prediction. Using this channel embedding, we apply an interpretation method to rank the most discriminative channels. To validate our approach, we conduct an ablation study on a synthetic dataset. Moreover, we demonstrate that our method aligns with biological findings on highly multiplexed images of breast cancer cells while outperforming baseline pipelines. Code is available at

Magid Salma Abdel, Jang Won-Dong, Schapiro Denis, Wei Donglai, Tompkin James, Sorger Peter K, Pfister Hanspeter


Deep learning, Highly multiplexed imaging, Interpretability

General

Cartilage Segmentation in High-Resolution 3D Micro-CT Images via Uncertainty-Guided Self-training with Very Sparse Annotation.

In Medical image computing and computer-assisted intervention : MICCAI ... International Conference on Medical Image Computing and Computer-Assisted Intervention

Craniofacial syndromes often involve skeletal defects of the head. Studying the development of the chondrocranium (the part of the endoskeleton that protects the brain and other sense organs) is crucial to understanding genotype-phenotype relationships and to the early detection of skeletal malformation. Our goal is to segment craniofacial cartilages in 3D micro-CT images of embryonic mice stained with phosphotungstic acid. However, because of high image resolution, complex object structures, and low contrast, delineating fine-grained structures in these images is very challenging, even manually. Specifically, only experts can differentiate cartilages, and it is unrealistic to manually label whole volumes for deep learning model training. We propose a new framework to progressively segment cartilages in high-resolution 3D micro-CT images using extremely sparse annotation (e.g., annotating only a few selected slices in a volume). Our model consists of a lightweight fully convolutional network (FCN) that speeds up training and generates pseudo labels (PLs) for unlabeled slices. We account for the reliability of the PLs using a bootstrap-ensemble-based uncertainty quantification method. Further, our framework gradually learns from the PLs, guided by the uncertainty estimates, via self-training. Experiments show that our method achieves high segmentation accuracy compared with prior methods and obtains performance gains through iterative self-training.

Zheng Hao, Perrine Susan M Motch, Pitirri M Kathleen, Kawasaki Kazuhiko, Wang Chaoli, Richtsmeier Joan T, Chen Danny Z


Cartilage segmentation, Sparse annotation, Uncertainty
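
The uncertainty-guided pseudo-labeling idea in the abstract above can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes each ensemble member (for example, a network trained on a different bootstrap resample) has already produced a foreground probability per pixel, and it uses the variance across members as the uncertainty measure. The function name, the 0.5 decision threshold, and the `max_variance` cutoff are hypothetical choices made for illustration.

```python
from statistics import mean, pvariance


def pseudo_labels_with_uncertainty(ensemble_probs, max_variance=0.02, threshold=0.5):
    """Derive pseudo labels and a reliability mask from ensemble predictions.

    ensemble_probs: one list per ensemble member, each holding a foreground
    probability for every pixel (all lists have the same length).
    Returns (labels, reliable), both lists with one entry per pixel.
    """
    n_pixels = len(ensemble_probs[0])
    labels, reliable = [], []
    for i in range(n_pixels):
        votes = [member[i] for member in ensemble_probs]
        # Pseudo label: threshold the ensemble-mean probability.
        labels.append(1 if mean(votes) >= threshold else 0)
        # Uncertainty: disagreement (population variance) among the members.
        reliable.append(pvariance(votes) <= max_variance)
    return labels, reliable


# Toy predictions from three hypothetical ensemble members for three pixels.
ensemble = [
    [0.90, 0.20, 0.60],
    [0.80, 0.10, 0.10],
    [0.95, 0.15, 0.90],
]
labels, reliable = pseudo_labels_with_uncertainty(ensemble)
print(labels, reliable)
```

In a self-training loop, only pixels flagged as reliable would contribute to the next round of training; the third pixel here is excluded because the members disagree strongly.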

General

Predicting Missing Values in Medical Data via XGBoost Regression.

In Journal of healthcare informatics research

Purpose : The data in a patient's laboratory test results are a notable resource for supporting clinical investigation and enhancing medical research. However, for a variety of reasons, this type of data often contains a non-trivial number of missing values. For example, physicians may neglect to order tests or document the results. Missing values reduce the degree to which these data can be used to learn efficient and effective predictive models. To address this problem, various approaches have been developed to impute missing laboratory values; however, their performance has been limited. This is due, in part, to the fact that no existing approach effectively leverages the contextual information 1) within individual laboratory test variables or 2) between laboratory test variables.

Method : We introduce an approach to combine an unsupervised prefilling strategy with a supervised machine learning approach, in the form of extreme gradient boosting (XGBoost), to leverage both types of context for imputation purposes. We evaluated the methodology through a series of experiments on approximately 8,200 patients' records in the MIMIC-III dataset.

Result : The results demonstrate that the new model outperforms baseline and state-of-the-art models on 13 commonly collected laboratory test variables. In terms of the normalized root mean square deviation (nRMSD), our model improves imputation by more than 20%, on average.

Conclusion : Missing data imputation for temporal variables can be substantially improved by combining a prefilling strategy with a supervised training technique that leverages the longitudinal and cross-sectional context simultaneously.

Zhang Xinmeng, Yan Chao, Gao Cheng, Malin Bradley A, Chen You


XGBoost, imputation, laboratory tests, missing values
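
To make the prefilling step and the nRMSD metric from the abstract above concrete, here is a small standard-library sketch. It is an illustration under stated assumptions, not the authors' pipeline: the abstract does not say which prefilling rule is used, so linear interpolation along the time axis is assumed here, and the nRMSD is assumed to be normalized by the range of the true values. The supervised XGBoost stage is omitted; the prefilled series would serve as its input features. The names `prefill_linear` and `nrmsd` are hypothetical.

```python
import math


def prefill_linear(series):
    """Fill None gaps by linear interpolation between observed neighbors.

    Leading and trailing gaps copy the nearest observed value.
    """
    observed = [i for i, v in enumerate(series) if v is not None]
    if not observed:
        raise ValueError("series has no observed values")
    filled = list(series)
    for i, value in enumerate(series):
        if value is not None:
            continue
        left = max((j for j in observed if j < i), default=None)
        right = min((j for j in observed if j > i), default=None)
        if left is None:
            filled[i] = series[right]
        elif right is None:
            filled[i] = series[left]
        else:
            weight = (i - left) / (right - left)
            filled[i] = series[left] + weight * (series[right] - series[left])
    return filled


def nrmsd(true_values, imputed_values):
    """Root mean square deviation divided by the range of the true values."""
    squared = [(t - p) ** 2 for t, p in zip(true_values, imputed_values)]
    rmsd = math.sqrt(sum(squared) / len(squared))
    value_range = max(true_values) - min(true_values)
    if value_range == 0:
        raise ValueError("true values have zero range")
    return rmsd / value_range


# Toy lab series for one patient; None marks a missing measurement.
glucose = [100.0, None, None, 130.0, None]
filled = prefill_linear(glucose)

# Score the two interior gaps against made-up held-out true values.
score = nrmsd([105.0, 125.0], [filled[1], filled[2]])
print(filled, score)
```

A prefilled series like this captures only the longitudinal context of one variable; the supervised stage described in the abstract is what brings in the context between variables.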

Oncology

Breakthrough Cancer Pain Clinical Features and Differential Opioids Response: A Machine Learning Approach in Patients With Cancer From the IOPS-MS Study.

In JCO precision oncology

PURPOSE : A large proportion of patients with cancer suffer from breakthrough cancer pain (BTcP). Several unmet clinical needs concerning BTcP treatment, such as optimal opioid dosages, are being investigated. In this analysis, we use an unsupervised learning algorithm to explore the hypothesis that distinct subtypes of BTcP exist and to assess whether they can provide new insights into clinical practice.

METHODS : A partitioning around medoids (k-medoids) algorithm was applied to a large data set of patients with BTcP, previously collected by the Italian Oncologic Pain Survey group, to identify possible subgroups of BTcP. The resulting clusters were analyzed in terms of BTcP therapy satisfaction, clinical features, and use of basal pain and rapid-onset opioids. Opioid dosages were converted to a common scale, and the BTcP opioids-to-basal pain opioids ratio was calculated for each patient. We used polynomial logistic regression to capture nonlinear relationships between therapy satisfaction and opioid use.

RESULTS : Our algorithm identified 12 distinct BTcP clusters. Optimal BTcP opioids-to-basal pain opioids ratios differed across the clusters, ranging from 15% to 50%. The majority of clusters were linked to a peculiar association of certain drugs with therapy satisfaction or dissatisfaction. A free online tool was created for new patients' cluster computation to validate these clusters in future studies and provide handy indications for personalized BTcP therapy.

CONCLUSION : This work proposes a classification for BTcP and identifies subgroups of patients in whom different pain medications have distinct efficacy. It supports the theory that the optimal dose of BTcP opioids depends on the dose of basal opioids and identifies novel values that are possibly useful for future trials. These results will allow us to target BTcP therapy on the basis of patient characteristics and to define a precision medicine strategy for supportive care as well.

Pantano Francesco, Manca Paolo, Armento Grazia, Zeppola Tea, Onorato Angelo, Iuliani Michele, Simonetti Sonia, Vincenzi Bruno, Santini Daniele, Mercadante Sebastiano, Marchetti Paolo, Cuomo Arturo, Caraceni Augusto, Mediati Rocco Domenico, Vellucci Renato, Mammucari Massimo, Natoli Silvia, Lazzari Marzia, Dauri Mario, Adile Claudio, Airoldi Mario, Azzarello Giuseppe, Blasi Livio, Chiurazzi Bruno, Degiovanni Daniela, Fusco Flavio, Guardamagna Vittorio, Liguori Simeone, Palermo Loredana, Mameli Sergio, Masedu Francesco, Mazzei Teresita, Melotti Rita Maria, Menardo Valentino, Miotti Danilo, Moroso Stefano, Pascoletti Gaetano, De Santis Stefano, Orsetti Remo, Papa Alfonso, Ricci Sergio, Scelzi Elvira, Sofia Michele, Aielli Federica, Valle Alessandro, Tonini Giuseppe
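
The clustering and ratio computation described in the METHODS above can be sketched as follows. This is a simplified illustration, not the study's code: it uses the alternating assign-and-update variant of k-medoids rather than the full PAM swap procedure, clusters on a single feature (the opioid ratio) instead of the full clinical feature set, and starts from caller-supplied medoids. All function names and the toy doses are hypothetical, and the points are assumed to be distinct.

```python
def opioid_ratio_percent(btcp_dose, basal_dose):
    """BTcP-to-basal opioid ratio in percent (both doses on a common scale)."""
    return 100.0 * btcp_dose / basal_dose


def assign(points, medoids, distance):
    """Map each medoid index to the indices of the points nearest to it."""
    clusters = {m: [] for m in medoids}
    for i, p in enumerate(points):
        nearest = min(medoids, key=lambda m: distance(p, points[m]))
        clusters[nearest].append(i)
    return clusters


def k_medoids(points, initial_medoids, distance, max_iter=100):
    """Alternating k-medoids: assign points, then re-pick each cluster's medoid."""
    medoids = list(initial_medoids)
    for _ in range(max_iter):
        clusters = assign(points, medoids, distance)
        new_medoids = []
        for medoid, members in clusters.items():
            if not members:
                new_medoids.append(medoid)  # keep the medoid of an empty cluster
                continue
            # New medoid: the member with the smallest total distance to the rest.
            best = min(
                members,
                key=lambda c: sum(distance(points[c], points[j]) for j in members),
            )
            new_medoids.append(best)
        if new_medoids == medoids:
            break
        medoids = new_medoids
    return medoids, assign(points, medoids, distance)


# Made-up (BTcP dose, basal dose) pairs, already on a common scale.
doses = [(15, 100), (17, 100), (20, 100), (45, 100), (50, 100), (48, 100)]
ratios = [opioid_ratio_percent(btcp, basal) for btcp, basal in doses]

medoids, clusters = k_medoids(ratios, [0, 1], lambda a, b: abs(a - b))
print(medoids, clusters)
```

Because medoids are actual data points, each cluster is summarized by a real patient profile, which is one reason k-medoids is often preferred over k-means for mixed clinical data.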


General

Tuplewise Material Representation Based Machine Learning for Accurate Band Gap Prediction.

In The journal of physical chemistry. A

Open-access material databases allow us to approach scientific questions from a completely new perspective with machine learning methods. Here, on the basis of open-access databases, we focus on the classical problem of accurately predicting the band gap of a crystalline compound. We use a machine learning approach with newly developed tuplewise graph neural networks (TGNN), which are devised to automatically generate input representations of crystal structures in tuple types and to exploit crystal-level properties as one of the input features. Our method yields highly accurate predictions of band gaps at the hybrid functional and GW approximation levels for multiple material data sets without heavy computational cost. Furthermore, to demonstrate the applicability of our prediction model, we provide a data set of GW band gaps for 45835 materials predicted by TGNN, with higher accuracy than standard density functional theory calculations.

Na Gyoung S, Jang Seunghun, Lee Yea-Lee, Chang Hyunju