
General

Classification of estrogenic compounds by coupling high content analysis and machine learning algorithms.

In PLoS computational biology

Environmental toxicants affect human health in various ways. Of the thousands of chemicals present in the environment, those with adverse effects on the endocrine system are referred to as endocrine-disrupting chemicals (EDCs). Here, we focused on a subclass of EDCs that impacts the estrogen receptor (ER), a pivotal transcriptional regulator in health and disease. Estrogenic activity of compounds can be measured by many in vitro or cell-based high throughput assays that record various endpoints from large pools of cells, and increasingly at the single-cell level. To simultaneously capture multiple mechanistic ER endpoints in individual cells that are affected by EDCs, we previously developed a sensitive high throughput/high content imaging assay that is based upon a stable cell line harboring a visible multicopy ER responsive transcription unit and expressing a green fluorescent protein (GFP) fusion of ER. High content analysis generates voluminous multiplex data comprised of minable features that describe numerous mechanistic endpoints. In this study, we present a machine learning pipeline for rapid, accurate, and sensitive assessment of the endocrine-disrupting potential of benchmark chemicals based on data generated from high content analysis. The multidimensional imaging data was used to train a classification model to ultimately predict the impact of unknown compounds on the ER, either as agonists or antagonists. To this end, both linear logistic regression and nonlinear Random Forest classifiers were benchmarked and evaluated for predicting the estrogenic activity of unknown compounds. Furthermore, through feature selection, data visualization, and model discrimination, the most informative features were identified for the classification of ER agonists/antagonists. 
The results of this data-driven study showed that highly accurate and generalized classification models with a minimum number of features can be constructed without loss of generality, where these machine learning models serve as a means for rapid mechanistic/phenotypic evaluation of the estrogenic potential of many chemicals.
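As a rough illustration of the linear branch of such a pipeline, the sketch below fits a logistic regression by gradient descent on a tiny synthetic data set and reads off the weight magnitudes as a crude indicator of feature importance. It is a minimal sketch, not the authors' pipeline: the two features ("informative" and "nuisance"), the data values, and the function names are all hypothetical, and the Random Forest classifier mentioned in the abstract is not reproduced here.

```python
import math


def sigmoid(z):
    """Logistic function mapping a score to a probability."""
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(X, y, lr=0.1, epochs=500):
    """Fit weights and bias by batch gradient descent on the log loss."""
    n, n_features = len(X), len(X[0])
    w, b = [0.0] * n_features, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * n_features, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(n_features):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict(w, b, x):
    """Return 1 (e.g. agonist) or 0 (e.g. antagonist) for one sample."""
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) + b > 0 else 0


# Hypothetical synthetic data: feature 0 separates the classes,
# feature 1 is a nuisance feature with the same values in both classes.
X, y = [], []
for k in range(10):
    informative = 2.0 + 0.1 * (k % 5 - 2)
    nuisance = 0.5 if k % 2 == 0 else -0.5
    X.append([informative, nuisance])
    y.append(1)
    X.append([-informative, nuisance])
    y.append(0)

w, b = fit_logistic(X, y)
accuracy = sum(predict(w, b, xi) == yi for xi, yi in zip(X, y)) / len(y)
print("weights:", w, "bias:", b, "training accuracy:", accuracy)
```

In this toy setting the weight on the nuisance feature stays essentially at zero while the weight on the informative feature grows, which is the intuition behind keeping only a minimal set of features. Comparing weight magnitudes this way is only meaningful when features are on comparable scales.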

Mukherjee Rajib, Beykal Burcu, Szafran Adam T, Onel Melis, Stossi Fabio, Mancini Maureen G, Lloyd Dillon, Wright Fred A, Zhou Lan, Mancini Michael A, Pistikopoulos Efstratios N


General

Cytomegalovirus viral load kinetics as surrogate endpoints after allogeneic transplantation.

In The Journal of clinical investigation ; h5-index 129.0

BACKGROUND : Viral load surrogate endpoints transformed development of HIV and hepatitis C therapeutics. Surrogate endpoints for cytomegalovirus (CMV)-related morbidity and mortality could advance development of antiviral treatments. While observational data support using CMV viral load (VL) as a trial endpoint, randomized controlled trials (RCT) demonstrating direct associations between virologic markers and clinical endpoints are lacking.

METHODS : We performed CMV DNA polymerase chain reaction (PCR) on frozen serum samples from the only placebo-controlled RCT of ganciclovir for early treatment of CMV after hematopoietic cell transplantation (HCT). We used established criteria to assess VL kinetics as surrogates for CMV disease or death by weeks 8, 24, and 48 after randomization and quantified antiviral effects captured by each marker. We used ensemble-based machine learning to assess the predictive ability of VL kinetics and performed this analysis on a ganciclovir prophylaxis RCT for validation.

RESULTS : VL suppression with ganciclovir reduced cumulative incidence of CMV disease and death for 20 years after HCT. Mean VL, peak VL, and change in VL during the first five weeks of treatment fulfilled the Prentice definition for surrogacy, capturing > 95% of ganciclovir's effect, and yielded highly sensitive and specific predictions by week 48. In the prophylaxis trial, viral shedding rate satisfied the Prentice definition for CMV disease by week 24.

CONCLUSION : Our results support using CMV VL kinetics as surrogates for CMV disease, provide a framework for developing CMV preventative and therapeutic agents, and support reductions in viral load as the mechanism through which antivirals reduce CMV disease.
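The three kinetic markers named in the results (mean VL, peak VL, and change in VL over the first weeks of treatment) can be illustrated with a minimal sketch. The abstract does not specify how each marker was computed, so the sketch assumes weekly log10 measurements and defines change as the last value minus the first. The function name and the example values are hypothetical and are not trial data.

```python
def vl_kinetics(log10_vl):
    """Summarize a series of log10 viral loads over a treatment window.

    Assumption: 'change' is the last measurement minus the first.
    """
    if not log10_vl:
        raise ValueError("need at least one measurement")
    return {
        "mean": sum(log10_vl) / len(log10_vl),
        "peak": max(log10_vl),
        "change": log10_vl[-1] - log10_vl[0],
    }


# Hypothetical weekly log10 viral loads for weeks 1 to 5 of treatment.
weekly = [3.2, 3.8, 3.1, 2.4, 2.0]
summary = vl_kinetics(weekly)
print(summary)
```

A negative change indicates that viral load fell over the window.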

Duke Elizabeth R, Williamson Brian D, Borate Bhavesh, Golob Jonathan L, Wychera Chiara, Stevens-Ayers Terry, Huang Meei-Li, Cossrow Nicole, Wan Hong, Mast T Christopher, Marks Morgan A, Flowers Mary, Jerome Keith R, Corey Lawrence, Gilbert Peter B, Schiffer Joshua T, Boeckh Michael


Clinical Trials, Drug therapy, Infectious disease, Stem cell transplantation

Surgery

Single-cell transcriptomics of mouse kidney transplants reveals a myeloid cell pathway for transplant rejection.

In JCI insight

Myeloid cells are increasingly recognized as a major player in transplant rejection. Here, we used a murine kidney transplantation model and single-cell transcriptomics to dissect the contribution of myeloid cell subsets and their potential signaling pathways to kidney transplant rejection. Using a variety of bioinformatic techniques including machine learning, we demonstrated that kidney allograft-infiltrating myeloid cells followed a trajectory of differentiating from monocytes to pro-inflammatory macrophages, and exhibited distinct interactions with kidney allograft parenchymal cells. While this process correlated with a unique pattern of myeloid cell transcripts, a top gene identified was Axl, a member of the receptor tyrosine kinase family TAM (Tyro3/Axl/Mertk). Using kidney transplant recipients with Axl gene deficiency, we further demonstrated that Axl augmented intragraft differentiation of pro-inflammatory macrophages, likely via its effect on the transcription factor Cebpb. This in turn promoted intragraft recruitment, differentiation and proliferation of donor-specific T cells, and enhanced early allograft inflammation evidenced by histology. We conclude that myeloid cell Axl expression identified by single-cell transcriptomics of kidney allografts in our study plays a major role in promoting intragraft myeloid cell and T cell differentiation, and presents a novel therapeutic target for controlling kidney allograft rejection and improving kidney allograft survival.

Dangi Anil, Natesh Naveen R, Husain Irma, Ji Zhicheng, Barisoni Laura, Kwun Jean, Shen Xiling, Thorp Edward B, Luo Xunrong


Bioinformatics, Immunology, Macrophages, Organ transplantation, Transplantation

General

Projective Double Reconstructions Based Dictionary Learning Algorithm for Cross-Domain Recognition.

In IEEE transactions on image processing : a publication of the IEEE Signal Processing Society

Dictionary learning plays a significant role in machine learning. Existing work mainly focuses on learning a dictionary from a single domain. In this paper, we propose a novel projective double reconstructions (PDR) based dictionary learning algorithm for cross-domain recognition. Owing to the distribution discrepancy between different domains, label information is hard to exploit fully for improving the discriminability of the dictionary. Thus, we propose a more flexible label-consistent term and associate it with each dictionary item, which makes the reconstruction coefficients as discriminative as possible. Because cross-domain data are intrinsically correlated, data from each domain should be reconstructable from the other. Based on this consideration, we further propose a projective double reconstructions scheme to guarantee that the learned dictionary is capable of both data self-reconstruction and data cross-reconstruction. This also allows data from different domains to reinforce each other and achieve a good data alignment, making the learned dictionary more transferable. We integrate the double reconstructions, the label consistency constraint, and classifier learning into a unified objective, whose solution can be obtained by the proposed optimization algorithm, which is more efficient than conventional l1-optimization-based dictionary learning methods. The experiments show that the proposed PDR not only greatly reduces the time complexity of both training and testing, but also outperforms state-of-the-art methods.
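For orientation, the conventional l1-regularized dictionary learning problem that the abstract uses as its efficiency baseline is usually written as below. This is the standard textbook formulation, not the PDR objective, which the abstract does not state. The symbols are assumptions for illustration: $Y$ is the data matrix, $D$ the dictionary with atoms $d_k$, $X$ the sparse coefficient matrix, and $\lambda$ the sparsity weight.

```latex
\min_{D,\,X} \; \lVert Y - D X \rVert_F^2 + \lambda \lVert X \rVert_1
\quad \text{subject to} \quad \lVert d_k \rVert_2 \le 1 \;\; \text{for all } k
```

The $\ell_1$ term is what makes this problem expensive to solve iteratively, which is the cost the abstract says the proposed optimization algorithm reduces.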

Han Na, Wu Jigang, Fang Xiaozhao, Teng Shaohua, Zhou Guoxu, Xie Shengli, Li Xuelong


Oncology

Effect of an Artificial Intelligence Clinical Decision Support System on Treatment Decisions for Complex Breast Cancer.

In JCO clinical cancer informatics

PURPOSE : To examine the impact of a clinical decision support system (CDSS) on breast cancer treatment decisions and adherence to National Comprehensive Cancer Network (NCCN) guidelines.

PATIENTS AND METHODS : A cross-sectional observational study was conducted involving 1,977 patients at high risk for recurrent or metastatic breast cancer from the Chinese Society of Clinical Oncology. Ten oncologists provided blinded treatment recommendations for an average of 198 patients before and after viewing therapeutic options offered by the CDSS. Univariable and bivariable analyses of treatment changes were performed, and multivariable logistic regressions were estimated to examine the effects of physician experience (years), patient age, and receptor subtype/TNM stage.

RESULTS : Treatment decisions changed in 105 (5%) of 1,977 patients and were concentrated in those with hormone receptor (HR)-positive disease or stage IV disease in the first-line therapy setting (73% and 58%, respectively). Logistic regressions showed that decision changes were more likely in those with HR-positive cancer (odds ratio [OR], 1.58; P < .05) and less likely in those with stage IIA (OR, 0.29; P < .05) or IIIA cancer (OR, 0.08; P < .01). Reasons cited for changes included consideration of the CDSS therapeutic options (63% of patients), patient factors highlighted by the tool (23%), and the decision logic of the tool (13%). Patient age and oncologist experience were not associated with decision changes. Adherence to NCCN treatment guidelines increased slightly after using the CDSS (0.5%; P = .003).

CONCLUSION : Use of an artificial intelligence-based CDSS had a significant impact on treatment decisions and NCCN guideline adherence in HR-positive breast cancers. Although cases of stage IV disease in the first-line therapy setting were also more likely to be changed, the effect was not statistically significant (P = .22). Additional research on decision impact, patient-physician communication, learning, and clinical outcomes is needed to establish the overall value of the technology.
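The headline figures can be checked with a few lines of arithmetic. The sketch below recomputes the share of changed decisions and shows how a reported odds ratio translates into a log-odds coefficient and into a shifted probability. The counts and odds ratios come from the abstract. The 5% baseline probability and the helper name `apply_odds_ratio` are assumptions for illustration only.

```python
import math

# Figures reported in the abstract.
changed, total = 105, 1977
pct_changed = 100 * changed / total  # about 5.3%, reported as 5%

odds_ratios = {"HR-positive": 1.58, "stage IIA": 0.29, "stage IIIA": 0.08}

# A logistic regression coefficient is the natural log of the odds ratio.
log_odds = {name: math.log(value) for name, value in odds_ratios.items()}


def apply_odds_ratio(baseline_prob, odds_ratio):
    """Scale the odds of a baseline probability by an odds ratio."""
    odds = baseline_prob / (1.0 - baseline_prob)
    new_odds = odds * odds_ratio
    return new_odds / (1.0 + new_odds)


# Hypothetical baseline: a 5% chance of a decision change.
shifted = apply_odds_ratio(0.05, odds_ratios["HR-positive"])
print(round(pct_changed, 1), log_odds, round(shifted, 3))
```

An odds ratio above 1 gives a positive coefficient and raises the probability, while the stage IIA and IIIA ratios below 1 give negative coefficients.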

Xu Fengrui, Sepúlveda Martín-J, Jiang Zefei, Wang Haibo, Li Jianbin, Liu Zhenzhen, Yin Yongmei, Roebuck M Christopher, Shortliffe Edward H, Yan Min, Song Yuhua, Geng Cuizhi, Tang Jinhai, Purcell Jackson Gretchen, Preininger Anita M, Rhee Kyu


General

ML Models of Vibrating H2CO: Comparing Reproducing Kernels, FCHL and PhysNet.

In The journal of physical chemistry. A

Machine learning (ML) has become a promising tool for improving the quality of atomistic simulations. Using formaldehyde as a benchmark system for intramolecular interactions, a comparative assessment of ML models based on state-of-the-art variants of deep neural networks (NN), reproducing kernel Hilbert space (RKHS+F), and kernel ridge regression (KRR) is presented. Learning curves for energies and atomic forces indicate rapid convergence towards excellent predictions for B3LYP, MP2, and CCSD(T)-F12 reference results for modestly sized (in the hundreds) training sets. Typically, learning curve offsets decay as one goes from NN (PhysNet) to RKHS+F to KRR (FCHL). Conversely, the predictive power for extrapolation of energies towards new geometries increases in the same order, with RKHS+F and FCHL performing almost equally. For harmonic vibrational frequencies, the picture is less clear, with PhysNet and FCHL yielding flat learning curves at ∼1 and ∼0.2 cm⁻¹, respectively, regardless of the reference method, while RKHS+F models level off for B3LYP and exhibit continued improvements for MP2 and CCSD(T)-F12. Finite-temperature molecular dynamics (MD) simulations with the same initial conditions yield indistinguishable infrared spectra with good performance compared with experiment, except for the high-frequency modes involving hydrogen stretch motion, which is a known limitation of MD for vibrational spectroscopy. For sufficiently large training set sizes, all three models can detect insufficient convergence ("noise") of the reference electronic structure calculations in that the learning curves level off. Transfer learning (TL) from B3LYP to CCSD(T)-F12 with PhysNet indicates that additional improvements in data efficiency can be achieved.
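To make the kernel ridge regression idea concrete, the sketch below fits a one-dimensional toy energy curve with a Gaussian kernel in plain Python. It is only an illustration under stated assumptions: a Morse potential along a single coordinate stands in for the formaldehyde potential energy surface, the kernel acts directly on that coordinate rather than on the FCHL representation, and the function names, kernel width, and regularization strength are hypothetical choices.

```python
import math


def gaussian_kernel(a, b, sigma):
    """Gaussian (RBF) kernel between two scalar coordinates."""
    return math.exp(-((a - b) ** 2) / (2.0 * sigma ** 2))


def solve(A, rhs):
    """Solve A x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    M = [row[:] + [r] for row, r in zip(A, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(M[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (M[r][n] - s) / M[r][r]
    return x


def krr_fit(xs, ys, sigma, lam):
    """Return regression coefficients alpha solving (K + lam * I) alpha = y."""
    n = len(xs)
    K = [[gaussian_kernel(xs[i], xs[j], sigma) + (lam if i == j else 0.0)
          for j in range(n)] for i in range(n)]
    return solve(K, ys)


def krr_predict(x, xs, alpha, sigma):
    """Predict the energy at x as a kernel-weighted sum over training points."""
    return sum(a * gaussian_kernel(x, xi, sigma) for a, xi in zip(alpha, xs))


def morse(r, depth=1.0, a=1.0, r0=1.0):
    """Toy reference energy: Morse potential in arbitrary units."""
    return depth * (1.0 - math.exp(-a * (r - r0))) ** 2


# Hypothetical training set: nine evenly spaced geometries.
train_r = [0.5 + 0.25 * i for i in range(9)]
train_e = [morse(r) for r in train_r]
alpha = krr_fit(train_r, train_e, sigma=0.3, lam=1e-8)

max_train_err = max(abs(krr_predict(r, train_r, alpha, 0.3) - e)
                    for r, e in zip(train_r, train_e))
print("max training error:", max_train_err)
print("prediction at r = 1.1:", krr_predict(1.1, train_r, alpha, 0.3),
      "reference:", morse(1.1))
```

Repeating the fit with progressively larger training sets and recording the error on held-out geometries would give a learning curve of the kind the abstract compares across models.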

Käser Silvan, Koner Debasish, Christensen Anders S, von Lilienfeld O Anatole, Meuwly Markus