
General General

Joint learning of multiple gene networks from single-cell gene expression data.

In Computational and structural biotechnology journal

Inferring gene networks from gene expression data is important for understanding functional organization within cells. With the accumulation of single-cell RNA sequencing (scRNA-seq) data, it is possible to infer gene networks at the single-cell level. However, due to the characteristics of scRNA-seq data, such as cellular heterogeneity and high sparsity caused by dropout events, traditional network inference methods may not be suitable for scRNA-seq data. In this study, we introduce a novel joint Gaussian copula graphical model (JGCGM) to jointly estimate multiple gene networks for multiple cell subgroups from scRNA-seq data. Our model can deal with non-Gaussian data with missing values and identify the common and unique network structures of multiple cell subgroups, which makes it suitable for scRNA-seq data. Extensive experiments on synthetic data demonstrate that our proposed model outperforms the other state-of-the-art network inference models compared. We apply our model to real scRNA-seq data sets to infer gene networks of different cell subgroups. Hub genes in the estimated gene networks are found to be biologically significant.

Wu Nuosi, Yin Fu, Ou-Yang Le, Zhu Zexuan, Xie Weixin


Gene network, Graphical model, Single-cell RNA sequencing
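
The abstract above does not say how JGCGM estimates its networks, so the following is only a rough illustration of the Gaussian copula idea, not the authors' method. It assumes a common rank-based approach: compute Kendall's tau between two genes on the cells where both values are observed, then map it to a latent Gaussian correlation with sin(pi/2 * tau). The function names and the use of `None` for missing values are hypothetical choices made for this sketch.

```python
import math
from itertools import combinations


def kendall_tau(x, y):
    """Kendall's tau-a on pairwise-complete observations (None marks a missing value)."""
    pairs = [(a, b) for a, b in zip(x, y) if a is not None and b is not None]
    if len(pairs) < 2:
        raise ValueError("need at least two complete observations")
    score = 0
    for (a1, b1), (a2, b2) in combinations(pairs, 2):
        s = (a1 - a2) * (b1 - b2)
        score += (s > 0) - (s < 0)  # +1 concordant, -1 discordant, 0 tied
    n_pairs = len(pairs) * (len(pairs) - 1) // 2
    return score / n_pairs


def copula_correlation(x, y):
    """Rank-based estimate of the latent Gaussian correlation (assumed estimator)."""
    return math.sin(math.pi / 2 * kendall_tau(x, y))
```

Because the estimate depends only on ranks, it is unchanged by monotone transformations of the expression values, which is what lets a copula model handle non-Gaussian marginals. A sparse inverse of the resulting correlation matrix would then give one network per cell subgroup; that step is omitted here.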

General General

Twelve tips on guiding preparation for both high-stakes exams and long-term learning.

In Medical teacher

High-stakes exams including admissions, licensing, and maintenance of certification examinations are commonplace in health professions education. Although exam scores and performance can often serve gate-keeping purposes, the broader goal of health professions education is to foster deep, self-directed, meaningful, motivated learning. Establishing strong support structures that emphasize deep learning and understanding rather than exam scores can be beneficial to preparing learners who have the knowledge base to be excellent practitioners. This article offers guidance that can be used by academic support centres, medical educators, learning specialists, and faculty advisors, or even test-takers, to help learners to balance score achievement and knowledge development, while simultaneously cultivating more efficient and motivated studying and increasingly self-regulated learning. This series of tips details considerations for building academic success supports, fostering a growth mindset, planning efficient and effective studying efforts, utilizing test-enhanced learning strategies, exam-taking skills practice, and other support structures that can help strengthen learning experiences overall.

Swan Sein Aubrie, Dathatri Shubha, Bates Todd A


Evidence-based learning practices, exam preparation, learner support, self-directed learning, test-enhanced learning

General General

Predicting Coronavirus Disease 2019 Infection Risk and Related Risk Drivers in Nursing Homes: A Machine Learning Approach.

In Journal of the American Medical Directors Association

OBJECTIVE : Inform coronavirus disease 2019 (COVID-19) infection prevention measures by identifying and assessing risk and possible vectors of infection in nursing homes (NHs) using a machine-learning approach.

DESIGN : This retrospective cohort study used a gradient boosting algorithm to evaluate risk of COVID-19 infection (ie, presence of at least 1 confirmed COVID-19 resident) in NHs.

SETTING AND PARTICIPANTS : The model was trained on outcomes from 1146 NHs in Massachusetts, Georgia, and New Jersey, reporting COVID-19 case data on April 20, 2020. Risk indices generated from the model using data from May 4 were prospectively validated against outcomes reported on May 11 from 1021 NHs in California.

METHODS : Model features, pertaining to facility and community characteristics, were obtained from a self-constructed dataset based on multiple public and private sources. The model was assessed via out-of-sample area under the receiver operating characteristic curve (AUC), sensitivity, and specificity in the training (via 10-fold cross-validation) and validation datasets.

RESULTS : The mean AUC, sensitivity, and specificity of the model over 10-fold cross-validation were 0.729 [95% confidence interval (CI) 0.690‒0.767], 0.670 (95% CI 0.477‒0.862), and 0.611 (95% CI 0.412‒0.809), respectively. Prospective out-of-sample validation yielded similar performance measures (AUC 0.721; sensitivity 0.622; specificity 0.713). The strongest predictors of COVID-19 infection were identified as the infection rate in the NH's county and the number of separate units in the NH; other predictors included the county's population density, historical health deficiencies cited by the Centers for Medicare & Medicaid Services, and the NH's resident density (in persons per 1000 square feet). In addition, the NH's historical percentage of non-Hispanic white residents was identified as a protective factor.

CONCLUSIONS AND IMPLICATIONS : A machine-learning model can help quantify and predict NH infection risk. The identified risk factors support the early identification and management of presymptomatic and asymptomatic individuals (eg, staff) entering the NH from the surrounding community and the development of financially sustainable staff testing initiatives in preventing COVID-19 infection.

Sun Christopher L F, Zuccarelli Eugenio, Zerhouni El Ghali A, Lee Jason, Muller James, Scott Karen M, Lujan Alida M, Levi Retsef


COVID-19, Nursing homes, health policy, infection prevention, long-term care facility, machine-learning, risk modeling
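
The three performance measures reported in the abstract above can be computed from predicted risk scores and observed outcomes. The sketch below is a minimal illustration with hypothetical function names and a default threshold of 0.5 that is an assumption; the abstract does not state the cut-off used, and the study's gradient boosting model itself is not reproduced here.

```python
def auc(labels, scores):
    """AUC as the probability that a positive case outranks a negative one (ties count 0.5)."""
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def sensitivity_specificity(labels, scores, threshold=0.5):
    """Sensitivity and specificity when scores >= threshold are predicted positive."""
    tp = sum(1 for l, s in zip(labels, scores) if l == 1 and s >= threshold)
    fn = sum(1 for l, s in zip(labels, scores) if l == 1 and s < threshold)
    tn = sum(1 for l, s in zip(labels, scores) if l == 0 and s < threshold)
    fp = sum(1 for l, s in zip(labels, scores) if l == 0 and s >= threshold)
    return tp / (tp + fn), tn / (tn + fp)
```

In a 10-fold cross-validation, these functions would be applied to each held-out fold in turn and the ten values averaged, which is how a mean AUC with a confidence interval is usually obtained.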

General General

Mitochondria under the spotlight: On the implications of mitochondrial dysfunction and its connectivity to neuropsychiatric disorders.

In Computational and structural biotechnology journal

Neuropsychiatric disorders (NPDs) such as bipolar disorder (BD), schizophrenia (SZ) and mood disorder (MD) are hard to manage due to overlapping symptoms and lack of biomarkers. Risk alleles of BD/SZ/MD are emerging, with evidence suggesting mitochondrial (mt) dysfunction as a critical factor for disease onset and progression. Mood stabilizing treatments for these disorders are scarce, revealing the need for biomarker discovery and artificial intelligence approaches to design synthetically accessible novel therapeutics. Here, we show mt involvement in NPDs by associating 245 mt proteins with BD/SZ/MD, with 7 common players in these disease categories. Analysis of over 650 publications suggests that the 245 NPD-linked mt proteins are associated with 800 other mt proteins, with mt impairment likely to rewire these interactions. High dosage of mood stabilizers is known to alleviate manic episodes, but which compounds target mt pathways is another gap in the field that we address through mood stabilizer-gene interaction analysis of 37 prescription and over-the-counter psychotropic treatments, which we have refined to 15 mood-stabilizing agents. We show that 26 of the 245 NPD-linked mt proteins are uniquely or commonly targeted by one or more of these mood stabilizers. Further, induced pluripotent stem cell-derived patient neurons and three-dimensional human brain organoids as reliable BD/SZ/MD models are outlined, along with multiomics methods and machine learning-based decision making tools for biomarker discovery, which remains a bottleneck for precision psychiatry medicine.

Zilocchi Mara, Broderick Kirsten, Phanse Sadhna, Aly Khaled A, Babu Mohan


Artificial intelligence, Interactomics, Mitochondria, Precision Psychiatry Medicine, Psychiatric disorders, Psychotropic medication

General General

RF-PCA: A New Solution for Rapid Identification of Breast Cancer Categorical Data Based on Attribute Selection and Feature Extraction.

In Frontiers in genetics ; h5-index 62.0

Breast cancer is one of the most common cancers in women. The rapid and accurate diagnosis of breast cancer is of great significance for treatment. Artificial intelligence and machine learning algorithms can be used to identify malignant breast tumors, addressing the insufficient recognition accuracy and long processing time of traditional breast cancer diagnosis methods. To this end, we propose a method of attribute selection and feature extraction based on random forest (RF) combined with principal component analysis (PCA) for rapid and accurate diagnosis of breast cancer. First, RF was used to reduce the 30 attributes of the breast cancer categorical data. According to the average importance of attributes and the out-of-bag error, 21 relatively important attributes were selected for feature extraction based on PCA. The seven features extracted by PCA were used to establish extreme learning machine (ELM) classification models with different activation functions. By comparing the classification accuracy and training time of these models, the sigmoid function was chosen as the activation function of the hidden layer. When the number of neurons in the hidden layer was 27, the accuracy on the test set was 98.75%, the accuracy on the training set was 99.06%, and the training time was only 0.0022 s. Finally, to verify the superiority of this method in breast cancer diagnosis, we compared it with an ELM model based on the original breast cancer data and with other intelligent classification algorithms. The algorithm used in this article has a faster recognition time and a higher recognition accuracy than the other algorithms. We also used breast cancer data of breast tissue reactance features to verify the reliability of this method, and ideal results were obtained. The experimental results show that RF-PCA combined with ELM can significantly reduce the time required for the diagnosis of breast cancer, enables rapid and accurate identification of breast cancer, and provides a theoretical basis for its intelligent diagnosis.

Bian Kai, Zhou Mengran, Hu Feng, Lai Wenhao


artificial intelligence, breast cancer, extreme learning machine, principal component analysis, random forest
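
The short training time reported in the abstract above follows from how an extreme learning machine is fitted: the hidden-layer weights are drawn at random and never updated, so only the output weights need to be solved for, in a single least-squares step. The sketch below illustrates that idea with NumPy, using a sigmoid hidden layer of 27 neurons as in the abstract. The function names, the uniform weight range, and the 0.5 decision threshold are assumptions for illustration; the RF attribute selection and PCA steps are not shown, and the input `X` is assumed to hold already-extracted features.

```python
import numpy as np


def _hidden(X, W, b):
    """Sigmoid activations of the fixed random hidden layer."""
    return 1.0 / (1.0 + np.exp(-(X @ W + b)))


def train_elm(X, y, n_hidden=27, seed=0):
    """Fit a binary ELM: random hidden weights, output weights by pseudoinverse."""
    rng = np.random.default_rng(seed)
    W = rng.uniform(-1.0, 1.0, size=(X.shape[1], n_hidden))  # assumed range
    b = rng.uniform(-1.0, 1.0, size=n_hidden)
    H = _hidden(X, W, b)
    beta = np.linalg.pinv(H) @ y  # least-squares solution for the output weights
    return W, b, beta


def predict_elm(model, X):
    """Predict class 1 where the linear output exceeds 0.5 (labels are 0/1)."""
    W, b, beta = model
    return (_hidden(X, W, b) @ beta > 0.5).astype(int)
```

A call such as `model = train_elm(X_train, y_train)` followed by `predict_elm(model, X_test)` would give class predictions; comparing activation functions, as the abstract describes, would amount to swapping the function used in `_hidden`.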

General General

Pathway-Based Drug Response Prediction Using Similarity Identification in Gene Expression.

In Frontiers in genetics ; h5-index 62.0

Lapatinib and trastuzumab (Herceptin) are targeted therapies designed for patients with HER2+ breast tumors. Although these therapies improved survival rates of patients with this tumor type, not all the patients harboring HER2 amplification respond to these drugs. The NeoALTTO clinical trial was designed to test whether a higher response rate can be achieved by combining lapatinib and trastuzumab. Although the combination therapy showed almost double the response rate compared to the monotherapies, 40% of the patients did not respond to the treatment. In this study, we sought to identify biomarkers of HER2+ breast cancer patients' response to drugs relying on gene expression profiles of tumors. We show that univariate gene expression-based biomarkers are significant but weak predictors of drug response. We further show that pathway activities, estimated from gene expression patterns quantified using the recent transcriptional similarity coefficient (TSC) between the tumor samples, yield high predictive value for therapy response (concordance index >0.8, p < 0.05). Moreover, machine learning models, built using multiple algorithms including logistic regression, naive Bayes, random forest, k-nearest neighbor, and support vector machine, for predicting drug response in the NeoALTTO clinical trial, resulted in lower performance compared to our pathway-based approach. Our results indicate that transcriptional similarity of biological pathways can be used to predict lapatinib and trastuzumab response in HER2+ breast cancer.

Madani Tonekaboni Seyed Ali, Beri Gangesh, Haibe-Kains Benjamin


breast cancer, estrogen receptor, human epidermal growth factor receptor 2, lapatinib, transcriptional similarity coefficient, trastuzumab