
General General

Assessing Public Opinion on CRISPR-Cas9: Combining Crowdsourcing and Deep Learning.

In Journal of medical Internet research ; h5-index 88.0

BACKGROUND : The discovery of the CRISPR-Cas9-based gene editing method has opened unprecedented new potential for biological and medical engineering, sparking a growing public debate on both the potential and dangers of CRISPR applications. Given the speed of technology development and the almost instantaneous global spread of news, it is important to follow evolving debates without much delay and in sufficient detail, as certain events may have a major long-term impact on public opinion and later influence policy decisions.

OBJECTIVE : Social media networks such as Twitter have been shown to be major drivers of news dissemination and public discourse. They provide a vast amount of semistructured data in almost real time and give direct access to the content of the conversations. We can now mine and analyze such data quickly because of recent developments in machine learning and natural language processing.

METHODS : Here, we used Bidirectional Encoder Representations from Transformers (BERT), an attention-based transformer model, in combination with statistical methods to analyze the entirety of all tweets ever published on CRISPR since the publication of the first gene editing application in 2013.

RESULTS : We show that the mean sentiment of tweets was initially very positive, but began to decrease over time, and that this decline was driven by rare peaks of strong negative sentiments. Due to the high temporal resolution of the data, we were able to associate these peaks with specific events and to observe how trending topics changed over time.
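The peak analysis described in these results can be illustrated with a minimal sketch. It assumes each tweet has already been assigned a numeric sentiment score (for example by a classifier such as BERT) and a day label. The function names, the z-score rule, and the threshold of 2.0 are illustrative assumptions, not the authors' statistical method.

```python
from collections import defaultdict
from statistics import mean, pstdev


def daily_mean_sentiment(scored_tweets):
    """Group (day, sentiment) pairs by day and average the sentiment per day."""
    by_day = defaultdict(list)
    for day, score in scored_tweets:
        by_day[day].append(score)
    return {day: mean(scores) for day, scores in sorted(by_day.items())}


def negative_peaks(daily_means, z_threshold=2.0):
    """Return days whose mean sentiment lies far below the overall average.

    The z-score rule and its threshold are assumptions made for this sketch.
    """
    values = list(daily_means.values())
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        return []  # no variation, so no day stands out
    return [day for day, v in daily_means.items()
            if (v - mu) / sigma <= -z_threshold]


# Made-up data: ten mildly positive days and one strongly negative day.
tweets = [("2020-01-%02d" % d, 0.5) for d in range(1, 11)]
tweets.append(("2020-01-11", -0.9))
print(negative_peaks(daily_mean_sentiment(tweets)))
```

Flagged days can then be compared against a news timeline to look for the events behind each peak.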

CONCLUSIONS : Overall, this type of analysis can provide valuable and complementary insights into ongoing public debates, extending the traditional empirical bioethics toolset.

Müller Martin, Schneider Manuel, Salathé Marcel, Vayena Effy


CRISPR, digital methods, empirical bioethics, infodemiology, infoveillance, natural language processing, sentiment analysis, social media

General General

Using Dual Neural Network Architecture to Detect the Risk of Dementia With Community Health Data: Algorithm Development and Validation Study.

In JMIR medical informatics ; h5-index 23.0

BACKGROUND : Recent studies have revealed lifestyle behavioral risk factors that can be modified to reduce the risk of dementia. As modification of lifestyle takes time, early identification of people with high dementia risk is important for timely intervention and support. As cognitive impairment is a diagnostic criterion of dementia, cognitive assessment tools are used in primary care to screen for clinically unevaluated cases. Among them, Mini-Mental State Examination (MMSE) is a very common instrument. However, MMSE is a questionnaire that is administered when symptoms of memory decline have occurred. Early administration at the asymptomatic stage and repeated measurements would lead to a practice effect that degrades the effectiveness of MMSE when it is used at later stages.

OBJECTIVE : The aim of this study was to exploit machine learning techniques to assist health care professionals in detecting high-risk individuals by predicting the results of MMSE using elderly health data collected from community-based primary care services.

METHODS : A health data set of 2299 samples was adopted in the study. The input data were divided into two groups of different characteristics (ie, client profile data and health assessment data). The predictive output was the result of two-class classification of the normal and high-risk cases that were defined based on MMSE. A dual neural network (DNN) model was proposed to obtain the latent representations of the two groups of input data separately, which were then concatenated for the two-class classification. Mean and k-nearest neighbor were used separately to tackle missing data, whereas a cost-sensitive learning (CSL) algorithm was proposed to deal with class imbalance. The performance of the DNN was evaluated by comparing it with that of conventional machine learning methods.
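The dual-branch idea described above (encode each input group separately, concatenate the latent vectors, then classify) can be sketched as a forward pass in plain Python. The layer sizes, the random weights, the helper names, and the cost ratio of 3.0 are all hypothetical. The abstract does not specify the cost-sensitive algorithm, so a class-weighted cross-entropy is used here as one common stand-in.

```python
import math
import random


def dense_relu(inputs, weights, biases):
    """Fully connected layer with ReLU: one output per weight row."""
    return [max(0.0, sum(w * x for w, x in zip(row, inputs)) + b)
            for row, b in zip(weights, biases)]


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def init_params(n_profile, n_assess, hidden=4, seed=0):
    """Random, untrained weights for illustration only (hypothetical sizes)."""
    rng = random.Random(seed)

    def matrix(rows, cols):
        return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

    return {
        "w_profile": matrix(hidden, n_profile), "b_profile": [0.0] * hidden,
        "w_assess": matrix(hidden, n_assess), "b_assess": [0.0] * hidden,
        "w_out": [rng.uniform(-0.5, 0.5) for _ in range(2 * hidden)],
        "b_out": 0.0,
    }


def dual_forward(profile, assessment, params):
    """Encode the two input groups separately, concatenate, then classify."""
    h_profile = dense_relu(profile, params["w_profile"], params["b_profile"])
    h_assess = dense_relu(assessment, params["w_assess"], params["b_assess"])
    joint = h_profile + h_assess  # concatenation of the two latent vectors
    logit = sum(w * h for w, h in zip(params["w_out"], joint)) + params["b_out"]
    return sigmoid(logit)  # estimated probability of the high-risk class


def weighted_bce(prob, label, cost_high_risk=3.0, cost_normal=1.0):
    """Cross-entropy in which errors on high-risk cases (label 1) cost more."""
    eps = 1e-12  # guards against log(0)
    if label == 1:
        return -cost_high_risk * math.log(prob + eps)
    return -cost_normal * math.log(1.0 - prob + eps)


params = init_params(n_profile=3, n_assess=5)
p = dual_forward([0.2, 1.0, 0.0], [0.5, 0.1, 0.9, 0.3, 0.7], params)
print(p, weighted_bce(p, label=1))
```

Weighting the minority class more heavily pushes training toward fewer missed high-risk cases, at the price of more false alarms.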

RESULTS : A total of 16 predictive models were built using the elderly health data set. Among them, the proposed DNN with CSL outperformed in the detection of high-risk cases. The area under the receiver operating characteristic curve, average precision, sensitivity, and specificity reached 0.84, 0.88, 0.73, and 0.80, respectively.
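For readers less familiar with the reported metrics, the following sketch shows how sensitivity, specificity, and the area under the ROC curve are computed from labels and scores. The toy data and the 0.5 decision threshold are assumptions for illustration and are unrelated to the study's data.

```python
def sensitivity_specificity(labels, predictions):
    """Sensitivity = TP / (TP + FN); specificity = TN / (TN + FP)."""
    pairs = list(zip(labels, predictions))
    tp = sum(1 for y, p in pairs if y == 1 and p == 1)
    fn = sum(1 for y, p in pairs if y == 1 and p == 0)
    tn = sum(1 for y, p in pairs if y == 0 and p == 0)
    fp = sum(1 for y, p in pairs if y == 0 and p == 1)
    return tp / (tp + fn), tn / (tn + fp)


def roc_auc(labels, scores):
    """Probability that a random positive outscores a random negative.

    Ties count as half a win. This pairwise form is O(n^2), which is
    fine for a small illustration.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy example: 1 = high risk, 0 = normal (made-up values).
labels = [1, 1, 1, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.3, 0.4, 0.2, 0.1, 0.05]
predictions = [1 if s >= 0.5 else 0 for s in scores]
print(sensitivity_specificity(labels, predictions), roc_auc(labels, scores))
```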

CONCLUSIONS : The proposed method has the potential to serve as a tool to screen for elderly people with cognitive impairment and predict high-risk cases of dementia at the asymptomatic stage, providing health care professionals with early signals that can prompt suggestions for a follow-up or a detailed diagnosis.

Shen Xiao, Wang Guanjin, Kwan Rick Yiu-Cho, Choi Kup-Sze


cognitive screening, dementia risk, dual neural network, predictive models, primary care

General General

A Comparison Of Scaling Methods To Obtain Calibrated Probabilities Of Activity For Protein-Ligand Predictions.

In Journal of chemical information and modeling

In the context of bioactivity prediction, the question of how to calibrate a score produced by a machine learning method into a probability of binding to a protein target is not yet satisfactorily addressed. In this study, we compared the performance of three such methods, namely Platt Scaling (PS), Isotonic Regression (IR) and Venn-ABERS Predictors (VA) in calibrating prediction scores obtained from ligand-target prediction comprising the Naïve Bayes (NB), Support Vector Machines (SVMs) and Random Forest (RF) algorithms. Calibration quality was assessed on bioactivity data available at AstraZeneca for 40 million data points (compound-target pairs) across 2,112 targets and performance was assessed using Stratified Shuffle Split (SSS) and Leave 20% of Scaffolds Out (L20SO) validation. VA achieved the best calibration performances across all machine learning algorithms and cross validation methods tested, and also the lowest (best) Brier score loss (mean squared difference between the outputted probability estimates assigned to a compound and the actual outcome). In comparison, the PS and IR methods can actually degrade the assigned probability estimates, particularly for the RF for SSS and during L20SO. Sphere Exclusion (SE), a method to sample additional (putative) inactive compounds, was shown to inflate the overall Brier score loss performance, through the artificial requirement for inactive molecules to be dissimilar to active compounds, but was shown to result in over-confident estimators. VA was able to successfully calibrate the probability estimates for even small calibration sets. The multi-probability values (lower and upper probability boundary intervals) were shown to produce large discordance for test set molecules that are neither very similar nor very dissimilar to the active training set, which were hence difficult to predict, suggesting that multi-probability discordance can be used as an estimate for target prediction uncertainty. 
Overall, we were able to show in this work that VA scaling of target prediction models is able to improve probability estimates in all testing instances, which is currently being applied for in-house target prediction models.
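Two ideas from this abstract can be made concrete with a short sketch: the Brier score as the mean squared difference between predicted probability and outcome, and the lower and upper probabilities of an inductive Venn-ABERS predictor, obtained by fitting isotonic regression twice with the test score labelled once as inactive and once as active. The function names and the toy calibration set are assumptions, and this textbook version is not the authors' implementation.

```python
def brier_score(probabilities, outcomes):
    """Mean squared difference between predicted probability and outcome (0/1)."""
    return sum((p - y) ** 2 for p, y in zip(probabilities, outcomes)) / len(outcomes)


def isotonic_fit(scores, outcomes):
    """Pool Adjacent Violators: non-decreasing step function from score to P(active).

    Returns a list of (highest score in block, block mean) pairs.
    """
    grouped = {}  # pool tied scores first: score -> (sum of outcomes, count)
    for s, y in zip(scores, outcomes):
        total, count = grouped.get(s, (0.0, 0))
        grouped[s] = (total + y, count + 1)
    blocks = []  # each block: [sum of outcomes, count, highest score]
    for s in sorted(grouped):
        total, count = grouped[s]
        blocks.append([total, count, s])
        # Merge backwards while the previous block's mean exceeds the last one's.
        while (len(blocks) > 1 and
               blocks[-2][0] / blocks[-2][1] > blocks[-1][0] / blocks[-1][1]):
            total, count, top = blocks.pop()
            blocks[-1][0] += total
            blocks[-1][1] += count
            blocks[-1][2] = top
    return [(top, total / count) for total, count, top in blocks]


def isotonic_predict(model, score):
    """Look up the calibrated value of the block that covers the score."""
    for top, value in model:
        if score <= top:
            return value
    return model[-1][1]


def venn_abers(cal_scores, cal_outcomes, test_score):
    """Return (p0, p1): calibrated values with the test point added as 0 and as 1."""
    scores = list(cal_scores) + [test_score]
    p0 = isotonic_predict(isotonic_fit(scores, list(cal_outcomes) + [0]), test_score)
    p1 = isotonic_predict(isotonic_fit(scores, list(cal_outcomes) + [1]), test_score)
    return p0, p1


# Made-up calibration set: raw model scores and observed activity (1 = active).
cal_scores = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6]
cal_outcomes = [0, 1, 0, 0, 1, 1]
p0, p1 = venn_abers(cal_scores, cal_outcomes, test_score=0.35)
print(p0, p1, p1 - p0)  # the width p1 - p0 is the discordance
```

A wide gap between the two values signals a prediction the calibration data cannot pin down, which is the uncertainty estimate the abstract refers to.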

Mervin Lewis, Afzal Avid M, Engkvist Ola, Bender Andreas


General General

3D Convolutional Neural Networks and a CrossDocked Dataset for Structure-Based Drug Design.

In Journal of chemical information and modeling

One of the main challenges in drug discovery is predicting protein-ligand binding affinity. Recently, machine learning approaches have made substantial progress on this task. However, current methods of model evaluation are overly optimistic in measuring generalization to new targets, and there does not exist a standard dataset of sufficient size to compare performance between models. We present a new dataset for structure-based machine learning, the CrossDocked2020 set, with 22.5 million poses of ligands docked into multiple similar binding pockets across the Protein Data Bank, and perform a comprehensive evaluation of grid-based convolutional neural network (CNN) models on this dataset. We also demonstrate how the partitioning of the training data and test data can impact the results of models trained with the PDBbind dataset, how performance improves by adding more lower-quality training data, and how training with docked poses imparts pose sensitivity to the predicted affinity of a complex. Our best performing model, an ensemble of five densely connected CNNs, achieves a root mean squared error of 1.42 and Pearson R of 0.612 on the affinity prediction task, an AUC of 0.956 at binding pose classification, and a 68.4% accuracy at pose selection on the CrossDocked2020 set. By providing data splits for clustered cross-validation and the raw data for the CrossDocked2020 set, we establish the first standardized dataset for training machine learning models to recognize ligands in non-cognate target structures while also greatly expanding the number of poses available for training. In order to facilitate community adoption of this dataset for benchmarking protein-ligand binding affinity prediction, we provide our models, weights, and the CrossDocked2020 set at
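The evaluation quantities named in this abstract can be sketched briefly: root mean squared error and Pearson R for affinity prediction, and a clustered split in which whole clusters of similar targets are assigned to one fold so that near-duplicates never sit on both sides of a train/test boundary. The round-robin assignment, the function names, and the toy identifiers are assumptions for illustration, not the procedure used to build the published splits.

```python
import math


def rmse(predicted, actual):
    """Root mean squared error between predicted and measured affinities."""
    return math.sqrt(sum((p - a) ** 2 for p, a in zip(predicted, actual)) / len(actual))


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (norm_x * norm_y)


def clustered_folds(cluster_of, n_folds=3):
    """Assign whole clusters to folds so that no cluster is split across folds.

    cluster_of maps an item id to its cluster id. Round-robin assignment is
    an assumption made for this sketch.
    """
    clusters = sorted(set(cluster_of.values()))
    fold_of_cluster = {c: i % n_folds for i, c in enumerate(clusters)}
    return {item: fold_of_cluster[c] for item, c in cluster_of.items()}


# Hypothetical complexes grouped by binding-pocket cluster.
cluster_of = {"cplx1": "A", "cplx2": "A", "cplx3": "B", "cplx4": "C", "cplx5": "C"}
print(clustered_folds(cluster_of))
print(rmse([6.1, 7.0, 5.2], [6.0, 7.5, 5.0]), pearson_r([6.1, 7.0, 5.2], [6.0, 7.5, 5.0]))
```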

Francoeur Paul, Masuda Tomohide, Sunseri Jocelyn, Jia Andrew, Iovanisci Richard B, Snyder Ian, Koes David Ryan


General General

Dynamic organization of intracellular organelle networks.

In Wiley interdisciplinary reviews. Systems biology and medicine

Intracellular organelles are membrane-bound and biochemically distinct compartments constructed to serve specialized functions in eukaryotic cells. Through extensive interactions, they form networks to coordinate and integrate their specialized functions for cell physiology. A fundamental property of these organelle networks is that they constantly undergo dynamic organization via membrane fusion and fission to remodel their internal connections and to mediate direct material exchange between compartments. The dynamic organization not only enables them to serve critical physiological functions adaptively but also differentiates them from many other biological networks such as gene regulatory networks and cell signaling networks. This review examines this fundamental property of the organelle networks from a systems point of view. The focus is exclusively on homotypic networks formed by mitochondria, lysosomes, endosomes, and the endoplasmic reticulum, respectively. First, key mechanisms that drive the dynamic organization of these networks are summarized. Then, several distinct organizational properties of these networks are highlighted. Next, spatial properties of the dynamic organization of these networks are emphasized, and their functional implications are examined. Finally, some representative molecular machineries that mediate the dynamic organization of these networks are surveyed. Overall, the dynamic organization of intracellular organelle networks is emerging as a fundamental and unifying paradigm in the internal organization of eukaryotic cells. This article is categorized under: Models of Systems Properties and Processes > Cellular Models; Analytical and Computational Methods > Computational Methods; Laboratory Methods and Technologies > Macromolecular Interactions, Methods; Metabolic Diseases > Molecular and Cellular Physiology.

Li Wenjing, Zhang Shuhao, Yang Ge


dynamic organization, intracellular organelle network, molecular machinery, organelle interaction, systems modeling

General General

Activity prediction of aminoquinoline drugs based on deep learning.

In Biotechnology and applied biochemistry

Traditional methods for predicting the activity of aminoquinoline drugs give inaccurate results, so a prediction method based on deep learning was designed. The molecular holographic distance vector method was used to describe the molecular structure of 40 aminoquinoline compounds, and the principal component regression method was used for modeling and quantitative analysis. Two methods were used to predict the activity of aminoquinoline drugs. The correlation coefficients of the results obtained from the two sets of activity data and the cross test were 0.9438 and 0.9737, and 0.8305 and 0.9098, respectively. Our data suggest that the deep learning-based method studied in this paper can better predict the activity of aminoquinoline drugs and provide a strong basis for such predictions.

Wang Wenle, Chen Jinquan, Zhu Yujie, Feng Feng


activity, aminoquinoline, deep learning, prediction