Receive a weekly summary and discussion of the top papers of the week by leading researchers in the field.

General

DeepHE: Accurately predicting human essential genes based on deep learning.

In PLoS computational biology

Accurately predicting essential genes using computational methods can greatly reduce the time and resources needed to find them via wet-lab experiments, and can further accelerate the process of drug discovery. Several computational methods have been proposed for predicting essential genes in model organisms by integrating multiple biological data sources, either via centrality measures or machine learning based methods. However, the methods aiming to predict human essential genes are still limited, and their performance still needs improvement. In addition, most machine learning based essential gene prediction methods lack mechanisms for handling the imbalanced learning issue inherent in the essential gene prediction problem, which might be one factor affecting their performance. We propose a deep learning based method, DeepHE, to predict human essential genes by integrating features derived from sequence data and a protein-protein interaction (PPI) network. A deep learning based network embedding method is utilized to automatically learn features from the PPI network. In addition, 89 sequence features are derived from the DNA sequence and protein sequence of each gene. These two types of features are integrated to train a multilayer neural network. A cost-sensitive technique is used to address the imbalanced learning problem when training the deep neural network. The experimental results for predicting human essential genes show that our proposed method, DeepHE, can accurately predict human gene essentiality, with an average AUC above 94%, an area under the precision-recall curve (AP) above 90%, and an accuracy above 90%. We also compare DeepHE with several widely used traditional machine learning models (SVM, Naïve Bayes, Random Forest, and AdaBoost) using the same features and the same cost-sensitive technique to address the imbalanced learning issue.
The experimental results show that DeepHE significantly outperforms the compared machine learning models. We have demonstrated that human essential genes can be accurately predicted by designing an effective machine learning algorithm and integrating representative features captured from the available biological data. The proposed deep learning framework is effective for this task.
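The cost-sensitive technique mentioned in the abstract typically amounts to re-weighting the loss so that the minority class (essential genes) counts for more. A minimal sketch of one common variant, inverse-frequency class weights, as an illustration rather than the authors' exact scheme:

```python
from collections import Counter

def class_weights(labels):
    """Inverse-frequency class weights: the rarer class receives a
    proportionally larger weight, so errors on it cost more in the loss."""
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return {c: n / (k * cnt) for c, cnt in counts.items()}

# e.g. 10% essential (label 1) vs 90% non-essential (label 0)
y = [0] * 90 + [1] * 10
w = class_weights(y)  # rare class 1 gets weight 5.0, class 0 gets ~0.56
```

In a deep learning framework these weights would then be passed to the loss function, e.g. as per-class weights in a weighted cross-entropy.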

Zhang Xue, Xiao Wangxin, Xiao Weijia


General

Robust Single-Trial EEG-Based Authentication Achieved with a 2-Stage Classifier.

In Biosensors

The risk of personal data exposure through unauthorized access has never been as imminent as it is today. To counter this, biometric authentication has been proposed: the use of distinctive physiological and behavioral characteristics as a form of identification and access control. One of the recent developments is electroencephalography (EEG)-based authentication. It builds on the subject-specific nature of brain responses, which are difficult to recreate artificially. We propose an authentication system based on EEG signals recorded in response to a simple motor paradigm. Authentication is achieved with a novel two-stage decoder. In the first stage, EEG signal features are extracted using an Inception-like and a VGG-like deep neural network (NN), both of which we compare with principal component analysis (PCA). In the second stage, a support vector machine (SVM) performs binary classification to authenticate the subject based on the extracted features. All decoders are trained on EEG motor-movement data recorded from 105 subjects. With the VGG-like NN-SVM decoder we achieved a false-acceptance rate (FAR) of 2.55% with an overall accuracy of 88.29%, a FAR of 3.33% with an accuracy of 87.47%, and a FAR of 2.89% with an accuracy of 90.68% for 8, 16, and 64 channels, respectively. With the Inception-like NN-SVM decoder we achieved a FAR of 4.08% with an overall accuracy of 87.29%, a FAR of 3.53% with an accuracy of 85.31%, and a FAR of 1.27% with an accuracy of 93.40% for 8, 16, and 64 channels, respectively. The PCA-SVM decoder achieved accuracies of 92.09%, 92.36%, and 95.64% with FARs of 2.19%, 2.17%, and 1.26% for 8, 16, and 64 channels, respectively.
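The two-stage design can be sketched end-to-end: a feature extractor followed by a binary "genuine vs. impostor" classifier. The toy data, dimensions, and the nearest-centroid rule standing in for the SVM below are all illustrative assumptions; PCA (the paper's baseline extractor) is implemented via SVD:

```python
import numpy as np

rng = np.random.default_rng(0)

# Stage 1: feature extraction. The paper's first stage is an Inception-
# or VGG-like network; PCA (their baseline) stands in here, via SVD.
def pca_fit(X, k):
    mu = X.mean(axis=0)
    _, _, Vt = np.linalg.svd(X - mu, full_matrices=False)
    return mu, Vt[:k]

def pca_transform(X, mu, components):
    return (X - mu) @ components.T

# Toy stand-in for flattened EEG epochs: 50 genuine and 50 impostor
# trials in 64 dimensions (e.g. one value per channel).
genuine = rng.normal(0.0, 1.0, size=(50, 64))
impostor = rng.normal(1.5, 1.0, size=(50, 64))
X = np.vstack([genuine, impostor])

mu, comps = pca_fit(X, k=8)
Z = pca_transform(X, mu, comps)

# Stage 2: binary authentication. The paper uses an SVM; a nearest-
# centroid rule keeps this sketch dependency-free.
c_gen, c_imp = Z[:50].mean(axis=0), Z[50:].mean(axis=0)
pred = (np.linalg.norm(Z - c_imp, axis=1)
        < np.linalg.norm(Z - c_gen, axis=1)).astype(int)
accuracy = (pred == np.array([0] * 50 + [1] * 50)).mean()
```

Swapping stage 1 for a trained deep network changes only how `Z` is produced; stage 2 remains a standard binary classifier over the extracted features.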

Barayeu Uladzislau, Horlava Nastassya, Libert Arno, Van Hulle Marc


EEG, SVM, neural network, person authentication

Pathology

Graph representation forecasting of patient's medical conditions: towards a digital twin

ArXiv Preprint

Objective: Modern medicine needs to shift from a wait-and-react, curative discipline to a preventative, interdisciplinary science aiming to provide personalised, systemic, and precise treatment plans to patients. The aim of this work is to present how the integration of machine learning approaches with mechanistic computational modelling could yield a reliable infrastructure for running probabilistic simulations in which the entire organism is considered as a whole. Methods: We propose a general framework that composes advanced AI approaches and integrates mathematical modelling in order to provide a panoramic view over current and future physiological conditions. The proposed architecture is based on a graph neural network (GNN) forecasting clinically relevant endpoints (such as blood pressure) and a generative adversarial network (GAN) providing a proof of concept of transcriptomic integrability. Results: We show the results of an investigation of the pathological effects of ACE2 overexpression, across different signalling pathways in multiple tissues, on cardiovascular functions. We provide a proof of concept of integrating a large set of composable clinical models that use molecular data to drive local and global clinical parameters and derive future trajectories representing the evolution of the patient's physiological state. Significance: We argue that the graph representation of a computational patient has the potential to solve important technological challenges in integrating multiscale computational modelling with AI. We believe that this work represents a step forward towards a healthcare digital twin.
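A single message-passing step of the graph-forecasting idea can be sketched as follows. The three-node organ graph, feature sizes, and random weights are invented for illustration and are not the paper's architecture:

```python
import numpy as np

def gnn_step(A, H, W):
    """One graph-convolution step: average each node's neighbourhood
    state (A includes self-loops), project it, and apply a nonlinearity.
    A: adjacency matrix, H: node features, W: learnable weights."""
    deg = A.sum(axis=1, keepdims=True)
    return np.tanh((A / deg) @ H @ W)

rng = np.random.default_rng(1)
# Hypothetical organ graph: heart - vasculature - kidney couplings.
A = np.array([[1, 1, 0],
              [1, 1, 1],
              [0, 1, 1]], dtype=float)
H = rng.normal(size=(3, 4))   # current physiological state per node
W = rng.normal(size=(4, 4))   # weights (would be learned in practice)
H_next = gnn_step(A, H, W)    # forecast of the next physiological state
```

Stacking such steps (and decoding node states into endpoints such as blood pressure) gives the forecasting component; in practice `W` is trained against observed trajectories.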

Pietro Barbiero, Ramon Viñas Torné, Pietro Lió


General

Bayesian reasoning machine on a magneto-tunneling junction network.

In Nanotechnology

The recent trend of adapting ultra-energy-efficient (but error-prone) nanomagnetic devices to non-Boolean computing and information processing (e.g. stochastic/probabilistic computing, neuromorphic computing, belief networks, etc.) has resulted in rapid strides towards new computing modalities. Of particular interest are Bayesian networks (BNs), which may see revolutionary advances when adapted to a specific type of nanomagnetic device. Here, we develop a novel nanomagnet-based computing substrate for BNs that allows high-speed sampling from an arbitrary Bayesian graph. We show that magneto-tunneling junctions (MTJs) can be used for electrically programmable 'sub-nanosecond' probability sample generation by co-optimizing voltage-controlled magnetic anisotropy and spin transfer torque. We also show that, simply by engineering local magnetostriction in the soft layers of MTJs, one can stochastically couple them for programmable conditional sample generation as well. This obviates the need for extensive, energy-inefficient hardware such as op-amps, gates, and shift registers to generate the correlations. Based on the above findings, we present an architectural design and computation flow for the MTJ network to map an arbitrary Bayesian graph, where we develop circuits to program and induce switching and interactions among MTJs. The discussed framework can lead to a new generation of stochastic computing hardware for various other computing models, such as stochastic programming and Bayesian deep learning. This can spawn a novel genre of ultra-energy-efficient, extremely powerful computing paradigms, which would be a transformational advance.
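In software terms, each MTJ acts as a biased random bit whose bias is set electrically, and conditional coupling between MTJs implements ancestral sampling over the Bayesian graph. A two-node sketch of that sampling scheme (the probabilities are illustrative, not from the paper):

```python
import random

def sample_network(p_a, p_b_given_a, n=10_000, seed=0):
    """Ancestral sampling over a two-node Bayesian graph A -> B.
    Each draw mimics one MTJ switching event: a biased random bit
    whose bias is programmed by its parent's state."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n):
        a = rng.random() < p_a             # root node A
        b = rng.random() < p_b_given_a[a]  # B conditioned on A
        hits += b
    return hits / n  # empirical marginal P(B)

# Exact marginal: P(B) = 0.3 * 0.8 + 0.7 * 0.1 = 0.31
p_b = sample_network(p_a=0.3, p_b_given_a={False: 0.1, True: 0.8})
```

The hardware advantage claimed in the paper is that each such biased draw takes sub-nanosecond time in an MTJ, instead of a pseudo-random-number call.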

Nasrin Shamma, Drobitch Justine, Shukla Priyesh, Tulabandhula Theja, Bandyopadhyay Supriyo, Trivedi Amit Ranjan


General

Natural Language Processing Reveals Vulnerable Mental Health Support Groups and Heightened Health Anxiety on Reddit during COVID-19: An Observational Study.

In Journal of medical Internet research ; h5-index 88.0

BACKGROUND : The COVID-19 pandemic is exerting a devastating impact on mental health, but it is not clear how people with different types of mental health problems were differentially impacted as the initial wave of cases hit.

OBJECTIVE : We leverage natural language processing (NLP) with the goal of characterizing changes in fifteen of the world's largest mental health support groups (e.g., r/schizophrenia, r/SuicideWatch, r/Depression) found on the website Reddit, along with eleven non-mental health groups (e.g., r/PersonalFinance, r/conspiracy) during the initial stage of the pandemic.

METHODS : We create and release the Reddit Mental Health Dataset including posts from 826,961 unique users from 2018 to 2020. Using regression, we analyze trends from 90 text-derived features such as sentiment analysis, personal pronouns, and a "guns" semantic category. Using supervised machine learning, we classify posts into their respective support group and interpret important features to understand how different problems manifest in language. We apply unsupervised methods such as topic modeling and unsupervised clustering to uncover concerns throughout Reddit before and during the pandemic.

RESULTS : We find that the r/HealthAnxiety forum showed spikes in posts about COVID-19 early on in January, approximately two months before other support groups started posting about the pandemic. Many features significantly increased during COVID-19 for specific groups, including the categories "economic stress", "isolation", and "home", while others, such as "motion", significantly decreased. We find that support groups related to attention deficit hyperactivity disorder (ADHD), eating disorders (ED), and anxiety showed the most negative semantic change during the pandemic out of all mental health groups. Health anxiety emerged as a general theme across Reddit through independent supervised and unsupervised machine learning analyses. For instance, we provide evidence that the concerns of a diverse set of individuals are converging in this unique moment of history; we discover that the more users posted about COVID-19, the more linguistically similar (less distant) the mental health support groups became to r/HealthAnxiety (ρ = -0.96, P<.001). Using unsupervised clustering, we find that the Suicidality and Loneliness clusters more than doubled in the number of posts during the pandemic. Specifically, the support groups for borderline personality disorder and post-traumatic stress disorder became significantly associated with the Suicidality cluster. Furthermore, clusters surrounding Self-Harm and Entertainment emerged.
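The linguistic-similarity idea behind the r/HealthAnxiety convergence result can be illustrated with a bag-of-words cosine similarity. The two mini "posts" below are invented examples, and the paper's actual feature set (90 text-derived features) is far richer:

```python
import math
from collections import Counter

def cosine_similarity(text_a, text_b):
    """Cosine similarity between word-frequency vectors of two texts:
    1.0 for identical vocabularies, 0.0 for disjoint ones."""
    fa = Counter(text_a.lower().split())
    fb = Counter(text_b.lower().split())
    dot = sum(fa[w] * fb[w] for w in fa)
    na = math.sqrt(sum(v * v for v in fa.values()))
    nb = math.sqrt(sum(v * v for v in fb.values()))
    return dot / (na * nb)

# Hypothetical snippets standing in for aggregated subreddit text.
health_anxiety = "worried about symptoms virus cough fever anxious"
depression     = "worried about virus lonely tired anxious hopeless"
sim = cosine_similarity(health_anxiety, depression)
```

Tracking such a similarity (or its complement, a distance) over time between each support group and r/HealthAnxiety is one simple way to quantify linguistic convergence.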

CONCLUSIONS : By using a broad set of NLP techniques and analyzing a baseline of pre-pandemic posts, we uncover patterns of how specific mental health problems manifest in language, identify at-risk users, and reveal the distribution of concerns across Reddit, which could help provide better resources to its millions of users. We then demonstrate that textual analysis is sensitive enough to uncover mental health complaints as they arise in real time, identifying vulnerable groups and alarming themes during COVID-19, and thus may have utility during the ongoing pandemic and other world-changing events such as elections and protests, present or past.


Low Daniel M, Rumker Laurie, Talker Tanya, Torous John, Cecchi Guillermo, Ghosh Satrajit S


General

Identification of risk factors and symptoms of SARS-CoV-2 (COVID-19) using biomedical literature and social media data: Integrative and Consensus study.

In Journal of medical Internet research ; h5-index 88.0

BACKGROUND : In December 2019, the coronavirus disease 2019 (COVID-19) outbreak started in China and rapidly spread around the world. The lack of any vaccine or optimized intervention raised the importance of characterizing risk factors and symptoms for the early identification and successful treatment of COVID-19 patients.

OBJECTIVE : This study aims to investigate and analyze biomedical literature and public social media data to understand the association of risk factors and symptoms with various outcomes of COVID-19 patients.

METHODS : Through semantic analysis, we collected 45 retrospective cohort studies, which evaluated 303 clinical and demographic variables across 13 different outcomes of COVID-19 patients, and 84,140 Twitter posts from 1,036 COVID-19 positive users. Machine-learning tools for extracting biomedical information were introduced to identify uncommon or novel symptoms mentioned in social media. We then examined and compared the two datasets to expand our landscape of risk factors and symptoms related to COVID-19.

RESULTS : From the biomedical literature, approximately 90% of clinical and demographic variables showed inconsistent associations with COVID-19 outcomes. Consensus analysis identified 72 risk factors that were specifically associated with individual outcomes. From the social media data, 51 symptoms were characterized and analyzed. By comparing the social media data with the biomedical literature, we identified 25 novel symptoms that were specifically mentioned in social media but had not previously been well characterized. Furthermore, certain combinations of symptoms were frequently mentioned together in social media.
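Counting which extracted symptoms appear together in the same post is one straightforward way to surface such combinations. The mini-corpus of symptom-tagged posts below is hypothetical:

```python
from collections import Counter
from itertools import combinations

# Hypothetical posts, each reduced to its set of extracted symptoms.
posts = [
    {"fever", "cough", "fatigue"},
    {"fever", "cough"},
    {"anosmia", "fatigue"},
    {"fever", "fatigue"},
]

# Count every unordered symptom pair that co-occurs within one post.
pairs = Counter()
for symptoms in posts:
    pairs.update(combinations(sorted(symptoms), 2))

top_pair, top_count = pairs.most_common(1)[0]
```

Ranking `pairs` then yields the most frequently co-mentioned symptom combinations; the real analysis would run this over the full Twitter corpus after symptom extraction.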

CONCLUSIONS : The identified outcome-specific risk factors, symptoms, and combinations of symptoms may serve as surrogate indicators to identify COVID-19 patients, predict their clinical outcomes, and provide appropriate treatments.


Jeon Jouhyun, Baruah Gaurav, Sarabadani Sarah, Palanica Adam