
oncology Oncology

Genome-wide Modeling of Polygenic Risk Score in Colorectal Cancer Risk.

In American journal of human genetics

Accurate colorectal cancer (CRC) risk prediction models are critical for identifying individuals at low and high risk of developing CRC, as they can then be offered targeted screening and interventions to address their risks of developing disease (if they are in a high-risk group) and avoid unnecessary screening and interventions (if they are in a low-risk group). As it is likely that thousands of genetic variants contribute to CRC risk, it is clinically important to investigate whether these genetic variants can be used jointly for CRC risk prediction. In this paper, we derived and compared different approaches to generating predictive polygenic risk scores (PRS) from genome-wide association studies (GWASs) including 55,105 CRC-affected case subjects and 65,079 control subjects of European ancestry. We built the PRS in three ways, using (1) 140 previously identified and validated CRC loci; (2) SNP selection based on linkage disequilibrium (LD) clumping followed by machine-learning approaches; and (3) LDpred, a Bayesian approach for genome-wide risk prediction. We tested the PRS in an independent cohort of 101,987 individuals with 1,699 CRC-affected case subjects. The discriminatory accuracy, calculated by the age- and sex-adjusted area under the receiver operating characteristics curve (AUC), was highest for the LDpred-derived PRS (AUC = 0.654) including nearly 1.2 M genetic variants (the proportion of causal genetic variants for CRC assumed to be 0.003), whereas the PRS of the 140 known variants identified from GWASs had the lowest AUC (AUC = 0.629). Based on the LDpred-derived PRS, we are able to identify 30% of individuals without a family history as having risk for CRC similar to those with a family history of CRC, whereas the PRS based on known GWAS variants identified only the top 10% as having a similar relative risk. About 90% of these individuals have no family history and would have been considered average risk under current screening guidelines, but might benefit from earlier screening. The developed PRS offers a way for risk-stratified CRC screening and other targeted interventions.
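
As a rough illustration of the quantities named in this abstract, the sketch below computes a polygenic risk score as a weighted sum of risk-allele dosages and an unadjusted AUC as the probability that a randomly chosen case scores higher than a randomly chosen control. It is not the authors' pipeline: the function names, weights, and genotypes are hypothetical, and the age and sex adjustment used in the paper is omitted.

```python
from typing import Sequence


def polygenic_risk_score(dosages: Sequence[float], weights: Sequence[float]) -> float:
    """Weighted sum of risk-allele dosages (0, 1, or 2 copies per variant)."""
    if len(dosages) != len(weights):
        raise ValueError("dosages and weights must have the same length")
    return sum(d * w for d, w in zip(dosages, weights))


def auc(case_scores: Sequence[float], control_scores: Sequence[float]) -> float:
    """Unadjusted AUC: share of case/control pairs where the case scores higher.

    Ties count as half a win (the Mann-Whitney formulation of the AUC).
    """
    wins = 0.0
    for case in case_scores:
        for control in control_scores:
            if case > control:
                wins += 1.0
            elif case == control:
                wins += 0.5
    return wins / (len(case_scores) * len(control_scores))


# Hypothetical per-variant weights (e.g. log odds ratios); not values from the paper.
weights = [0.10, 0.05, 0.20]

# Hypothetical genotypes: one row per person, one dosage per variant.
cases = [[2, 1, 1], [1, 2, 2], [0, 1, 1]]
controls = [[0, 0, 1], [1, 1, 0], [1, 0, 1]]

case_prs = [polygenic_risk_score(g, weights) for g in cases]
control_prs = [polygenic_risk_score(g, weights) for g in controls]
prs_auc = auc(case_prs, control_prs)
print(f"Unadjusted AUC on toy data: {prs_auc:.3f}")
```

The three approaches compared in the abstract differ mainly in which variants enter the sum and how their weights are chosen; the scoring step itself keeps this weighted-sum form.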

Thomas Minta, Sakoda Lori C, Hoffmeister Michael, Rosenthal Elisabeth A, Lee Jeffrey K, van Duijnhoven Franzel J B, Platz Elizabeth A, Wu Anna H, Dampier Christopher H, de la Chapelle Albert, Wolk Alicja, Joshi Amit D, Burnett-Hartman Andrea, Gsur Andrea, Lindblom Annika, Castells Antoni, Win Aung Ko, Namjou Bahram, Van Guelpen Bethany, Tangen Catherine M, He Qianchuan, Li Christopher I, Schafmayer Clemens, Joshu Corinne E, Ulrich Cornelia M, Bishop D Timothy, Buchanan Daniel D, Schaid Daniel, Drew David A, Muller David C, Duggan David, Crosslin David R, Albanes Demetrius, Giovannucci Edward L, Larson Eric, Qu Flora, Mentch Frank, Giles Graham G, Hakonarson Hakon, Hampel Heather, Stanaway Ian B, Figueiredo Jane C, Huyghe Jeroen R, Minnier Jessica, Chang-Claude Jenny, Hampe Jochen, Harley John B, Visvanathan Kala, Curtis Keith R, Offit Kenneth, Li Li, Le Marchand Loic, Vodickova Ludmila, Gunter Marc J, Jenkins Mark A, Slattery Martha L, Lemire Mathieu, Woods Michael O, Song Mingyang, Murphy Neil, Lindor Noralane M, Dikilitas Ozan, Pharoah Paul D P, Campbell Peter T, Newcomb Polly A, Milne Roger L, MacInnis Robert J, Castellví-Bel Sergi, Ogino Shuji, Berndt Sonja I, Bézieau Stéphane, Thibodeau Stephen N, Gallinger Steven J, Zaidi Syed H, Harrison Tabitha A, Keku Temitope O, Hudson Thomas J, Vymetalkova Veronika, Moreno Victor, Martín Vicente, Arndt Volker, Wei Wei-Qi, Chung Wendy, Su Yu-Ru, Hayes Richard B, White Emily, Vodicka Pavel, Casey Graham, Gruber Stephen B, Schoen Robert E, Chan Andrew T, Potter John D, Brenner Hermann, Jarvik Gail P, Corley Douglas A, Peters Ulrike, Hsu Li


cancer risk prediction, colorectal cancer, machine learning, polygenic risk score

General General

Recent advancement in cancer detection using machine learning: Systematic survey of decades, comparisons and challenges.

In Journal of infection and public health

Cancer is a fatal illness often caused by the accumulation of genetic disorders and a variety of pathological changes. Cancerous cells form abnormal, life-threatening growths that can arise in any part of the human body. Cancer, also known as a tumor, must be detected quickly and correctly at an early stage to identify what might be beneficial for its cure. Although each modality has different considerations, a complicated history, improper diagnostics, and improper treatment are the main causes of death. The aim of this research is to analyze, review, categorize, and address current developments in human cancer detection using machine learning techniques for breast, brain, lung, liver, and skin cancer and leukemia. The study highlights how cancer diagnosis and the cure process are assisted by machine learning with supervised, unsupervised, and deep learning techniques. Several state-of-the-art techniques are categorized under the same cluster, and results are compared on benchmark datasets in terms of accuracy, sensitivity, specificity, and false-positive metrics. Finally, challenges are highlighted for possible future work.

Saba Tanzila


Cancer, Health systems, Image analysis, Life expectancy, Machine learning

General General

A machine-vision approach for automated pain measurement at millisecond timescales.

In eLife

Objective and automatic measurement of pain in mice remains a barrier for discovery in neuroscience. Here we capture paw kinematics during pain behavior in mice with high-speed videography and automated paw tracking with machine and deep learning approaches. Our statistical software platform, PAWS (Pain Assessment at Withdrawal Speeds), uses a univariate projection of paw position over time to automatically quantify seven behavioral features that are combined into a single, univariate pain score. Automated paw tracking combined with PAWS reveals a behaviorally-divergent mouse strain that displays hyper-sensitivity to mechanical stimuli. To demonstrate the efficacy of PAWS for detecting spinally- versus centrally-mediated behavioral responses, we chemogenetically activated nociceptive neurons in the amygdala, which further separated the pain-related behavioral features and the resulting pain score. Taken together, this automated pain quantification approach will increase objectivity in collecting rigorous behavioral data, and it is compatible with other neural circuit dissection tools for determining the mouse pain state.

Jones Jessica M, Foster William, Twomey Colin, Burdge Justin, Ahmed Osama, Pereira Talmo D, Wojick Jessica A, Corder Gregory, Plotkin Joshua B, Abdus-Saboor Ishmail


mouse, neuroscience

Internal Medicine Internal Medicine

Understanding Public Perception of COVID-19 Social Distancing on Twitter.

In Infection control and hospital epidemiology ; h5-index 48.0

OBJECTIVE : Social distancing policies are key in curtailing COVID-19 infection spread, but their effectiveness is heavily contingent on public understanding and collective adherence. We sought to study public perception of social distancing through organic, large-scale discussion on Twitter.

DESIGN : Retrospective cross-sectional study.

METHODS : Between March 27 and April 10, 2020, we retrieved English-only tweets matching two trending social distancing hashtags, #socialdistancing and #stayathome. We analyzed the tweets using natural language processing and machine learning models, conducting a sentiment analysis to identify emotions and polarity. We evaluated subjectivity of tweets and estimated frequency of discussion of social distancing rules. We then identified clusters of discussion using topic modeling and associated sentiments.

RESULTS : We studied a sample of 574,903 tweets. For both hashtags, polarity was positive (mean, 0.148; SD, 0.290); only 15% of tweets had negative polarity. Tweets were more likely to be objective (median, 0.40; IQR, 0 to 0.6) with approximately 30% of tweets labeled as completely objective (labeled as 0 in range from 0 to 1). Approximately half (50.4%) of tweets primarily expressed joy and one-fifth expressed fear and surprise. Each correlated well with topic clusters identified by frequency including leisure and community support (i.e., joy), concerns about food insecurity and quarantine effects (i.e., fear), and unpredictability of COVID and its implications (i.e., surprise).

CONCLUSIONS : The positive sentiment, preponderance of objective tweets, and topics supporting coping mechanisms led us to believe that Twitter users generally supported social distancing in the early stages of their implementation.
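
The summary statistics reported in the results can be reproduced from per-tweet scores with a few lines of standard-library Python. The sketch below is only illustrative: the abstract does not name the sentiment tool, so the scores are hypothetical, polarity is assumed to lie between -1 and 1, and subjectivity follows the 0 (objective) to 1 (subjective) range mentioned above.

```python
import statistics


def summarize(polarity, subjectivity):
    """Summarize per-tweet sentiment scores.

    polarity: assumed range -1 (negative) to 1 (positive).
    subjectivity: 0 (completely objective) to 1 (completely subjective).
    """
    # Quartiles of subjectivity; the "inclusive" method treats the data as the full population.
    q1, _, q3 = statistics.quantiles(subjectivity, n=4, method="inclusive")
    return {
        "polarity_mean": statistics.mean(polarity),
        "polarity_sd": statistics.stdev(polarity),
        "share_negative": sum(p < 0 for p in polarity) / len(polarity),
        "subjectivity_median": statistics.median(subjectivity),
        "subjectivity_iqr": (q1, q3),
        "share_fully_objective": sum(s == 0 for s in subjectivity) / len(subjectivity),
    }


# Hypothetical scores for five tweets; not data from the study.
polarity = [0.5, 0.0, -0.25, 0.25, 0.0]
subjectivity = [0.6, 0.0, 0.4, 0.5, 0.0]

summary = summarize(polarity, subjectivity)
for name, value in summary.items():
    print(name, value)
```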

Saleh Sameh N, Lehmann Christoph U, McDonald Samuel A, Basit Mujeeb A, Medford Richard J


oncology Oncology

The obesity paradox in critically ill patients: a causal learning approach to a casual finding.

In Critical care (London, England)

BACKGROUND : While obesity confers an increased risk of death in the general population, numerous studies have reported an association between obesity and improved survival among critically ill patients. This contrary finding has been referred to as the obesity paradox. In this retrospective study, two causal inference approaches were used to address whether the survival of non-obese critically ill patients would have been improved if they had been obese.

METHODS : The study cohort comprised 6557 adult critically ill patients hospitalized at the Intensive Care Unit of the Ghent University Hospital between 2015 and 2017. Obesity was defined as a body mass index of ≥ 30 kg/m². Two causal inference approaches were used to estimate the average effect of obesity in the non-obese (AON): a traditional approach that used regression adjustment for confounding and that assumed missingness completely at random and a robust approach that used machine learning within the targeted maximum likelihood estimation framework along with multiple imputation of missing values under the assumption of missingness at random. 1754 (26.8%) patients were discarded in the traditional approach because of at least one missing value for obesity status or confounders.

RESULTS : Obesity was present in 18.9% of patients. The in-hospital mortality was 14.6% in non-obese patients and 13.5% in obese patients. The raw marginal risk difference for in-hospital mortality between obese and non-obese patients was -1.06% (95% confidence interval (CI) -3.23 to 1.11%, P = 0.337). The traditional approach resulted in an AON of -2.48% (95% CI -4.80 to -0.15%, P = 0.037), whereas the robust approach yielded an AON of -0.59% (95% CI -2.77 to 1.60%, P = 0.599).

CONCLUSIONS : A causal inference approach that is robust to residual confounding bias due to model misspecification and selection bias due to missing (at random) data mitigates the obesity paradox observed in critically ill patients, whereas a traditional approach results in even more paradoxical findings. The robust approach does not provide evidence that the survival of non-obese critically ill patients would have been improved if they had been obese.
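
For readers unfamiliar with the raw marginal risk difference quoted in the results, the following sketch shows how an unadjusted risk difference and a Wald 95% confidence interval are typically computed from group counts. The counts are hypothetical (the abstract does not report them), and this is only the crude comparison: it is not the AON, which the authors estimate with regression adjustment or targeted maximum likelihood estimation.

```python
import math


def risk_difference(deaths_a, n_a, deaths_b, n_b, z=1.96):
    """Unadjusted risk difference (group A minus group B) with a Wald confidence interval.

    z=1.96 corresponds to a 95% interval under a normal approximation.
    """
    risk_a = deaths_a / n_a
    risk_b = deaths_b / n_b
    diff = risk_a - risk_b
    # Standard error of a difference between two independent proportions.
    se = math.sqrt(risk_a * (1 - risk_a) / n_a + risk_b * (1 - risk_b) / n_b)
    return diff, diff - z * se, diff + z * se


# Hypothetical counts chosen to give 13.5% and 14.6% mortality; not the study's data.
rd, ci_low, ci_high = risk_difference(deaths_a=135, n_a=1000, deaths_b=584, n_b=4000)
print(f"Risk difference: {rd:.2%} (95% CI {ci_low:.2%} to {ci_high:.2%})")
```

An interval that spans zero, as in this toy example, means the crude comparison alone does not show a clear difference in mortality between the groups.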

Decruyenaere Alexander, Steen Johan, Colpaert Kirsten, Benoit Dominique D, Decruyenaere Johan, Vansteelandt Stijn


Causality, Confounding, Machine learning, Mortality, Obesity, Paradox, Selection bias, Super learning, Targeted learning

Radiology Radiology

Radiomics for glioblastoma survival analysis in pre-operative MRI: exploring feature robustness, class boundaries, and machine learning techniques.

In Cancer imaging : the official publication of the International Cancer Imaging Society

BACKGROUND : This study aims to identify robust radiomic features for Magnetic Resonance Imaging (MRI), assess feature selection and machine learning methods for overall survival classification of Glioblastoma multiforme patients, and robustify models trained on single-center data when applied to multi-center data.

METHODS : Tumor regions were automatically segmented on MRI data, and 8327 radiomic features extracted from these regions. Single-center data was perturbed to assess radiomic feature robustness, with over 16 million tests of typical perturbations. Robust features were selected based on the Intraclass Correlation Coefficient to measure agreement across perturbations. Feature selectors and machine learning methods were compared to classify overall survival. Models trained on single-center data (63 patients) were tested on multi-center data (76 patients). Priors using feature robustness and clinical knowledge were evaluated.

RESULTS : We observed a very large performance drop when applying models trained on single-center data to unseen multi-center data, e.g. a decrease of the area under the receiver operating characteristic curve (AUC) of 0.56 for the overall survival classification boundary at 1 year. By using robust features alongside priors for two overall survival classes, the AUC drop could be reduced by 21.2%. In contrast, sensitivity was 12.19% lower when applying a prior.

CONCLUSIONS : Our experiments show that it is possible to attain improved levels of robustness and accuracy when models need to be applied to unseen multi-center data. The performance on multi-center data of models trained on single-center data can be increased by using robust features and introducing prior knowledge. For successful model robustification, tailoring perturbations for robustness testing to the target dataset is key.
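
The robustness filter described in the methods can be pictured as computing an intraclass correlation coefficient per feature across perturbed versions of each image and keeping features above a cutoff. The sketch below uses the one-way random-effects form, ICC(1,1); the abstract does not state which ICC variant or threshold the authors used, so the form, the 0.9 cutoff, the feature names, and the values are all assumptions for illustration.

```python
def icc_oneway(values):
    """One-way random-effects ICC(1,1).

    values: one row per patient, one column per perturbation of that patient's image.
    Returns 1.0 when a feature is identical across perturbations for every patient.
    """
    n = len(values)        # number of patients
    k = len(values[0])     # number of perturbations per patient
    grand_mean = sum(sum(row) for row in values) / (n * k)
    row_means = [sum(row) / k for row in values]
    # Between-patient and within-patient mean squares from a one-way ANOVA.
    ms_between = k * sum((m - grand_mean) ** 2 for m in row_means) / (n - 1)
    ms_within = sum(
        (x - m) ** 2 for row, m in zip(values, row_means) for x in row
    ) / (n * (k - 1))
    return (ms_between - ms_within) / (ms_between + (k - 1) * ms_within)


# Hypothetical feature values: 3 patients x 3 perturbations per feature.
features = {
    "stable_feature": [[1.0, 1.0, 1.0], [2.0, 2.0, 2.0], [3.0, 3.0, 3.0]],
    "unstable_feature": [[1.0, 3.0, 2.0], [2.0, 1.0, 3.0], [3.0, 2.0, 1.0]],
}

ICC_THRESHOLD = 0.9  # hypothetical cutoff, not taken from the paper
robust = [name for name, vals in features.items() if icc_oneway(vals) >= ICC_THRESHOLD]
print(robust)
```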

Suter Yannick, Knecht Urspeter, Alão Mariana, Valenzuela Waldo, Hewer Ekkehard, Schucht Philippe, Wiest Roland, Reyes Mauricio


Glioblastoma multiforme, MRI radiomics, Multi-center, Overall survival classification, Robustness