
General General

Modularizing Deep Learning via Pairwise Learning With Kernels.

In IEEE transactions on neural networks and learning systems

By redefining the conventional notions of layers, we present an alternative view on finitely wide, fully trainable deep neural networks as stacked linear models in feature spaces, leading to a kernel machine interpretation. Based on this construction, we then propose a provably optimal modular learning framework for classification that does not require between-module backpropagation. This modular approach brings new insights into the label requirement of deep learning (DL). It leverages only implicit pairwise labels (weak supervision) when learning the hidden modules. When training the output module, on the other hand, it requires full supervision but achieves high label efficiency, needing as few as ten randomly selected labeled examples (one from each class) to achieve 94.88% accuracy on CIFAR-10 using a ResNet-18 backbone. Moreover, modular training enables fully modularized DL workflows, which then simplify the design and implementation of pipelines and improve the maintainability and reusability of models. To showcase the advantages of such a modularized workflow, we describe a simple yet reliable method for estimating reusability of pretrained modules as well as task transferability in a transfer learning setting. At practically no computation overhead, it precisely described the task space structure of 15 binary classification tasks from CIFAR-10.

Duan Shiyu, Yu Shujian, Principe Jose C

2021-Jan-05
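To make the "implicit pairwise labels" in the abstract above concrete, here is a minimal sketch. It assumes a pairwise label is simply an indicator of whether two examples share a class. The function name `pairwise_labels` and the toy class names are hypothetical, and this is not the authors' implementation.

```python
from itertools import combinations


def pairwise_labels(class_labels):
    """Map each unordered index pair (i, j), i < j, to 1 if both examples
    share a class and 0 otherwise. Assumption: this same/different
    indicator is what a 'pairwise label' means here."""
    return {
        (i, j): int(class_labels[i] == class_labels[j])
        for i, j in combinations(range(len(class_labels)), 2)
    }


# Toy labels, made up for illustration only.
labels = ["cat", "dog", "cat", "ship"]
pairs = pairwise_labels(labels)
print(pairs[(0, 2)])  # examples 0 and 2 are both "cat"
print(pairs[(0, 1)])  # "cat" vs "dog"
```

Under this reading, the pairwise signal only says whether two examples belong together, never which class either one is, which is why it counts as weaker supervision than full class labels.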


Topographic brain tumor anatomy drives seizure risk and enables machine learning based prediction.

In NeuroImage. Clinical

OBJECTIVE : The aim of this study was to identify relevant risk factors for epileptic seizures upon initial diagnosis of a brain tumor and to develop and validate a machine learning based prediction to allow for a tailored risk-based antiepileptic therapy.

METHODS : Clinical, electrophysiological and high-resolution imaging data were obtained from a consecutive cohort of 1051 patients with newly diagnosed brain tumors. Factor-associated seizure risk differences were used to determine the relevance of specific topographic, demographic and histopathologic variables available at the time of diagnosis for seizure risk. The data were divided in a 70/30 ratio into a training and test set. Different machine learning based predictive models were evaluated before a generalized additive model (GAM) was selected considering its traceability while maintaining high performance. Based on a clinical stratification of the risk factors, three different GAMs were trained and internally validated.

RESULTS : A total of 923 patients had full data and were included. Specific topographic anatomical patterns that drive seizure risk could be identified. The involvement of allopallial, mesopallial or primary motor/somatosensory neopallial structures by brain tumors results in a significant and clinically relevant increase in seizure risk. While topographic input was most relevant for the GAM, the best prediction was achieved by a combination of topographic, demographic and histopathologic information (Validation: AUC: 0.79, Accuracy: 0.72, Sensitivity: 0.81, Specificity: 0.66).

CONCLUSIONS : This study identifies specific phylogenetic anatomical patterns as epileptic drivers. A GAM allowed the prediction of seizure risk using topographic, demographic and histopathologic data achieving fair performance while maintaining transparency.

Akeret Kevin, Stumpo Vittorio, Staartjes Victor E, Vasella Flavio, Velz Julia, Marinoni Federica, Dufour Jean-Philippe, Imbach Lukas L, Regli Luca, Serra Carlo, Krayenbühl Niklaus

2020

Epilepsy, Generalized additive model, Glioma, Metastases, Primary central nervous system lymphoma
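The validation figures reported above (accuracy, sensitivity, specificity) are all derived from a binary confusion matrix. The sketch below shows the standard definitions. The function name `classification_metrics` and the counts are hypothetical and are not taken from the study.

```python
def classification_metrics(tp, fp, tn, fn):
    """Standard binary classification metrics from confusion-matrix counts.

    tp/fn: patients with seizures predicted correctly / missed.
    tn/fp: patients without seizures predicted correctly / flagged wrongly.
    """
    total = tp + fp + tn + fn
    return {
        "accuracy": (tp + tn) / total,
        "sensitivity": tp / (tp + fn),  # true positive rate
        "specificity": tn / (tn + fp),  # true negative rate
    }


# Hypothetical counts for illustration only (not the study's data).
metrics = classification_metrics(tp=8, fp=3, tn=7, fn=2)
print(metrics)
```

A model with higher sensitivity than specificity, as reported in the abstract, misses fewer patients who go on to have seizures, at the cost of more false alarms.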


Emerging SARS-CoV-2 diversity revealed by rapid whole genome sequence typing.

In bioRxiv : the preprint server for biology

Background : Discrete classification of SARS-CoV-2 viral genotypes can identify emerging strains and detect geographic spread, viral diversity, and transmission events.

Methods : We developed a tool (GNUVID) that integrates whole genome multilocus sequence typing and a supervised machine learning random forest-based classifier. We used GNUVID to assign sequence type (ST) profiles to each of 69,686 SARS-CoV-2 complete, high-quality genomes available from GISAID as of October 20th, 2020. STs were then clustered into clonal complexes (CCs), and then used to train a machine learning classifier. We used this tool to detect potential introduction and exportation events, and to estimate effective viral diversity across locations and over time in 16 US states.

Results : GNUVID is a scalable tool for viral genotype classification (available at https://github.com/ahmedmagds/GNUVID ) that can be used to quickly process tens of thousands of genomes. Our genotyping ST/CC analysis uncovered dynamic local changes in ST/CC prevalence and diversity with multiple replacement events in different states. We detected an average of 20.6 putative introductions and 7.5 exportations for each state. Effective viral diversity dropped in all states as shelter-in-place travel-restrictions went into effect and increased as restrictions were lifted. Interestingly, our analysis showed correlation between effective diversity and the date that state-wide mask mandates were imposed.

Conclusions : Our classification tool uncovered multiple introduction and exportation events, as well as waves of expansion and replacement of SARS-CoV-2 genotypes in different states. Combined with future genomic sampling the GNUVID system could be used to track circulating viral diversity and identify emerging clones and hotspots.

Moustafa Ahmed M, Planet Paul J

2020-Dec-28
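The general idea behind multilocus sequence typing, as used in the abstract above, can be sketched as follows: each distinct allele at a locus gets a number, and each distinct combination of allele numbers defines a sequence type (ST). This is a toy illustration of that idea only, not GNUVID's code. The function name `assign_sequence_types`, the genome ids, and the short sequences are hypothetical.

```python
def assign_sequence_types(genomes, n_loci):
    """Assign an ST number to each genome.

    genomes: dict mapping genome id -> tuple of n_loci allele sequences.
    Alleles are numbered per locus in order of first appearance; each
    unique allele-number profile is then numbered as a new ST.
    """
    allele_tables = [{} for _ in range(n_loci)]  # per locus: sequence -> allele number
    profile_to_st = {}                           # allele profile -> ST number
    result = {}
    for genome_id, loci in genomes.items():
        if len(loci) != n_loci:
            raise ValueError(f"{genome_id}: expected {n_loci} loci")
        profile = tuple(
            table.setdefault(seq, len(table) + 1)
            for table, seq in zip(allele_tables, loci)
        )
        result[genome_id] = profile_to_st.setdefault(profile, len(profile_to_st) + 1)
    return result


# Toy input: three made-up genomes with two loci each.
toy_genomes = {
    "g1": ("ATG", "CCA"),
    "g2": ("ATG", "CCA"),
    "g3": ("ATG", "CCT"),
}
sts = assign_sequence_types(toy_genomes, n_loci=2)
print(sts)  # g1 and g2 share an ST; g3 differs at the second locus
```

Because the typing step is a dictionary lookup per locus, it scales linearly with the number of genomes, which is consistent with the abstract's claim that tens of thousands of genomes can be processed quickly.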


A Graph Gaussian Embedding Method for Predicting Alzheimer's Disease Progression with MEG Brain Networks.

In IEEE transactions on bio-medical engineering

Characterizing the subtle changes of functional brain networks associated with the pathological cascade of Alzheimer's disease (AD) is important for early diagnosis and prediction of disease progression prior to clinical symptoms. We developed a new deep learning method, termed multiple graph Gaussian embedding model (MG2G), which can learn highly informative network features by mapping high-dimensional resting-state brain networks into a low-dimensional latent space. These latent distribution-based embeddings enable a quantitative characterization of subtle and heterogeneous brain connectivity patterns at different regions, and can be used as input to traditional classifiers for various downstream graph analytic tasks, such as AD early stage prediction, and statistical evaluation of between-group significant alterations across brain regions. We used MG2G to detect the intrinsic latent dimensionality of MEG brain networks, predict the progression of patients with mild cognitive impairment (MCI) to AD, and identify brain regions with network alterations related to MCI.

Xu Mengjia, Sanz David Lopez, Garces Pilar, Maestu Fernando, Li Quanzheng, Pantazis Dimitrios

2021-Jan-05


Physical Activity and Psychological Stress Detection and Assessment of Their Effects on Glucose Concentration Predictions in Diabetes Management.

In IEEE transactions on bio-medical engineering

Continuous glucose monitoring (CGM) enables improvements in diabetes treatment by providing frequent temporal information on glycemia, and prediction of future glucose concentration (GC) trends. The accurate prediction of the future GC trajectory is important for making meal, activity and insulin dosing decisions. Glucose concentration values are affected by various physiological and metabolic variations, such as physical activity (PA) and acute psychological stress (APS), in addition to meals and insulin. In this work, we extend our adaptive glucose modeling framework to incorporate the effects of PA and APS on the GC predictions by integrating input features derived from supplemental physiological variables measured from a wearable device. We use a wristband that is conducive to use by free-living ambulatory people. The readily obtained biosignals are used to generate new quantifiable input features for PA and APS. Machine learning techniques are used to estimate the type and intensity of the PA and APS when they occur individually and concurrently. Variables quantifying the characteristics of both PA and APS are integrated for the first time as exogenous inputs in an adaptive system identification technique for enhancing the accuracy of GC predictions. Data from clinical experiments are used to illustrate the improvement in GC prediction accuracy. The average mean absolute error (MAE) of one-hour-ahead GC predictions decreases from 35.1 to 31.9 mg/dL (p-value=0.01) for testing data with the inclusion of PA information. The average MAE of one-hour-ahead GC predictions decreases from 16.9 to 14.2 mg/dL (p-value=0.006) for testing data with the inclusion of PA and APS information.

Sevil Mert, Rashid Mudassir, Hajizadeh Iman, Park Minsun, Quinn Laurie, Cinar Ali

2021-Jan-05
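The accuracy measure quoted above, mean absolute error (MAE), is the average of the absolute differences between measured and predicted glucose values. A minimal sketch follows. The function name and the glucose readings are hypothetical and are not data from the study.

```python
def mean_absolute_error(measured, predicted):
    """Average absolute difference between paired measurements and
    predictions, in the same units as the inputs (here mg/dL)."""
    if not measured or len(measured) != len(predicted):
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(m - p) for m, p in zip(measured, predicted)) / len(measured)


# Made-up CGM readings and one-hour-ahead predictions, in mg/dL.
measured_gc = [110, 150, 95, 180]
predicted_gc = [120, 140, 100, 165]
mae = mean_absolute_error(measured_gc, predicted_gc)
print(mae)  # 10.0
```

Because MAE is expressed in mg/dL, the reductions reported in the abstract can be read directly as the typical shrinkage of the prediction error.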


Gender Stereotypes in Natural Language: Word Embeddings Show Robust Consistency Across Child and Adult Language Corpora of More Than 65 Million Words.

In Psychological science ; h5-index 93.0

Stereotypes are associations between social groups and semantic attributes that are widely shared within societies. The spoken and written language of a society affords a unique way to measure the magnitude and prevalence of these widely shared collective representations. Here, we used word embeddings to systematically quantify gender stereotypes in language corpora that are unprecedented in size (65+ million words) and scope (child and adult conversations, books, movies, TV). Across corpora, gender stereotypes emerged consistently and robustly for both theoretically selected stereotypes (e.g., work-home) and comprehensive lists of more than 600 personality traits and more than 300 occupations. Despite underlying differences across language corpora (e.g., time periods, formats, age groups), results revealed the pervasiveness of gender stereotypes in every corpus. Using gender stereotypes as the focal issue, we unite 19th-century theories of collective representations and 21st-century evidence on implicit social cognition to understand the subtle yet persistent presence of collective representations in language.

Charlesworth Tessa E S, Yang Victor, Mann Thomas C, Kurdi Benedek, Banaji Mahzarin R

2021-Jan-05

collective representations, gender stereotypes, machine learning, natural-language processing, open data, open materials, word embeddings
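One common way to quantify an association in word embeddings is to compare a target word's mean cosine similarity to two sets of attribute words. The sketch below illustrates that general approach; it is not necessarily the exact measure used by the authors. All names and the two-dimensional vectors are hypothetical toy values, not drawn from any corpus.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def association(target, group_a, group_b):
    """Mean similarity of `target` to group_a minus mean similarity to
    group_b. Positive values mean the target sits closer to group_a."""
    mean_a = sum(cosine(target, a) for a in group_a) / len(group_a)
    mean_b = sum(cosine(target, b) for b in group_b) / len(group_b)
    return mean_a - mean_b


# Toy 2-D vectors, invented for illustration only.
attr_a = [(1.0, 0.0)]
attr_b = [(0.0, 1.0)]
target_1 = (0.9, 0.1)
target_2 = (0.1, 0.9)

score_1 = association(target_1, attr_a, attr_b)
score_2 = association(target_2, attr_a, attr_b)
print(score_1 > 0, score_2 < 0)  # True True
```

In a real analysis, the attribute sets would hold embeddings for several words per group, and the sign and size of the score would indicate which group a target word is more closely associated with in the corpus.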