
General

Stochastic Mutual Information Gradient Estimation for Dimensionality Reduction Networks.

In Information sciences

Feature ranking and selection is a widely used approach in various applications of supervised dimensionality reduction in discriminative machine learning. Nevertheless, there is significant evidence that feature ranking and selection algorithms, whatever criterion they are based on, can lead to sub-optimal solutions for class separability. In that regard, we introduce emerging information theoretic feature transformation protocols as an end-to-end neural network training approach. We present a training procedure for a dimensionality reduction network (MMINet) based on a stochastic estimate of the mutual information gradient. The network projects high-dimensional features onto a lower-dimensional output space in which the feature representations carry maximum mutual information with their associated class labels. Furthermore, we formulate the training objective so that it can be estimated non-parametrically, with no distributional assumptions. We experimentally evaluate our method on high-dimensional biological data sets and relate it to conventional feature selection algorithms, which form a special case of our approach.

Özdenizci Ozan, Erdoğmuş Deniz


MMINet, dimensionality reduction, feature projection, information theoretic learning, mutual information, neural networks, stochastic gradient estimation
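The MMINet objective is estimated non-parametrically, with no distributional assumptions. The code below is a minimal sketch of such an estimate. It assumes a Gaussian kernel density (Parzen) estimator with a fixed bandwidth, which is a common non-parametric choice but not necessarily the authors' estimator. On a minibatch of projected features Z with class labels C, it computes I(Z; C) = H(Z) - Σ_c p(c) H(Z | C = c), estimating each entropy by resubstitution. The function names and the bandwidth value are hypothetical.

```python
import numpy as np


def kde_entropy(z: np.ndarray, bandwidth: float) -> float:
    """Resubstitution entropy estimate H(Z) ~= -mean_i log p_hat(z_i),
    where p_hat is an isotropic Gaussian kernel density estimate (assumed)."""
    n, d = z.shape
    sq_dist = np.sum((z[:, None, :] - z[None, :, :]) ** 2, axis=-1)
    log_kernel = -0.5 * d * np.log(2 * np.pi * bandwidth**2) - sq_dist / (2 * bandwidth**2)
    # Numerically stable log of the mean kernel value per sample (log-mean-exp over j).
    row_max = log_kernel.max(axis=1, keepdims=True)
    log_density = row_max[:, 0] + np.log(np.mean(np.exp(log_kernel - row_max), axis=1))
    return float(-np.mean(log_density))


def mutual_information(z: np.ndarray, labels: np.ndarray, bandwidth: float = 0.5) -> float:
    """I(Z; C) = H(Z) - sum_c p(c) H(Z | C = c), with every entropy estimated by KDE."""
    h_z = kde_entropy(z, bandwidth)
    h_z_given_c = 0.0
    for c in np.unique(labels):
        z_c = z[labels == c]
        h_z_given_c += (len(z_c) / len(z)) * kde_entropy(z_c, bandwidth)
    return h_z - h_z_given_c


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # A minibatch of hypothetical 2-D projected features: two well-separated classes.
    z = np.vstack([rng.normal(size=(100, 2)), rng.normal(size=(100, 2)) + 10.0])
    y = np.repeat([0, 1], 100)
    # For two balanced, well-separated classes this should be close to log(2) ~= 0.693.
    print(mutual_information(z, y))
```

In a training loop, an estimate like this would be computed on each minibatch of network outputs. An automatic differentiation framework would then differentiate it to obtain a stochastic gradient. That step is omitted here.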

Public Health

Estimating monthly concentrations of ambient key air pollutants in Japan during 2010-2015 for a national-scale birth cohort.

In Environmental pollution (Barking, Essex : 1987)

Exposure to ambient air pollution is associated with maternal and child health. Some air pollutants exhibit similar behavior in the atmosphere, and some interact with each other; thus, comprehensive assessments of individual air pollutants are required. In this study, we developed national-scale monthly models for six air pollutants (NO, NO2, SO2, O3, PM2.5, and suspended particulate matter (SPM)) to obtain accurate estimates of pollutant concentrations at 1 km × 1 km resolution from 2010 through 2015 for application to the Japan Environment and Children's Study (JECS), which is a large-scale birth cohort study. We developed our models in the land use regression framework using random forests in conjunction with kriging. We evaluated the model performance via 5-fold location-based cross-validation. We successfully predicted monthly NO (r² = 0.65), NO2 (r² = 0.84), O3 (r² = 0.86), PM2.5 (r² = 0.79), and SPM (r² = 0.64) concentrations. For SO2, a satisfactory model could not be developed (r² = 0.45) because of the low SO2 concentrations in Japan. The performance of our models is comparable to that of models reported in previous studies at similar temporal and spatial scales. The model predictions, in conjunction with the JECS, will reveal the critical windows of prenatal and infancy exposure to ambient air pollutants, thus contributing to the development of environmental policies on air pollution.

Araki Shin, Hasunuma Hideki, Yamamoto Kouhei, Shima Masayuki, Michikawa Takehiro, Nitta Hiroshi, Nakayama Shoji F, Yamazaki Shin


Exposure assessment, Kriging, Machine learning, Random forests, Spatial distribution
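The abstract combines random forests with kriging in a land use regression framework and validates the models with location-based cross-validation. One common way to combine the two is residual kriging, which is assumed here rather than taken from the paper. A random forest predicts concentrations from land use covariates. Ordinary kriging then interpolates the forest's residuals at monitoring stations onto each grid cell, and the two are added together. The sketch below covers only the kriging step and a station-grouped fold assignment. The exponential covariance model, its sill and range values, and all function names are hypothetical choices.

```python
import numpy as np


def exponential_cov(h, sill=1.0, corr_range=10.0):
    """Exponential covariance model C(h) = sill * exp(-h / corr_range) (assumed model)."""
    return sill * np.exp(-np.asarray(h, float) / corr_range)


def ordinary_kriging(coords, residuals, target, sill=1.0, corr_range=10.0):
    """Interpolate station residuals at `target` by ordinary kriging.

    Solves [[K, 1], [1^T, 0]] [w; mu] = [k; 1], which forces the weights w to sum to 1.
    Returns the kriged residual and the weights."""
    coords = np.asarray(coords, float)
    n = len(coords)
    dists = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    lhs = np.ones((n + 1, n + 1))          # border of ones = unbiasedness constraint
    lhs[:n, :n] = exponential_cov(dists, sill, corr_range)
    lhs[n, n] = 0.0
    rhs = np.ones(n + 1)
    target_dists = np.linalg.norm(coords - np.asarray(target, float), axis=-1)
    rhs[:n] = exponential_cov(target_dists, sill, corr_range)
    weights = np.linalg.solve(lhs, rhs)[:n]  # drop the Lagrange multiplier mu
    return float(weights @ np.asarray(residuals, float)), weights


def location_folds(station_ids, n_folds=5, seed=0):
    """Assign cross-validation folds by monitoring station, so every monthly
    record from one station falls in the same fold (location-based CV)."""
    rng = np.random.default_rng(seed)
    stations = np.unique(station_ids)
    rng.shuffle(stations)
    fold_of = {s: i % n_folds for i, s in enumerate(stations.tolist())}
    return np.array([fold_of[s] for s in np.asarray(station_ids).tolist()])
```

Under this assumed scheme, a grid-cell estimate would be `rf_prediction + kriged_residual`. In practice the covariance parameters are usually fitted to an empirical variogram rather than fixed. Grouping folds by station keeps every month of a held-out station out of training, so the score reflects prediction at unmonitored locations.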

Oncology

Predicting the SARS-CoV-2 effective reproduction number using bulk contact data from mobile phones.

In Proceedings of the National Academy of Sciences of the United States of America

Over the last months, cases of SARS-CoV-2 surged repeatedly in many countries but could often be controlled with nonpharmaceutical interventions including social distancing. We analyzed deidentified Global Positioning System (GPS) tracking data from 1.15 to 1.4 million cell phones in Germany per day between March and November 2020 to identify encounters between individuals and statistically evaluate contact behavior. Using graph sampling theory, we estimated the contact index (CX), a metric for number and heterogeneity of contacts. We found that CX, and not the total number of contacts, is an accurate predictor for the effective reproduction number R derived from case numbers. A high correlation between CX and R recorded more than 2 wk later allows assessment of social behavior well before changes in case numbers become detectable. By construction, the CX quantifies the role of superspreading and permits assigning risks to specific contact behavior. We provide a critical CX value beyond which R is expected to rise above 1 and propose to use that value to leverage the social-distancing interventions for the coming months.

Rüdiger Sten, Konigorski Stefan, Rakowski Alexander, Edelman Jonathan Antonio, Zernick Detlef, Thieme Alexander, Lippert Christoph


COVID-19, epidemiology, network science
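The abstract reports that CX correlates strongly with R recorded more than two weeks later, and it gives a critical CX value at which R is expected to exceed 1. The following is a minimal sketch of that kind of analysis on two daily series. It does not show the graph-sampling computation of CX itself. The lag search and the straight-line fit used to locate the R = 1 crossing are assumptions of this sketch, not the authors' stated method, and the function names are hypothetical.

```python
import numpy as np


def lagged_correlation(cx, r, lag_days):
    """Pearson correlation between cx[t] and r[t + lag_days]."""
    cx, r = np.asarray(cx, float), np.asarray(r, float)
    n = len(cx)
    return float(np.corrcoef(cx[: n - lag_days], r[lag_days:])[0, 1])


def best_lag(cx, r, max_lag_days=28):
    """Lag (in days) at which the CX-R correlation is highest."""
    return max(range(max_lag_days + 1), key=lambda lag: lagged_correlation(cx, r, lag))


def critical_cx(cx, r, lag_days):
    """CX value at which a straight-line fit of r[t + lag] on cx[t] reaches R = 1.
    The linear relationship is an assumption of this sketch."""
    cx, r = np.asarray(cx, float), np.asarray(r, float)
    n = len(cx)
    slope, intercept = np.polyfit(cx[: n - lag_days], r[lag_days:], 1)
    return float((1.0 - intercept) / slope)
```

On synthetic data where R follows CX with a 16-day delay, `best_lag` recovers the delay. `critical_cx` then gives the CX level above which the fitted R exceeds 1, provided the fitted slope is positive.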

Radiology

Multiple Instance Learning with Auxiliary Task Weighting for Multiple Myeloma Classification

ArXiv Preprint

Whole body magnetic resonance imaging (WB-MRI) is the recommended modality for diagnosis of multiple myeloma (MM). WB-MRI is used to detect sites of disease across the entire skeletal system, but it requires significant expertise and is time-consuming to report due to the great number of images. To aid radiological reading, we propose an auxiliary task-based multiple instance learning approach (ATMIL) for MM classification with the ability to localize sites of disease. This approach is appealing as it only requires patient-level annotations where an attention mechanism is used to identify local regions with active disease. We borrow ideas from multi-task learning and define an auxiliary task with adaptive reweighting to support and improve learning efficiency in the presence of data scarcity. We validate our approach on both synthetic and real multi-center clinical data. We show that the MIL attention module provides a mechanism to localize bone regions while the adaptive reweighting of the auxiliary task considerably improves the performance.

Talha Qaiser, Stefan Winzeck, Theodore Barfoot, Tara Barwick, Simon J. Doran, Martin F. Kaiser, Linda Wedlake, Nina Tunariu, Dow-Mu Koh, Christina Messiou, Andrea Rockall, Ben Glocker
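The ATMIL approach needs only patient-level labels, because an attention mechanism picks out the image regions with active disease. The sketch below is assumed to follow the widely used attention-based MIL pooling formulation, not this paper's exact architecture. Each region (instance) gets a softmax attention weight, the weighted mean of instance features forms the patient (bag) embedding, and a bag classifier is trained against the patient label. The auxiliary task and its adaptive reweighting are not shown. Parameter names and shapes are hypothetical.

```python
import numpy as np


def attention_mil_forward(instances, V, w, u):
    """Attention-based MIL pooling for one bag (one patient).

    instances: (K, D) features of K image regions
    V: (L, D) and w: (L,) attention parameters; u: (D,) bag classifier weights.
    Returns the bag-level probability and the per-instance attention weights."""
    scores = np.tanh(instances @ V.T) @ w        # (K,) unnormalised attention scores
    scores = scores - scores.max()                # shift for a numerically stable softmax
    attention = np.exp(scores) / np.exp(scores).sum()
    bag_embedding = attention @ instances         # (D,) attention-weighted mean of instances
    prob = 1.0 / (1.0 + np.exp(-(bag_embedding @ u)))
    return float(prob), attention


def bag_loss(prob, label, eps=1e-7):
    """Binary cross-entropy against the patient-level label only."""
    p = min(max(prob, eps), 1.0 - eps)
    return -(label * np.log(p) + (1 - label) * np.log(1.0 - p))
```

The returned attention weights are what would be inspected to localise candidate disease sites. No region-level annotation enters the loss.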


Pathology

A metabolomics pipeline for the mechanistic interrogation of the gut microbiome.

In Nature ; h5-index 368.0

Gut microorganisms modulate host phenotypes and are associated with numerous health effects in humans, ranging from host responses to cancer immunotherapy to metabolic disease and obesity. However, difficulty in accurate and high-throughput functional analysis of human gut microorganisms has hindered efforts to define mechanistic connections between individual microbial strains and host phenotypes. One key way in which the gut microbiome influences host physiology is through the production of small molecules [1-3], yet progress in elucidating this chemical interplay has been hindered by limited tools calibrated to detect the products of anaerobic biochemistry in the gut. Here we construct a microbiome-focused, integrated mass-spectrometry pipeline to accelerate the identification of microbiota-dependent metabolites in diverse sample types. We report the metabolic profiles of 178 gut microorganism strains using our library of 833 metabolites. Using this metabolomics resource, we establish deviations in the relationships between phylogeny and metabolism, use machine learning to discover a previously undescribed type of metabolism in Bacteroides, and reveal candidate biochemical pathways using comparative genomics. Microbiota-dependent metabolites can be detected in diverse biological fluids from gnotobiotic and conventionally colonized mice and traced back to the corresponding metabolomic profiles of cultured bacteria. Collectively, our microbiome-focused metabolomics pipeline and interactive metabolomics profile explorer are a powerful tool for characterizing microorganisms and interactions between microorganisms and their host.

Han Shuo, Van Treuren Will, Fischer Curt R, Merrill Bryan D, DeFelice Brian C, Sanchez Juan M, Higginbottom Steven K, Guthrie Leah, Fall Lalla A, Dodd Dylan, Fischbach Michael A, Sonnenburg Justin L
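The pipeline identifies metabolites against a library of 833 reference compounds. A common way to annotate a detected mass-spectrometry feature against such a library is to match both its mass-to-charge ratio (within a tolerance in parts per million) and its retention time. The sketch below illustrates that general idea only. The 5 ppm and 0.2 minute tolerances, the entry fields, and all names are hypothetical, not the authors' matching criteria.

```python
from dataclasses import dataclass


@dataclass
class LibraryEntry:
    name: str       # hypothetical compound identifier
    mz: float       # reference mass-to-charge ratio
    rt_min: float   # reference retention time, minutes


def ppm_error(observed_mz: float, reference_mz: float) -> float:
    """Signed mass error in parts per million."""
    return (observed_mz - reference_mz) / reference_mz * 1e6


def match_feature(mz, rt_min, library, ppm_tol=5.0, rt_tol_min=0.2):
    """Return library entries within both tolerances, closest m/z first."""
    hits = [
        e for e in library
        if abs(ppm_error(mz, e.mz)) <= ppm_tol and abs(rt_min - e.rt_min) <= rt_tol_min
    ]
    return sorted(hits, key=lambda e: abs(ppm_error(mz, e.mz)))
```

Requiring agreement in both dimensions helps separate compounds whose masses fall within the m/z tolerance of each other but which elute at different times.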


Radiology

Fair shares: building and benefiting from healthcare AI with mutually beneficial structures and development partnerships.

In British journal of cancer ; h5-index 89.0

Artificial intelligence (AI) algorithms are used in an increasing range of aspects of our lives. In particular, medical applications of AI are being developed and deployed, including many in image analysis. Deep learning methods, which have recently proved successful in image classification, rely on large volumes of clinical data generated by healthcare institutions. These data are collected from the populations those institutions serve. In this opinion article, using digital mammographic screening as an example, we briefly consider the background to AI development and some issues around its deployment. We highlight the importance of high-quality clinical data as fundamental to these technologies, and question how the ownership of resultant tools should be defined. Though many of the ethical issues concerning the development and use of medical AI technologies continue to be discussed, the value of the data on which they rely is seldom considered. This potentially controversial issue can and should be addressed in a way that is beneficial to all parties, particularly the population in general and the patients we serve.

Sidebottom Richard, Lyburn Iain, Brady Michael, Vinnicombe Sarah