Public Health

Occurrence, predictors and hazards of elevated groundwater arsenic across India through field observations and regional-scale AI-based modeling.

In The Science of the total environment

Widespread elevated concentrations of groundwater arsenic (As) across South Asia, including India, endanger a large population that depends on groundwater for drinking water. Here, using high-spatial-resolution As field observations (~3 million groundwater sources) across India, we have delineated the regional-scale occurrence of elevated groundwater As (≥10 μg/L), along with the possible geologic-geomorphologic-hydrologic and human-sourced predictors that influence the spatial distribution of the contaminant. Using statistical and machine learning methods, we also modeled the groundwater As concentration probability at 1 km resolution, along with probabilistic delineation of high As-hazard zones across India. The observed occurrence of groundwater As was found to be most strongly influenced by geology-tectonics, groundwater-fed irrigated area (%) and elevation. Pervasive As contamination is observed in major parts of the Himalayan mega-river Indus-Ganges-Brahmaputra basins; however, it also occurs in several more-localized pockets, mostly related to ancient tectonic zones, igneous provinces, aquifers in modern deltas and chalcophile mineralized regions. The model results suggest As-hazard potential in yet-undetected areas. Our model performed well in predicting groundwater arsenic, with accuracy of 82% and 84% and area under the curve (AUC) of 0.89 and 0.88 for the test and validation datasets. An estimated ~90 million people across India are found to be exposed to high groundwater As from field-observed data; the five states with the highest hazard are West Bengal (28 million), Bihar (21 million), Uttar Pradesh (15 million), Assam (8.6 million) and Punjab (6 million). However, the number could be much higher (>250 million) if the modeled hazard is considered. Thus, our study provides a detailed, quantitative assessment of high groundwater As across India, with delineation of possible intrinsic influences and exogenous forcings. The predictive model is helpful in predicting As-hazard zones in areas with limited measurements.
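
For readers less familiar with the two metrics quoted in this abstract, the following is a minimal sketch of how accuracy and AUC can be computed for a binary hazard classifier. It is not the authors' code: the function names, the 0.5 decision threshold and the toy data are assumptions made purely for illustration.

```python
def accuracy(labels, scores, threshold=0.5):
    """Fraction of samples whose thresholded score matches the label."""
    predictions = [1 if s >= threshold else 0 for s in scores]
    correct = sum(p == y for p, y in zip(predictions, labels))
    return correct / len(labels)


def roc_auc(labels, scores):
    """AUC as the probability that a random positive outranks a random negative.

    A tie between a positive and a negative score counts as half a win.
    """
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative sample")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical toy data (not from the study):
# label 1 = source at or above 10 ug/L, label 0 = source below it.
toy_labels = [1, 1, 1, 0, 0, 0]
toy_scores = [0.9, 0.7, 0.4, 0.6, 0.2, 0.1]

print(accuracy(toy_labels, toy_scores))
print(roc_auc(toy_labels, toy_scores))
```

Accuracy depends on the chosen threshold, whereas AUC summarizes ranking quality over all thresholds, which is why abstracts such as this one usually report both.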

Mukherjee Abhijit, Sarkar Soumyajit, Chakraborty Madhumita, Duttagupta Srimanti, Bhattacharya Animesh, Saha Dipankar, Bhattacharya Prosun, Mitra Adway, Gupta Saibal


Arsenic, Groundwater contamination, India, Machine learning, Public health, Tectonics

Cardiology

Neural collaborative filtering for unsupervised mitral valve segmentation in echocardiography.

In Artificial intelligence in medicine ; h5-index 34.0

The segmentation of the mitral valve annulus and leaflets is a crucial first step in establishing a machine learning pipeline that can support physicians in performing multiple tasks, e.g. diagnosis of mitral valve diseases, surgical planning, and intraoperative procedures. Current methods for mitral valve segmentation on 2D echocardiography videos require extensive interaction with annotators and perform poorly on low-quality and noisy videos. We propose an automated and unsupervised method for mitral valve segmentation based on a low-dimensional embedding of the echocardiography videos using neural network collaborative filtering. The method is evaluated on a collection of echocardiography videos of patients with a variety of mitral valve diseases, and additionally on an independent test cohort. It outperforms state-of-the-art unsupervised and supervised methods on low-quality videos or in the case of sparse annotation.

Corinzia Luca, Laumer Fabian, Candreva Alessandro, Taramasso Maurizio, Maisano Francesco, Buhmann Joachim M


Collaborative filtering, Mitral valve, Neural network, Segmentation

Surgery

Prediction of breast cancer distant recurrence using natural language processing and knowledge-guided convolutional neural network.

In Artificial intelligence in medicine ; h5-index 34.0

Distant recurrence of breast cancer results in high lifetime risks and low 5-year survival rates. Early prediction of distant recurrence could facilitate intervention and improve patients' quality of life. In this study, we designed an EHR-based predictive model to estimate the probability of distant recurrence in breast cancer patients. We studied the pathology reports and progress notes of 6,447 patients who were diagnosed with breast cancer at Northwestern Memorial Hospital between 2001 and 2015. Clinical notes were mapped to Concept Unique Identifiers (CUIs) using natural language processing tools. Bag-of-words and pre-trained embeddings were employed to vectorize words and CUI sequences. These features, integrated with clinical features from structured data, were fed to conventional machine learning classifiers and a Knowledge-guided Convolutional Neural Network (K-CNN). The best configuration of our model yielded an AUC of 0.888 and an F1-score of 0.5. Our work provides an automated method to predict breast cancer distant recurrence using natural language processing and deep learning approaches. We expect that through advanced feature engineering, better predictive performance could be achieved.
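
The bag-of-words step mentioned in this abstract can be illustrated with a small sketch. It is only an assumption about the general technique, not the authors' pipeline: the helper names are hypothetical, and the `CUI_A`-style tokens are placeholders, not real UMLS identifiers.

```python
from collections import Counter


def build_vocabulary(documents):
    """Map each distinct token to a column index, in sorted order."""
    tokens = sorted({tok for doc in documents for tok in doc})
    return {tok: i for i, tok in enumerate(tokens)}


def bag_of_words(document, vocabulary):
    """Return a count vector for one tokenized document.

    Tokens that are not in the vocabulary are ignored.
    """
    counts = Counter(tok for tok in document if tok in vocabulary)
    vector = [0] * len(vocabulary)
    for tok, n in counts.items():
        vector[vocabulary[tok]] = n
    return vector


# Hypothetical notes already mapped to placeholder concept tokens.
notes = [
    ["CUI_A", "CUI_B", "CUI_A"],
    ["CUI_B", "CUI_C"],
]
vocab = build_vocabulary(notes)
vectors = [bag_of_words(note, vocab) for note in notes]
print(vocab)
print(vectors)
```

Each note becomes a fixed-length count vector that a conventional classifier can consume; word order is discarded, which is one reason the abstract also mentions embeddings over sequences.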

Wang Hanyin, Li Yikuan, Khan Seema A, Luo Yuan


Breast cancer, Distant recurrence, Entity embeddings, Knowledge-guided convolutional neural network, Word embeddings

General

Autoencoded DNA methylation data to predict breast cancer recurrence: Machine learning models and gene-weight significance.

In Artificial intelligence in medicine ; h5-index 34.0

Breast cancer is the most frequent cancer in women and the second most frequent overall after lung cancer. Although the 5-year survival rate of breast cancer is relatively high, recurrence is also common and often involves metastasis, with its consequent threat to patients. DNA methylation-derived databases have become an interesting primary source for supervised knowledge extraction regarding breast cancer. Unfortunately, the study of DNA methylation involves processing hundreds of thousands of features for every patient. DNA methylation data are characterized by High Dimension Low Sample Size (HDLSS), which raises well-known issues for feature selection and generation. Autoencoders (AEs) are a specific technique for conducting nonlinear feature fusion. Our main objective in this work is to design a procedure to summarize DNA methylation by taking advantage of AEs. Our proposal is able to generate new features from the values of CpG sites of patients with and without recurrence. Then, a limited set of relevant genes to characterize breast cancer recurrence is proposed by applying survival analysis and a weighted ranking of genes according to the distribution of their CpG sites. To test our proposal, we selected a dataset from The Cancer Genome Atlas data portal and an AE with a single hidden layer. The literature review and enrichment analysis (based on genomic context and functional annotation) conducted on the genes obtained in our experiment confirmed that all of these genes were related to breast cancer recurrence.

Macías-García Laura, Martínez-Ballesteros María, Luna-Romera José María, García-Heredia José M, García-Gutiérrez Jorge, Riquelme-Santos José C


Autoencoder, Breast cancer, DNA methylation, Feature generation, Machine learning

General

Decoding working memory task condition using MEG source level long-range phase coupling patterns.

In Journal of neural engineering ; h5-index 52.0

OBJECTIVE : The objective of the study is to identify phase coupling patterns that are shared across subjects via a machine learning approach that utilises source space MEG phase coupling data from a Working Memory (WM) task. Indeed, phase coupling of neural oscillations is putatively a key factor for communication between distant brain areas and it is therefore crucial in performing cognitive tasks, including WM. Previous studies investigating phase coupling during cognitive tasks have often focused on a few a priori selected brain areas or a specific frequency band and the need for data-driven approaches has been recognised. Machine learning techniques have emerged as valuable tools for the analysis of neuroimaging data since they catch fine-grained differences in the multivariate signal distribution. Here, we expect that these techniques applied to MEG phase couplings can reveal WM related processes that are shared across individuals.

APPROACH : We analysed WM data collected as part of the Human Connectome Project. The MEG data were collected while subjects (N=83) performed N-back WM tasks in two different conditions, namely 2-back (WM condition) and 0-back (control condition). We estimated phase coupling patterns (Multivariate Phase Slope Index) for both conditions and for the theta, alpha, beta, and gamma bands. The obtained phase coupling data were then used to train a linear support vector machine to classify which task condition the subject was performing, using an across-subject cross-validation approach. The classification was performed separately on the data from individual frequency bands and with all bands combined (multiband). Finally, we evaluated the relative importance of the different features (phase couplings) for the classification by means of feature selection probability.

MAIN RESULTS : The WM condition and control condition were successfully classified based on the phase coupling patterns in the theta (62 % accuracy) and alpha (60 % accuracy) bands separately. Importantly, the multiband classification showed that phase coupling patterns not only in the theta and alpha bands but also in the gamma band are related to WM processing, as evidenced by the improvement in classification performance (71 %).

SIGNIFICANCE : Our study successfully decoded working memory tasks using MEG source space functional connectivity. Our approach, combining across-subject classification and a multidimensional metric recently developed by our group, is able to detect patterns of connectivity that are shared across individuals. In other words, the results are generalisable to new individuals and allow meaningful interpretation of the task-relevant phase coupling patterns.
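
The across-subject cross-validation mentioned in the APPROACH paragraph can be sketched as follows. The abstract does not state the exact fold scheme, so this leave-one-subject-out variant, the function name and the toy subject layout are all assumptions for illustration only.

```python
def leave_one_subject_out(subject_ids):
    """Yield (held_out_subject, train_indices, test_indices) for each subject.

    All samples from the held-out subject go to the test fold, so the
    classifier is never evaluated on a subject it saw during training.
    """
    for held_out in sorted(set(subject_ids)):
        train = [i for i, s in enumerate(subject_ids) if s != held_out]
        test = [i for i, s in enumerate(subject_ids) if s == held_out]
        yield held_out, train, test


# Hypothetical layout: 3 subjects, each contributing one 0-back and one
# 2-back sample (the study itself had N=83 subjects).
subjects = ["s1", "s1", "s2", "s2", "s3", "s3"]
folds = list(leave_one_subject_out(subjects))
for held_out, train, test in folds:
    print(held_out, train, test)
```

Keeping every sample from a subject in the same fold is what supports the claim that the learned patterns generalise to new individuals; splitting at the sample level would let subject-specific structure leak into the test set.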

Syrjälä Jaakko Johannes, Basti Alessio, Guidotti Roberto, Marzetti Laura, Pizzella Vittorio


machine learning, magnetoencephalography, neural oscillations, phase coupling, working memory

Pathology

The impact of pre- and post-image processing techniques on deep learning frameworks: A comprehensive review for digital pathology image analysis.

In Computers in biology and medicine

Recently, deep learning frameworks have rapidly become the main methodology for analyzing medical images. Due to their powerful learning ability and advantages in dealing with complex patterns, deep learning algorithms are ideal for image analysis challenges, particularly in the field of digital pathology. The variety of image analysis tasks in the context of deep learning includes classification (e.g., healthy vs. cancerous tissue), detection (e.g., lymphocyte and mitosis counting), and segmentation (e.g., nuclei and gland segmentation). The majority of recent machine learning methods in digital pathology have a pre- and/or post-processing stage that is integrated with a deep neural network. These stages, based on traditional image processing methods, are employed to make the subsequent classification, detection, or segmentation problem easier to solve. Several studies have shown how the integration of pre- and post-processing methods within a deep learning pipeline can further increase the model's performance when compared to the network by itself. The aim of this review is to provide an overview of the types of methods that are used within deep learning frameworks either to optimally prepare the input (pre-processing) or to improve the results of the network output (post-processing), focusing on digital pathology image analysis. Many of the techniques presented here, especially the post-processing methods, are not limited to digital pathology but can be extended to almost any image analysis field.

Salvi Massimo, Acharya U Rajendra, Molinari Filippo, Meiburger Kristen M


Deep learning, Digital pathology, Histology, Image analysis, Post-processing, Pre-processing