
General

WeGleNet: A weakly-supervised convolutional neural network for the semantic segmentation of Gleason grades in prostate histology images.

In Computerized medical imaging and graphics : the official journal of the Computerized Medical Imaging Society

BACKGROUND AND OBJECTIVE : Prostate cancer is one of the main diseases affecting men worldwide. The Gleason scoring system is the primary diagnostic tool for prostate cancer. The score is obtained through the visual analysis of cancerous patterns in prostate biopsies by expert pathologists, and the aggregation of the main Gleason grades into a combined score. Computer-aided diagnosis systems can reduce the workload of pathologists and increase objectivity. Nevertheless, developing such systems requires a large number of labeled samples with pixel-level annotations performed by expert pathologists. Recently, efforts have been made in the literature to develop algorithms that directly estimate the global Gleason score at the biopsy/core level from global labels. However, these algorithms do not provide accurate localization of the Gleason patterns in the tissue. Such localization maps are the basis of a reliable computer-aided diagnosis system that pathologists can use in clinical practice. In this work, we propose a deep-learning-based system able to detect local cancerous patterns in prostate tissue using only the global-level Gleason score obtained from clinical records during training.

METHODS : The methodological core of this work is WeGleNet, the proposed weakly-supervised convolutional neural network, which is based on a multi-class segmentation layer placed after the feature-extraction module, a global-aggregation layer, and the slicing of the background class when estimating the model loss during training.
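The weak-supervision idea described above, pixel-level class scores aggregated into a global prediction with the background class sliced off before the loss, can be sketched roughly as follows. This is a minimal NumPy illustration, not the authors' implementation: the 1x1-convolution segmentation head is reduced to a single matrix multiply, and the ordering of the softmax, pooling, and background-slicing steps is an assumption.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def weglenet_forward(features, w, b):
    """Hypothetical WeGleNet-style forward pass.
    features: (H, W, C) feature map from a CNN backbone
    w, b: (C, K+1) weights and (K+1,) bias of a 1x1-conv segmentation
          head mapping features to K Gleason grades plus background."""
    logits = features @ w + b                  # (H, W, K+1) pixel-level scores
    pixel_probs = softmax(logits)              # per-pixel segmentation map
    global_scores = logits.mean(axis=(0, 1))   # global average pooling -> (K+1,)
    class_probs = softmax(global_scores[:-1])  # slice off background before the loss
    return pixel_probs, class_probs

def global_loss(class_probs, y):
    """Cross-entropy against the biopsy-level Gleason label y (int index)."""
    return -np.log(class_probs[y] + 1e-12)
```

Training with only `global_loss` on biopsy-level labels is what lets `pixel_probs` emerge as a segmentation map without pixel annotations.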

RESULTS : Using a public dataset of prostate tissue microarrays, we obtained a Cohen's quadratic kappa (κ) of 0.67 for the pixel-level prediction of cancerous patterns in the validation cohort. In the test cohort, we compared the model with supervised state-of-the-art architectures for the semantic segmentation of Gleason grades, obtaining a pixel-level κ of 0.61 and a macro-averaged F1-score of 0.58, on par with fully-supervised methods. For the estimation of the core-level Gleason score, we obtained κ values of 0.76 and 0.67 between the model and two different pathologists.
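Cohen's quadratic-weighted kappa, the agreement metric reported above, can be computed from a confusion matrix with the standard textbook formulation (this is independent of the paper's code):

```python
import numpy as np

def quadratic_kappa(y_true, y_pred, n_classes):
    """Cohen's kappa with quadratic weights: disagreements are penalized
    by the squared distance between the two class labels."""
    O = np.zeros((n_classes, n_classes))          # observed confusion matrix
    for t, p in zip(y_true, y_pred):
        O[t, p] += 1
    w = np.array([[(i - j) ** 2 for j in range(n_classes)]
                  for i in range(n_classes)], dtype=float)
    w /= (n_classes - 1) ** 2                     # quadratic weight matrix
    E = np.outer(O.sum(axis=1), O.sum(axis=0)) / O.sum()  # chance agreement
    return 1.0 - (w * O).sum() / (w * E).sum()
```

Perfect agreement yields κ = 1, chance-level agreement κ = 0, and systematic disagreement negative values.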

CONCLUSIONS : WeGleNet is capable of performing the semantic segmentation of Gleason grades similarly to fully-supervised methods without requiring pixel-level annotations. Moreover, the model reached a performance at the same level as inter-pathologist agreement for the global Gleason scoring of the cores.

Silva-Rodríguez Julio, Colomer Adrián, Naranjo Valery


Gleason grading, Prostate cancer, Semantic segmentation, Tissue micro-arrays, Weakly supervised

General

Deep Q-learning for the selection of optimal isocratic scouting runs in liquid chromatography.

In Journal of chromatography. A

An important challenge in chromatography is the development of adequate separation methods. Accurate retention models can significantly simplify and expedite the development of such methods for complex mixtures. The purpose of this study was to introduce reinforcement learning to chromatographic method development by training a double deep Q-learning algorithm to select optimal isocratic scouting runs for generating accurate retention models. These scouting runs were fit to the Neue-Kuss retention model, which was then used to predict retention factors under both isocratic and gradient conditions. The quality of these predictions was assessed against experimental data points by computing the mean relative percentage error (MRPE) between predicted and actual retention factors. By giving the reinforcement learning algorithm a reward whenever the scouting runs led to accurate retention models, and a penalty when the analysis time of a selected scouting run was too high (> 1 h), it was hypothesized that the algorithm would, over time, learn to select good scouting runs for compounds displaying a variety of characteristics. The algorithm developed in this work was first trained on simulated data and then evaluated on experimental data for 57 small molecules, each run at 10 different fractions of organic modifier (0.05 to 0.90) and four different linear gradients. The resulting retention models, mostly obtained from 3 isocratic scouting runs per compound, achieved MRPEs of 3.77% for isocratic runs and 1.93% for gradient runs. This was comparable to retention models obtained by fitting the Neue-Kuss model to all 10 available isocratic data points (3.26% for isocratic runs and 4.97% for gradient runs) and to retention models obtained via a "chromatographer's selection" of three scouting runs (3.86% for isocratic runs and 6.66% for gradient runs). It was therefore concluded that the reinforcement learning algorithm learned to select optimal scouting runs for retention modeling: the 3 (out of 10) isocratic scouting runs it selected per compound were informative enough to successfully capture the retention behavior of each compound.
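As a rough illustration, the Neue-Kuss retention model and the MRPE metric used to score the fitted models might look like this in code. The functional form shown is the commonly cited Neue-Kuss equation; the parameter names (k0, S1, S2) are generic conventions and not taken from the paper.

```python
import numpy as np

def neue_kuss_k(phi, k0, S1, S2):
    """Neue-Kuss retention model: retention factor k as a function of the
    organic-modifier fraction phi (isocratic conditions). k0 is the
    retention factor in pure water (phi = 0)."""
    phi = np.asarray(phi, dtype=float)
    return k0 * (1.0 + S2 * phi) ** 2 * np.exp(-S1 * phi / (1.0 + S2 * phi))

def mrpe(k_pred, k_obs):
    """Mean relative percentage error between predicted and observed
    retention factors, as used to evaluate the retention models."""
    k_pred = np.asarray(k_pred, dtype=float)
    k_obs = np.asarray(k_obs, dtype=float)
    return 100.0 * np.mean(np.abs(k_pred - k_obs) / k_obs)
```

In the study's setup, the parameters would be fit to the 3 selected isocratic scouting runs, and `mrpe` would then compare predictions against the held-out isocratic and gradient measurements.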

Kensert Alexander, Collaerts Gilles, Efthymiadis Kyriakos, Desmet Gert, Cabooter Deirdre


Deep q-learning, Machine learning, Method development, Reinforcement learning, Retention models

General

Mining influential genes based on deep learning.

In BMC bioinformatics

BACKGROUND : Large-scale gene expression profiling has been successfully applied to the discovery of functional connections among diseases, genetic perturbations, and drug actions. To address the cost of ever-expanding gene expression profiling, a new low-cost, high-throughput, reduced-representation expression profiling method called L1000 was proposed, with which one million profiles have been produced. Although a set of ~1000 carefully chosen landmark genes that can capture ~80% of the information in the whole genome has been identified for use in L1000, the robustness of using these landmark genes to infer target genes is not satisfactory. Therefore, more efficient computational methods are still needed to mine the influential genes in the genome.

RESULTS : Here, we propose a computational framework based on deep learning to mine a subset of genes that covers more genomic information. Specifically, an AutoEncoder framework is first constructed to learn the non-linear relationships between genes, and DeepLIFT is then applied to calculate gene importance scores. Using this data-driven approach, we re-derived a landmark gene set. The results show that our landmark genes can predict target genes more accurately and robustly than the L1000 landmark genes, based on two metrics: mean absolute error (MAE) and Pearson correlation coefficient (PCC). This indicates that the landmark genes detected by our method carry more genomic information.
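A toy version of this pipeline, an autoencoder learning relationships between genes followed by an attribution step that scores gene importance, might look like the sketch below. Two loud assumptions: the autoencoder is reduced to a linear one trained by plain gradient descent, and a simple gradient x input attribution stands in for DeepLIFT, which the authors actually used.

```python
import numpy as np

def train_linear_autoencoder(X, latent_dim, lr=0.05, epochs=1000, seed=0):
    """Tiny linear autoencoder (encoder We, decoder Wd) trained with
    gradient descent on the mean squared reconstruction error.
    X: (samples, genes) expression matrix."""
    rng = np.random.default_rng(seed)
    n, g = X.shape
    We = rng.normal(scale=0.1, size=(g, latent_dim))
    Wd = rng.normal(scale=0.1, size=(latent_dim, g))
    for _ in range(epochs):
        Z = X @ We                 # encode
        R = Z @ Wd                 # decode (reconstruction)
        dR = 2.0 * (R - X) / n     # gradient of the MSE loss w.r.t. R
        gWd = Z.T @ dR
        gWe = X.T @ (dR @ Wd.T)
        We -= lr * gWe
        Wd -= lr * gWd
    return We, Wd

def gene_importance(X, We, Wd):
    """Gradient x input importance scores (a simple stand-in for DeepLIFT).
    For a linear autoencoder, the Jacobian of the reconstruction with
    respect to the input is We @ Wd."""
    J = We @ Wd                                   # (genes_in, genes_out)
    return np.mean(np.abs(X), axis=0) * np.abs(J).sum(axis=1)
```

The highest-scoring genes under `gene_importance` would be the candidate landmark set; the paper's full framework uses a non-linear autoencoder and DeepLIFT in these two roles.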

CONCLUSIONS : We believe that our proposed framework is very suitable for the analysis of biological big data to reveal the mysteries of life. Furthermore, the landmark genes inferred from this study can be used for the explosive amplification of gene expression profiles to facilitate research into functional connections.

Kong Lingpeng, Chen Yuanyuan, Xu Fengjiao, Xu Mingmin, Li Zutan, Fang Jingya, Zhang Liangyun, Pian Cong


AutoEncoder, Deep learning, DeepLIFT, Landmark genes

General

Automatic detection of seafloor marine litter using towed camera images and deep learning.

In Marine pollution bulletin

Aerial and underwater imaging is widely used for monitoring litter objects found at the sea surface, on beaches, and on the seafloor. However, litter monitoring requires a considerable amount of human effort, indicating the need for automatic and cost-effective approaches. Here we present an object detection approach that automatically detects seafloor marine litter in a real-world environment using a region-based convolutional neural network. The network is trained on imagery with 11 manually annotated litter categories and then evaluated on an independent part of the dataset, attaining a mean average precision score of 62%. The presence of other background features in the imagery (e.g., algae, seagrass, scattered boulders) resulted in a higher number of predicted litter items compared to the observed ones. The results of the study are encouraging and suggest that deep learning has the potential to become a significant tool for automatically recognizing seafloor litter in surveys, enabling continuous and precise litter monitoring.
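Mean average precision, the metric reported above, is built from IoU-based matching between predicted and ground-truth boxes. Here is a minimal single-class sketch; it is not the authors' evaluation code, and the all-point precision-recall integration used below is one of several common AP variants.

```python
import numpy as np

def iou(box_a, box_b):
    """Intersection-over-union of two [x1, y1, x2, y2] boxes."""
    x1, y1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    x2, y2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter)

def average_precision(preds, gts, iou_thr=0.5):
    """AP for one class. preds: list of (score, box); gts: list of boxes.
    Each ground-truth box can match at most one prediction."""
    preds = sorted(preds, key=lambda p: -p[0])    # highest confidence first
    matched = [False] * len(gts)
    tp, fp = [], []
    for score, box in preds:
        ious = [0.0 if matched[i] else iou(box, g) for i, g in enumerate(gts)]
        best = int(np.argmax(ious)) if ious else -1
        if best >= 0 and ious[best] >= iou_thr:
            matched[best] = True
            tp.append(1); fp.append(0)            # true positive
        else:
            tp.append(0); fp.append(1)            # false positive
    tp, fp = np.cumsum(tp), np.cumsum(fp)
    recall = tp / max(len(gts), 1)
    precision = tp / np.maximum(tp + fp, 1)
    ap, prev_r = 0.0, 0.0                          # area under the PR curve
    for p, r in zip(precision, recall):
        ap += p * (r - prev_r)
        prev_r = r
    return ap
```

Averaging `average_precision` over the 11 litter categories gives the mean average precision reported in the abstract. The abstract's observation about background features corresponds to extra false positives, which lower precision.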

Politikos Dimitris V, Fakiris Elias, Davvetas Athanasios, Klampanos Iraklis A, Papatheodorou George


Aegean Sea, Deep learning, Mask R-CNN, Mediterranean Sea, Object detection, Seafloor marine litter

General

Use of artificial intelligence to enhance phenotypic drug discovery.

In Drug discovery today ; h5-index 68.0

Research and development (R&D) productivity across the pharmaceutical industry has received close scrutiny over the past two decades, especially in light of reported attrition rates and the colossal cost of drug development. The respective merits of the two main drug discovery approaches, phenotypic and target-based, have divided opinion across the research community, because each holds different advantages for identifying novel molecular entities with a successful path to the market. Nevertheless, both have low translatability in the clinic. Artificial intelligence (AI) and the adoption of machine learning (ML) tools offer the promise of revolutionising drug development and overcoming obstacles in the drug discovery pipeline. Here, we assess the potential of target-driven and phenotypic-based approaches and offer a holistic description of the current state of the field, from both a scientific and an industry perspective. With the emerging partnerships between AI/ML and pharma still in their relative infancy, we investigate their potential and current limitations, with a particular focus on phenotypic drug discovery. Finally, we emphasise the value of public-private partnerships (PPPs) and cross-disciplinary collaborations to foster innovation and facilitate efficient drug discovery programmes.

Malandraki-Miller Sophia, Riley Paul R


General

Prediction and Interpretation of Cancer Survival Using Graph Convolution Neural Networks.

In Methods (San Diego, Calif.)

The survival rate of cancer has increased significantly during the past two decades for breast, prostate, testicular, and colon cancer, while brain and pancreatic cancers have a much lower median survival rate that has not improved much over the last forty years. This has imposed the challenge of finding gene markers for early cancer detection and treatment strategies. Different methods, including the regression-based Cox-PH model, artificial neural networks, and, more recently, deep learning algorithms, have been proposed to predict cancer survival rates. In this work, we established a novel graph convolutional neural network (GCNN) approach called Surv_GCNN to predict the survival rate for 13 different cancer types using the TCGA dataset. For each cancer type, 6 Surv_GCNN models with graphs generated by correlation analysis, the GeneMania database, and correlation + GeneMania were trained with and without clinical data to predict the prognostic index. The performance of the 6 Surv_GCNN models was compared with that of two existing models, Cox-PH and Cox-nnet. The results showed that Cox-PH had the worst performance among the 8 tested models across the 13 cancer types, while Surv_GCNN models with clinical data showed the best overall performance, outperforming the competing models in 7 out of 13 cancer types, including BLCA, BRCA, COAD, LUSC, SARC, STAD, and UCEC. A novel network-based interpretation of Surv_GCNN was also proposed to identify potential gene markers for breast cancer. The signatures learned by the nodes in the hidden layer of Surv_GCNN were identified and linked to potential gene markers by network modularization. The identified gene markers for breast cancer were compared to a total of 213 gene markers from three widely cited lists for breast cancer survival analysis. About 57% of the gene markers obtained by Surv_GCNN with the correlation + GeneMania graph either overlap with or directly interact with the 213 genes, confirming the effectiveness of the markers identified by Surv_GCNN.
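Survival networks such as Cox-nnet, the baseline mentioned above, are typically trained on the negative Cox partial log-likelihood of the predicted prognostic index; that is a plausible output loss for a model like Surv_GCNN as well, though the paper's exact formulation may differ. A minimal NumPy version:

```python
import numpy as np

def cox_neg_log_partial_likelihood(theta, time, event):
    """Negative Cox partial log-likelihood.
    theta: (n,) predicted prognostic index (higher = higher risk)
    time:  (n,) follow-up times
    event: (n,) 1 if the event (e.g., death) was observed, 0 if censored.
    Censored patients contribute only through the risk sets."""
    theta = np.asarray(theta, dtype=float)
    time = np.asarray(time, dtype=float)
    event = np.asarray(event)
    loss = 0.0
    for i in np.where(event == 1)[0]:
        at_risk = time >= time[i]   # patients still under observation at t_i
        loss -= theta[i] - np.log(np.exp(theta[at_risk]).sum())
    return loss
```

The loss is lowest when patients who experience events early are assigned the highest prognostic indices, which is exactly the ranking behavior a prognostic model should learn.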

Ramirez Ricardo, Chiu Yu-Chiao, Zhang SongYao, Ramirez Joshua, Chen Yidong, Huang Yufei, Jin Yu-Fang


Graph Convolutional Neural Network, Survival Analysis, The Cancer Genome Atlas (TCGA)