
General General

MaskMitosis: a deep learning framework for fully supervised, weakly supervised, and unsupervised mitosis detection in histopathology images.

In Medical & biological engineering & computing ; h5-index 32.0

Counting the mitotic cells in histopathological cancerous tissue areas is the most relevant indicator of tumor grade in aggressive breast cancer diagnosis. In this paper, we propose a robust and accurate technique for the automatic detection of mitoses from histological breast cancer slides using the multi-task deep learning framework for object detection and instance segmentation Mask RCNN. Our mitosis detection and instance segmentation framework is deployed for two main tasks: it is used as a detection network to perform mitosis localization and classification in the fully annotated mitosis datasets (i.e., the pixel-level annotated datasets), and it is used as a segmentation network to estimate the mitosis mask labels for the weakly annotated mitosis datasets (i.e., the datasets with centroid-pixel labels only). We evaluate our approach on the fully annotated 2012 ICPR grand challenge dataset and the weakly annotated 2014 ICPR MITOS-ATYPIA challenge dataset. Our evaluation experiments show that we can obtain the highest F-score of 0.863 on the 2012 ICPR dataset by applying the mitosis detection and instance segmentation model trained on the pixel-level labels provided by this dataset. For the weakly annotated 2014 ICPR dataset, we first employ the mitosis detection and instance segmentation model trained on the fully annotated 2012 ICPR dataset to segment the centroid-pixel annotated mitosis ground truths, and produce the mitosis mask and bounding box labels. These estimated labels are then used to train another mitosis detection and instance segmentation model for mitosis detection on the 2014 ICPR dataset. By adopting this two-stage framework, our method outperforms all state-of-the-art mitosis detection approaches on the 2014 ICPR dataset by achieving an F-score of 0.475. 
Moreover, we show that the proposed framework can also perform unsupervised mitosis detection by estimating pseudo labels for an unlabeled dataset, and that it achieves promising detection results. Code has been made available at:

Graphical Abstract: Overview of the MaskMitosis framework.
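To make the reported F-scores concrete, here is a minimal sketch of how an F-score can be computed for centroid-based mitosis detections. The greedy matching rule and the `max_dist` threshold are illustrative assumptions and not the official evaluation protocol of either ICPR challenge.

```python
import math


def f_score(detections, ground_truth, max_dist=30.0):
    """F-score for (x, y) centroid detections against ground-truth centroids.

    max_dist is a hypothetical pixel threshold chosen for illustration only.
    """
    unmatched = list(ground_truth)
    tp = 0
    for det in detections:
        # Nearest ground-truth centroid that has not been matched yet.
        best = min(unmatched, key=lambda gt: math.dist(det, gt), default=None)
        if best is not None and math.dist(det, best) <= max_dist:
            unmatched.remove(best)  # enforce one-to-one matching
            tp += 1
    fp = len(detections) - tp  # detections with no ground-truth partner
    fn = len(unmatched)        # ground-truth mitoses that were missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)
```

With two of three detections matched and one ground-truth centroid missed, precision and recall are both 2/3, so the F-score is also 2/3.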

Sebai Meriem, Wang Xinggang, Wang Tianjiang


Automatic mitosis detection, Breast cancer histopathological images, Mask RCNN, Mitosis instance segmentation, Multi-task learning

General General

Feasibility of use of medical dual energy scanner for forensic detection and characterization of explosives, a phantom study.

In International journal of legal medicine

OBJECTIVE : Detection of explosives is a challenge due to the use of improvised and concealed bombs. Post-bomb strike bodies are handled by emergency and forensic teams. We aimed to determine whether medical dual-energy computed tomography (DECT) algorithm and prediction model can readily detect and distinguish a range of explosives on the human body during disaster victim identification (DVI) processes of bombings.

MATERIALS AND METHODS : A medical DECT of 8 explosives (Semtex, Pastex, Hexamethylene triperoxide diamine, Acetone peroxide, Nitrocellulose, Pentrite, Ammonium Nitrate, and classified explosive) was conducted ex-vivo and on an anthropomorphic phantom. Hounsfield unit (HU), electron density (ED), effective atomic number (Zeff), and dual energy index (DEI) were compared by Wilcoxon signed rank test. Intra-class (ICC) and Pearson correlation coefficients (r) were computed. Explosives classification was performed through a prediction model with test-retest samples.

RESULTS : Except for DEI (p = 0.036), means of HU, ED, and Zeff were not statistically different (p > 0.05) between explosives ex-vivo and on the phantom (r > 0.80). Intra- and inter-reader ICC were good to excellent: 0.806 to 0.997 and 0.890, respectively. Except for the phantom DEI, all measurements from each individual explosive differed significantly. HU, ED, Zeff, and DEI differed depending on the type of explosive. Our decision tree provided Zeff and ED for explosives classification with high accuracy (83.7%) and excellent reliability (100%).

CONCLUSION : Our medical DECT algorithm and prediction model can readily detect and distinguish our range of explosives on the human body. This would avoid possible endangering of DVI staff.
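As a small illustration of one of the measured quantities, the sketch below computes a dual energy index from HU values at the low- and high-energy acquisitions. It assumes the commonly used definition DEI = (HU_low - HU_high) / (HU_low + HU_high + 2000); the abstract does not state which formula the study used, and the function name is hypothetical.

```python
def dual_energy_index(hu_low, hu_high):
    """Dual energy index from HU at the low- and high-kV acquisitions.

    Assumes the commonly used definition; the study's exact
    computation is not given in the abstract.
    """
    denom = hu_low + hu_high + 2000.0
    if denom == 0:
        # Happens when both values are -1000 HU (air); the index is undefined.
        raise ValueError("DEI is undefined when HU_low + HU_high = -2000")
    return (hu_low - hu_high) / denom
```

Under this definition, water (0 HU at both energies) has an index of 0, and materials with a higher effective atomic number tend to give larger positive values.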

Ognard Julien, Bourhis David, Cadieu Romain, Grenier Michel, Saccardy Claire, Alavi Zarrin, Ben Salem Douraied


Artificial intelligence, Computer-assisted image processing, Explosives, Forensic medicine, Machine learning

General General

parSMURF, a high-performance computing tool for the genome-wide detection of pathogenic variants.

In GigaScience

BACKGROUND : Several prediction problems in computational biology and genomic medicine are characterized by both big data as well as a high imbalance between examples to be learned, whereby positive examples can represent a tiny minority with respect to negative examples. For instance, deleterious or pathogenic variants are overwhelmed by the sea of neutral variants in the non-coding regions of the genome: thus, the prediction of deleterious variants is a challenging, highly imbalanced classification problem, and classical prediction tools fail to detect the rare pathogenic examples among the huge amount of neutral variants or undergo severe restrictions in managing big genomic data.

RESULTS : To overcome these limitations we propose parSMURF, a method that adopts a hyper-ensemble approach and oversampling and undersampling techniques to deal with imbalanced data, and parallel computational techniques to both manage big genomic data and substantially speed up the computation. The synergy between Bayesian optimization techniques and the parallel nature of parSMURF enables efficient and user-friendly automatic tuning of the hyper-parameters of the algorithm, and allows specific learning problems in genomic medicine to be easily fit. Moreover, by using MPI parallel and machine learning ensemble techniques, parSMURF can manage big data by partitioning them across the nodes of a high-performance computing cluster. Results with synthetic data and with single-nucleotide variants associated with Mendelian diseases and with genome-wide association study hits in the non-coding regions of the human genome, involving millions of examples, show that parSMURF achieves state-of-the-art results and an 80-fold speed-up with respect to the sequential version.
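The combination of partitioning, oversampling, and undersampling described above can be sketched as follows. This is a simplified illustration and not parSMURF's implementation: the function and parameter names are hypothetical, plain duplication stands in for whatever oversampling technique the tool applies, and the per-partition learners and their parallel execution are omitted.

```python
import random


def balanced_partitions(positives, negatives, n_parts, oversample=2, ratio=1, seed=0):
    """Build n_parts balanced training sets from highly imbalanced data.

    All names and defaults here are illustrative assumptions.
    """
    rng = random.Random(seed)
    shuffled = list(negatives)
    rng.shuffle(shuffled)
    # Split the majority (negative) class into disjoint chunks.
    chunks = [shuffled[i::n_parts] for i in range(n_parts)]
    # Oversample the minority class; duplication is a stand-in for
    # synthetic oversampling.
    pos = list(positives) * oversample
    parts = []
    for chunk in chunks:
        # Undersample the chunk to at most `ratio` negatives per positive.
        k = min(len(chunk), ratio * len(pos))
        parts.append((pos, rng.sample(chunk, k)))
    return parts
```

Each returned pair could then train one ensemble member, with member predictions averaged at test time. Because the partitions are independent, they are natural units to distribute across cores or cluster nodes.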

CONCLUSIONS : parSMURF is a parallel machine learning tool that can be trained to learn different genomic problems, and its multiple levels of parallelization and high scalability allow us to efficiently fit problems characterized by big and imbalanced genomic data. The C++ OpenMP multi-core version tailored to a single workstation and the C++ MPI/OpenMP hybrid multi-core and multi-node parSMURF version tailored to a High Performance Computing cluster are both available at

Petrini Alessandro, Mesiti Marco, Schubach Max, Frasca Marco, Danis Daniel, Re Matteo, Grossi Giuliano, Cappelletti Luca, Castrignanò Tiziana, Robinson Peter N, Valentini Giorgio


GWAS, Mendelian diseases, ensemble methods, high-performance computing, high-performance computing tool for genomic medicine, machine learning for genomic medicine, machine learning for imbalanced genomic data, parallel machine learning tool for big data, parallel machine learning tool for imbalanced data, prediction of deleterious or pathogenic variants

General General

Attentional multi-level representation encoding based on convolutional and variance autoencoders for lncRNA-disease association prediction.

In Briefings in bioinformatics

As the abnormalities of long non-coding RNAs (lncRNAs) are closely related to various human diseases, identifying disease-related lncRNAs is important for understanding the pathogenesis of complex diseases. Most current data-driven methods for disease-related lncRNA candidate prediction are based on diseases and lncRNAs. Those methods, however, fail to consider the deeply embedded node attributes of lncRNA-disease pairs, which contain multiple relations and representations across lncRNAs, diseases and miRNAs. Moreover, the low-dimensional feature distribution at the pairwise level has not been taken into account. We propose a prediction model, VADLP, to extract, encode and adaptively integrate multi-level representations. Firstly, a triple-layer heterogeneous graph is constructed with weighted inter-layer and intra-layer edges to integrate the similarities and correlations among lncRNAs, diseases and miRNAs. We then define three representations including node attributes, pairwise topology and feature distribution. Node attributes are derived from the graph by an embedding strategy to represent the lncRNA-disease associations, which are inferred via their common lncRNAs, diseases and miRNAs. Pairwise topology is formulated by a random walk algorithm and encoded by a convolutional autoencoder to represent the hidden topological structural relations between a pair of lncRNA and disease. The new feature distribution is modeled by a variance autoencoder to reveal the underlying lncRNA-disease relationship. Finally, an attentional representation-level integration module is constructed to adaptively fuse the three representations for lncRNA-disease association prediction. The proposed model is tested over a public dataset with a comprehensive list of evaluations. Our model outperforms six state-of-the-art lncRNA-disease prediction models with statistical significance. The ablation study showed the important contributions of the three representations. 
In particular, the improved recall rates under different top $k$ values demonstrate that our model is powerful in discovering true disease-related lncRNAs in the top-ranked candidates. Case studies of three cancers further proved the capacity of our model to discover potential disease-related lncRNAs.
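The recall under different top $k$ values mentioned above can be illustrated with a short sketch. It assumes each candidate lncRNA has a single association score for a given disease; the function name and data layout are hypothetical and not taken from the paper.

```python
def recall_at_k(scores, positives, k):
    """Fraction of known disease-related lncRNAs ranked in the top k.

    scores: dict mapping candidate name -> predicted association score.
    positives: collection of candidates with a known association.
    """
    positives = set(positives)
    if not positives:
        return 0.0
    # Rank candidates from highest to lowest predicted score.
    ranked = sorted(scores, key=scores.get, reverse=True)
    hits = positives.intersection(ranked[:k])
    return len(hits) / len(positives)
```

A higher recall at small k means more of the true associations appear near the top of the ranked list, which is where candidates would be picked for experimental follow-up.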

Sheng Nan, Cui Hui, Zhang Tiangang, Xuan Ping


convolutional and variance autoencoders, deep learning, lncRNA–disease association prediction, representation-level attention

General General

Assessing the Big Five personality traits using real-life static facial images.

In Scientific reports ; h5-index 158.0

There is ample evidence that morphological and social cues in a human face provide signals of human personality and behaviour. Previous studies have discovered associations between the features of artificial composite facial images and attributions of personality traits by human experts. We present new findings demonstrating the statistically significant prediction of a wider set of personality features (all the Big Five personality traits) for both men and women using real-life static facial images. Volunteer participants (N = 12,447) provided their face photographs (31,367 images) and completed a self-report measure of the Big Five traits. We trained a cascade of artificial neural networks (ANNs) on a large labelled dataset to predict self-reported Big Five scores. The highest correlations between observed and predicted personality scores were found for conscientiousness (0.360 for men and 0.335 for women) and the mean effect size was 0.243, exceeding the results obtained in prior studies using 'selfies'. The findings strongly support the possibility of predicting multidimensional personality profiles from static facial images using ANNs trained on large labelled datasets. Future research could investigate the relative contribution of morphological features of the face and other characteristics of facial images to predicting personality.
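The correlations reported above compare observed (self-reported) and predicted trait scores. A minimal sketch of such a comparison follows, assuming the Pearson coefficient is the intended measure, which the abstract does not state explicitly.

```python
import math


def pearson_r(observed, predicted):
    """Pearson correlation between two equal-length score sequences."""
    n = len(observed)
    if n != len(predicted) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_o = sum(observed) / n
    mean_p = sum(predicted) / n
    # Sum of cross-products and the two sums of squared deviations.
    cov = sum((o - mean_o) * (p - mean_p) for o, p in zip(observed, predicted))
    var_o = sum((o - mean_o) ** 2 for o in observed)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    if var_o == 0 or var_p == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_o * var_p)
```

Under this assumption, a value such as 0.360 corresponds to roughly 13% of the variance in observed scores being linearly accounted for by the predictions (0.360 squared is about 0.13).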

Kachur Alexander, Osin Evgeny, Davydov Denis, Shutilov Konstantin, Novokshonov Alexey


General General

Linking perturbations to temporal changes in diversity, stability, and compositions of neonatal calf gut microbiota: prediction of diarrhea.

In The ISME journal

Perturbations in early life gut microbiota can have long-term impacts on host health. In this study, we investigated antimicrobial-induced temporal changes in diversity, stability, and compositions of gut microbiota in neonatal veal calves, with the objective of identifying microbial markers that predict diarrhea. A total of 220 samples from 63 calves in the first 8 weeks of life were used in this study. The results suggest that an increase in diversity and stability of gut microbiota over time was a feature of "healthy" (non-diarrheic) calves during early life. Therapeutic antimicrobials delayed the temporal development of diversity and taxa-function robustness (a measure of microbial stability). In addition, predicted genes associated with beta lactam and cationic antimicrobial peptide resistance were more abundant in gut microbiota of calves treated with therapeutic antimicrobials. A random forest machine learning algorithm revealed that Trueperella, Streptococcus, Dorea, uncultured Lachnospiraceae, Ruminococcus 2, and Erysipelatoclostridium may be key microbial markers that can differentiate "healthy" and "unhealthy" (diarrheic) gut microbiota, as they predicted early life diarrhea with an accuracy of 84.3%. Our findings suggest that diarrhea in veal calves may be predicted by the shift in early life gut microbiota, which may provide an opportunity for early intervention (e.g., prebiotics or probiotics) to improve calf health with reduced usage of antimicrobials.
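As an illustration of how the diversity of a gut microbiota sample can be quantified, the sketch below computes the Shannon index from per-taxon read counts. The choice of the Shannon index is an assumption; the abstract does not say which diversity measure the study used.

```python
import math


def shannon_index(counts):
    """Shannon diversity H = -sum(p_i * ln p_i) over taxa with nonzero counts."""
    total = sum(counts)
    if total <= 0:
        raise ValueError("counts must sum to a positive number")
    # Taxa with zero reads contribute nothing to the sum.
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)
```

A sample dominated by a single taxon scores close to 0, while a sample spread evenly over many taxa scores higher, so tracking the index across sampling time points shows whether diversity is increasing.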

Ma Tao, Villot Clothilde, Renaud David, Skidmore Andrew, Chevaux Eric, Steele Michael, Guan Le Luo