In medRxiv : the preprint server for health sciences

MOTIVATION : Precise identification of cancer cells in patient samples is essential for accurate diagnosis and clinical monitoring but has been a significant challenge in machine learning approaches for cancer precision medicine. In most scenarios, training data are only available with disease annotation at the subject or sample level. Traditional approaches separate the classification process into multiple steps that are optimized independently. Recent methods either focus on predicting sample-level diagnosis without identifying individual pathologic cells or are less effective for identifying heterogeneous cancer cell phenotypes.

RESULTS : We developed a generalized end-to-end differentiable model, the Cell Scoring Neural Network (CSNN), which is trained on the available sample-level training data and simultaneously predicts both the diagnosis of the testing samples and the identity of the diagnostic cells within each sample. The cell-level density differences between samples are linked to the sample diagnosis, which allows the probabilities of individual cells being diagnostic to be calculated using backpropagation. We applied CSNN to two independent clinical flow cytometry datasets for leukemia diagnosis. In both qualitative and quantitative assessments, CSNN outperformed preexisting neural network modeling approaches for both cancer diagnosis and cell-level classification. Post hoc decision trees and 2D dot plots were generated for interpretation of the identified cancer cells, showing that the identified cell phenotypes match the cancer endotypes observed clinically in patient cohorts. Independent data clustering analysis confirmed the identified cancer cell populations.
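
The general idea of learning cell-level scores from sample-level labels alone can be illustrated with a toy sketch. This is not the CSNN architecture: the abstract describes linking cell-level density differences to the diagnosis, whereas the sketch below assumes a much simpler setup in which a logistic per-cell score is mean-pooled into a sample-level prediction and all parameters are fitted by hand-written backpropagation. The synthetic two-marker data and every name (`make_samples`, `forward`, `train`) are hypothetical and chosen only for illustration.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def make_samples(rng, n_samples, n_cells=200, frac_abnormal=0.3):
    """Synthetic stand-in for cytometry data (assumption): 2 markers per cell.
    Odd-indexed samples are 'diseased' and contain a shifted subpopulation."""
    X = rng.normal(0.0, 1.0, size=(n_samples, n_cells, 2))
    y = (np.arange(n_samples) % 2).astype(float)   # sample-level labels only
    abnormal = np.zeros((n_samples, n_cells), dtype=bool)
    k = int(frac_abnormal * n_cells)
    abnormal[y == 1, :k] = True                    # ground truth, never used in training
    X[abnormal] += 4.0
    return X, y, abnormal

def forward(params, X):
    w, b, a, c = params
    s = sigmoid(X @ w + b)      # (samples, cells): per-cell score
    m = s.mean(axis=1)          # (samples,): pooled score per sample
    q = sigmoid(a * m + c)      # (samples,): sample-level probability
    return s, m, q

def train(X, y, lr=0.5, epochs=500):
    """Gradient descent on mean binary cross-entropy of the sample-level output."""
    w, b, a, c = np.zeros(2), 0.0, 1.0, 0.0
    n_cells = X.shape[1]
    for _ in range(epochs):
        s, m, q = forward((w, b, a, c), X)
        d_logit = (q - y) / len(y)                 # dLoss/d(sample logit)
        # Backpropagate through the pooling step down to each cell's logit.
        d_z = (d_logit * a / n_cells)[:, None] * s * (1.0 - s)
        grad_w = np.einsum("sn,snk->k", d_z, X)
        grad_b = d_z.sum()
        grad_a = (d_logit * m).sum()
        grad_c = d_logit.sum()
        w = w - lr * grad_w
        b = b - lr * grad_b
        a = a - lr * grad_a
        c = c - lr * grad_c
    return w, b, a, c

rng = np.random.default_rng(0)
X_train, y_train, _ = make_samples(rng, 40)
params = train(X_train, y_train)

# Held-out samples: compare cell scores against the hidden ground truth.
X_test, y_test, abnormal_test = make_samples(rng, 20)
cell_scores, _, sample_probs = forward(params, X_test)
print("mean score, abnormal cells:", cell_scores[abnormal_test].mean())
print("mean score, normal cells:  ", cell_scores[~abnormal_test].mean())
print("mean P(diseased), diseased samples:", sample_probs[y_test == 1].mean())
print("mean P(diseased), healthy samples: ", sample_probs[y_test == 0].mean())
```

In this sketch the cell-level annotations are used only for evaluation; the per-cell scores emerge solely because the gradient of the sample-level loss flows back through the pooling step to each cell.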

AVAILABILITY : The source code of CSNN and datasets used in the experiments are publicly available on GitHub and FlowRepository.

CONTACT : Edgar E. Robles: roblesee@uci.edu and Yu Qian: mqian@jcvi.org.

SUPPLEMENTARY INFORMATION : Supplementary data are available on GitHub and at Bioinformatics online.

Robles Edgar E, Jin Ye, Smyth Padhraic, Scheuermann Richard H, Bui Jack D, Wang Huan-You, Oak Jean, Qian Yu

2023-Feb-10