In Blood Cancer Discovery
Gene expression classifiers are increasingly used to stratify tumors into subgroups with distinct biological features. A fundamental limitation shared by current classifiers is that they require comparable training and testing data sets. Here, we describe PRPS-ST, a self-training implementation of our probability ratio-based classification prediction score (PRPS) method, which facilitates porting existing classification models to other gene expression data sets. Compared with gold standards, PRPS-ST performed favorably in gene expression-based classification of diffuse large B-cell lymphoma (DLBCL) and B-cell acute lymphoblastic leukemia (B-ALL) across diverse gene expression data types and pre-processing methods, including classifications with a high degree of class imbalance. Tumors classified by our method were significantly enriched for the prototypical genetic features of their respective subgroups. Notably, this included cases that were unclassifiable by established methods, suggesting that PRPS-ST may offer enhanced sensitivity.
Jiang Aixiang, Hilton Laura K, Tang Jeffrey, Rushton Christopher K, Grande Bruno M, Scott David W, Morin Ryan D
B-ALL, binary classifier, hematologic, machine learning, molecular subgroup
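
The abstract names the idea of a self-training, probability ratio-based score but does not spell out the procedure, so the following is only a minimal illustrative sketch of that general idea and not the published PRPS-ST algorithm. It assumes a binary classifier, per-gene normal distributions, signed gene weights carried over from an existing model, and a simple z-score seeding step; the function names (`log_normal_pdf`, `prob_ratio_scores`, `self_train`) and all parameters are hypothetical.

```python
import numpy as np


def log_normal_pdf(x, mean, sd):
    """Log density of a normal distribution, evaluated element-wise."""
    return -0.5 * np.log(2.0 * np.pi * sd**2) - (x - mean) ** 2 / (2.0 * sd**2)


def prob_ratio_scores(X, weights, mean_a, sd_a, mean_b, sd_b):
    """Weighted sum of per-gene log probability ratios (class A vs. class B).

    X is a samples-by-genes matrix. Only the magnitude of each weight is
    used here, because the direction of each gene is already captured by
    the class means (an assumption of this sketch).
    """
    llr = log_normal_pdf(X, mean_a, sd_a) - log_normal_pdf(X, mean_b, sd_b)
    return llr @ np.abs(weights)


def self_train(X, weights, n_iter=20, min_sd=1e-3):
    """Estimate class distributions from the unlabeled data set itself.

    Assumed procedure: seed labels from a weighted sum of per-gene
    z-scores, then alternate between (1) re-estimating per-class gene
    means and standard deviations and (2) relabeling by the sign of the
    log probability ratio score, until the labels stop changing.
    """
    z = (X - X.mean(axis=0)) / (X.std(axis=0) + min_sd)
    scores = z @ weights
    labels = scores > 0  # True = class A, False = class B
    for _ in range(n_iter):
        if labels.all() or not labels.any():
            break  # one class is empty; its parameters cannot be estimated
        mean_a, sd_a = X[labels].mean(axis=0), X[labels].std(axis=0) + min_sd
        mean_b, sd_b = X[~labels].mean(axis=0), X[~labels].std(axis=0) + min_sd
        scores = prob_ratio_scores(X, weights, mean_a, sd_a, mean_b, sd_b)
        new_labels = scores > 0
        if np.array_equal(new_labels, labels):
            break  # converged
        labels = new_labels
    return scores, labels
```

Calling `self_train(X, weights)` on a samples-by-genes expression matrix returns one score and one provisional class call per sample. Because the class parameters are re-estimated from the target data set, no separately matched training cohort is needed in this sketch, which mirrors the portability motivation stated in the abstract. A practical implementation would likely also need an "unclassified" band around zero and safeguards for strong class imbalance, neither of which is modeled here.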