In Briefings in bioinformatics
Robust strategies to identify patients at high risk for tumor metastasis, which is frequently observed in intrahepatic cholangiocarcinoma (ICC), remain limited. Gene/protein expression profiling holds great potential for cancer diagnosis and prognosis, but previously developed protocols that use multiple diagnostic signatures for expression-based metastasis prediction have not been widely and successfully applied: batch effects and differences between data types greatly decrease the predictive performance of expression profile-based signatures in interlaboratory and cross-data-type validation. To address this problem and assist in more precise diagnosis, we performed a genome-wide integrative proteome and transcriptome analysis and developed an ensemble machine learning-based integration algorithm for metastasis prediction (EMLI-Metastasis) and risk stratification (EMLI-Prognosis) in ICC. Based on large proteome (216) and transcriptome (244) data sets, 132 feature (biomarker) genes were selected and used to train the EMLI-Metastasis algorithm. To accurately detect metastasis in ICC patients, we developed a weighted ensemble machine learning method based on the k-Top Scoring Pairs (k-TSP) method. This approach generates one metastasis classifier for each bootstrap-aggregated (bagging) training data set, yielding ten binary, expression rank-based classifiers. To further improve accuracy, the 10 binary classifiers were combined by weighted voting based on the score from the prediction results of each classifier. The EMLI-Metastasis algorithm achieved prediction accuracies of 97.1% and 85.0% in the proteome and transcriptome data sets, respectively. Among the 132 feature genes, 21 gene-pair signatures were developed to establish a metastasis-related prognosis risk-stratification model in ICC (EMLI-Prognosis).
Based on the EMLI-Prognosis algorithm, patients in the high-risk group had significantly worse overall survival than those in the low-risk group in the clinical cohort (P-value < 0.05). Taken together, the EMLI-ICC algorithm provides a powerful and robust means of accurate metastasis prediction and risk stratification across proteome and transcriptome data types, and it is superior to currently used clinicopathological features in patients with ICC. Our algorithm could have profound implications not only for improved clinical care through cancer metastasis risk prediction, but also more broadly for the development of machine learning-based multi-cohort diagnostic methods. To make the EMLI-ICC algorithm easily accessible for clinical application, we established a web-based server for metastasis risk prediction (http://ibi.zju.edu.cn/EMLI/).
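The weighted ensemble of k-TSP classifiers described in the abstract can be illustrated with a minimal Python sketch. It is not the EMLI-Metastasis implementation: the function names, the default of three gene pairs per classifier, the use of out-of-bag accuracy as the voting weight, and the simulated data are all assumptions made for illustration, since the abstract states only that the weights are based on each classifier's prediction results.

```python
import numpy as np

def train_ktsp(X, y, k=3):
    """Select up to k disjoint gene pairs whose within-sample order best separates y (0/1)."""
    X0, X1 = X[y == 0], X[y == 1]
    # p_c[i, j] = fraction of class-c samples in which gene i is below gene j
    p0 = (X0[:, :, None] < X0[:, None, :]).mean(axis=0)
    p1 = (X1[:, :, None] < X1[:, None, :]).mean(axis=0)
    delta = p1 - p0
    n_genes = X.shape[1]
    pairs, used = [], set()
    for flat in np.argsort(-np.abs(delta), axis=None):  # best-scoring pairs first
        i, j = divmod(int(flat), n_genes)
        if i >= j or i in used or j in used:
            continue
        # "gene i < gene j" votes for class 1 when delta > 0, otherwise for class 0
        pairs.append((i, j, bool(delta[i, j] > 0)))
        used.update((i, j))
        if len(pairs) == k:
            break
    return pairs

def predict_ktsp(pairs, X):
    """Unweighted majority vote over the selected pairs; returns 0/1 labels."""
    votes = np.stack([(X[:, i] < X[:, j]) == up for i, j, up in pairs])
    return (votes.mean(axis=0) > 0.5).astype(int)

def train_ensemble(X, y, n_models=10, k=3, seed=0):
    """Fit one k-TSP classifier per bootstrap resample of the training data."""
    if len(np.unique(y)) < 2:
        raise ValueError("both classes are required for training")
    rng = np.random.default_rng(seed)
    n = len(y)
    ensemble = []
    while len(ensemble) < n_models:
        idx = rng.integers(0, n, size=n)         # bootstrap sample
        oob = np.setdiff1d(np.arange(n), idx)    # samples left out of it
        if len(np.unique(y[idx])) < 2 or oob.size == 0:
            continue                             # redraw a degenerate resample
        pairs = train_ktsp(X[idx], y[idx], k)
        # ASSUMPTION: weight each classifier by its out-of-bag accuracy
        weight = float((predict_ktsp(pairs, X[oob]) == y[oob]).mean())
        ensemble.append((pairs, weight))
    return ensemble

def predict_ensemble(ensemble, X):
    """Weighted vote; returns (labels, weighted fraction of votes for class 1)."""
    weights = np.array([w for _, w in ensemble])
    votes = np.stack([predict_ktsp(p, X) for p, _ in ensemble])
    score = weights @ votes / max(weights.sum(), 1e-12)
    return (score > 0.5).astype(int), score

# Simulated data (hypothetical): genes 0, 2, 4 are raised in class 1 and
# genes 1, 3, 5 in class 0; genes 6 and 7 are uninformative noise.
_rng = np.random.default_rng(1)

def simulate(n, n_genes=8, shift=4.0):
    y = _rng.integers(0, 2, size=n)
    X = _rng.normal(size=(n, n_genes))
    X[:, [0, 2, 4]] += shift * y[:, None]
    X[:, [1, 3, 5]] += shift * (1 - y)[:, None]
    return X, y

X_train, y_train = simulate(80)
X_test, y_test = simulate(40)
ensemble = train_ensemble(X_train, y_train)
pred, score = predict_ensemble(ensemble, X_test)
accuracy = float((pred == y_test).mean())
```

Because each classifier uses only the relative order of two genes within a sample, its decisions are unchanged by any per-sample monotone rescaling of the expression values. This is one plausible reason a rank-based design tolerates batch effects and differing data types.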
Ruan Jian, Xu Shuaishuai, Chen Ruyin, Qu Wenxin, Li Qiong, Ye Chanqi, Wu Wei, Jiang Qi, Yan Feifei, Shen Enhui, Chu Qinjie, Jia Yunlu, Zhang Xiaochen, Fu Wenguang, Chen Jinzhang, Timko Michael P, Zhao Peng, Fan Longjiang, Shen Yifei
algorithm, intrahepatic cholangiocarcinoma (ICC), machine learning, metastasis, risk stratification