In Computer Methods and Programs in Biomedicine
BACKGROUND AND OBJECTIVE: Colorectal cancer is a major health concern: it is now the third most common cancer and the fourth leading cause of cancer mortality worldwide. The aim of this study was to evaluate the performance of machine learning algorithms for predicting the survival of colorectal cancer patients 1 to 5 years after diagnosis, and to identify the most important predictor variables.
METHODS: A sample of 1236 patients diagnosed with colorectal cancer, described by 118 predictor variables, was used. The outcome of interest was a binary variable indicating whether or not the patient survived the given number of years. Twenty predictor variables were selected according to their mutual information score with the outcome. We implemented 11 machine learning algorithms and evaluated their performance using 5×2-fold cross-validation with stratified folds and paired Student's t-tests. We compared the results with the Kaplan-Meier estimator and Cox proportional hazards regression.
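The abstract does not say which mutual-information estimator was used. As a minimal sketch, assuming discrete (categorical or binned) predictors, the plug-in estimate below scores each variable by its mutual information with the binary survival outcome and keeps the k best (k = 20 in the study). The helper names `mutual_information` and `select_top_k`, the feature names, and the toy data are all hypothetical and do not come from the study.

```python
import math
from collections import Counter


def mutual_information(x, y):
    """Plug-in estimate of I(X; Y) in nats for two discrete sequences."""
    n = len(x)
    joint = Counter(zip(x, y))
    px = Counter(x)
    py = Counter(y)
    mi = 0.0
    for (xi, yi), count in joint.items():
        p_xy = count / n
        # p(x,y) * log(p(x,y) / (p(x) p(y))), written with raw counts
        mi += p_xy * math.log(count * n / (px[xi] * py[yi]))
    return mi


def select_top_k(features, outcome, k):
    """Rank named discrete features by MI with the outcome; keep the k best."""
    scores = {name: mutual_information(values, outcome)
              for name, values in features.items()}
    ranked = sorted(scores, key=scores.get, reverse=True)
    return ranked[:k], scores


# Hypothetical toy data (not from the study): 1 = survived the horizon.
outcome = [1, 1, 1, 0, 0, 0, 1, 0]
features = {
    "distant_metastasis": [0, 0, 0, 1, 1, 1, 0, 1],  # fully informative
    "sex": [0, 1, 0, 1, 0, 1, 0, 1],                 # weakly informative
    "constant": [1, 1, 1, 1, 1, 1, 1, 1],            # no information
}
top, scores = select_top_k(features, outcome, k=2)
print(top)  # → ['distant_metastasis', 'sex']
```

Continuous predictors such as tumour length would need binning first, or a different estimator such as the nearest-neighbour estimator used by scikit-learn's `mutual_info_classif`.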
RESULTS: Using the 20 most important predictor variables for each survival year, the logistic regression algorithm achieved an area under the receiver operating characteristic curve of 0.850 (SD 0.014, 95% CI 0.840-0.860) for 1-year and 0.872 (SD 0.014, 95% CI 0.861-0.882) for 5-year survival prediction. Using only the 5 most important predictor variables, the corresponding values were 0.793 (SD 0.020, 95% CI 0.778-0.807) and 0.794 (SD 0.011, 95% CI 0.785-0.802). The most important variables for 1-year prediction were number of R residual, M distant metastasis, overall stage, probable recurrence within 5 years, and tumour length, whereas for 5-year prediction they were probable recurrence within 5 years, R residual, M distant metastasis, number of positive lymph nodes, and palliative chemotherapy. No biomarkers appear among the 20 most important variables. For all survival intervals, the survival probability predicted by the top model agrees with the Kaplan-Meier estimate, within both one standard deviation and the 95% confidence interval.
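The abstract does not state how the confidence intervals were computed. The reported figures are approximately consistent, up to rounding, with a Student's t interval over the ten fold-level scores of a 5×2 cross-validation, i.e. mean ± t(0.975, 9) · SD / √10; for 0.850 and SD 0.014 this gives 0.840 to 0.860. The sketch below assumes that construction and computes each fold's AUC with the rank-based (Mann-Whitney) formulation; the function names and the hard-coded critical value are illustrative choices, not the authors' code.

```python
import math
import statistics

# Two-sided 95% critical value of Student's t with 9 degrees of freedom,
# matching the 10 fold-level scores of a 5x2 cross-validation.
T_CRIT_95_DF9 = 2.262


def roc_auc(labels, scores):
    """Rank-based (Mann-Whitney) AUC: the chance that a random positive
    case scores higher than a random negative case; ties count as 1/2."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def t_interval(mean, sd, n, t_crit=T_CRIT_95_DF9):
    """Student's t confidence interval for a mean estimated from n scores.
    t_crit must correspond to n - 1 degrees of freedom."""
    half_width = t_crit * sd / math.sqrt(n)
    return mean - half_width, mean + half_width


def summarize(fold_aucs, t_crit=T_CRIT_95_DF9):
    """Mean, sample SD and t-based 95% CI of the per-fold AUCs."""
    mean = statistics.mean(fold_aucs)
    sd = statistics.stdev(fold_aucs)
    return mean, sd, t_interval(mean, sd, len(fold_aucs), t_crit)


# Plugging in the reported 1-year figures (mean 0.850, SD 0.014, 10 folds):
low, high = t_interval(0.850, 0.014, 10)
print(f"{low:.3f}-{high:.3f}")  # → 0.840-0.860
```

Using the t rather than the normal quantile matters here because only ten scores are averaged; the normal value 1.96 would give a slightly narrower interval (about 0.841 to 0.859).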
CONCLUSIONS: The findings suggest that machine learning algorithms can predict the survival probability of colorectal cancer patients and can be used to inform patients and assist decision-making in clinical care management. In addition, this study identifies the variables most essential for estimating short- and long-term survival among patients with colorectal cancer.
Susič David, Syed-Abdul Shabbir, Dovgan Erik, Jonnagaddala Jitendra, Gradišek Anton
2023-Feb-21
Cancer Survival, Colorectal Cancer, Machine Learning, Survival Prediction