In Health information science and systems
Purpose : Glioblastoma is one of the most common and aggressive brain tumors worldwide and carries a poor prognosis. A glioblastoma prognostication model has the potential to improve the standard of care for this cancer. No prior study has used ensemble learning with a population database to predict multiple binary glioblastoma survival outcomes.
Methods : We used ensemble learning to design, build, and test a prognostication system for short-, intermediate-, and long-term glioblastoma survival, based on various clinical features. We used the population-based SEER database, which covers 17 registries. The most important prognostic features were identified and used as a clinical feature set. The statistical feature set was determined using random forests. The accuracy, sensitivity, specificity, area under the receiver operating characteristic curve (AUROC), positive predictive value (PPV), and negative predictive value (NPV) were reported.
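As a minimal sketch of how the reported metrics relate to one another, the following computes accuracy, sensitivity, specificity, PPV, NPV, and AUROC for a single binary outcome. The function names, the toy labels and scores, and the 0.5 decision threshold are illustrative assumptions; they are not taken from the study and do not reproduce its data or results.

```python
import math


def _ratio(num, den):
    """Return num / den, or NaN when the denominator is zero."""
    return num / den if den else math.nan


def binary_metrics(y_true, y_pred):
    """Threshold-based metrics from the 2x2 confusion matrix (1 = positive class)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return {
        "accuracy": _ratio(tp + tn, tp + tn + fp + fn),
        "sensitivity": _ratio(tp, tp + fn),  # true positive rate
        "specificity": _ratio(tn, tn + fp),  # true negative rate
        "ppv": _ratio(tp, tp + fp),          # positive predictive value
        "npv": _ratio(tn, tn + fn),          # negative predictive value
    }


def auroc(y_true, scores):
    """AUROC as the probability that a random positive outscores a random negative.

    Ties count as half a win (the Mann-Whitney formulation).
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return _ratio(wins, len(pos) * len(neg))


# Hypothetical toy data: 1 = event of interest, scores = predicted probabilities.
toy_labels = [1, 1, 1, 1, 0, 0, 0, 0]
toy_scores = [0.9, 0.8, 0.7, 0.4, 0.6, 0.3, 0.2, 0.1]
toy_preds = [1 if s >= 0.5 else 0 for s in toy_scores]  # assumed 0.5 threshold

toy_metrics = binary_metrics(toy_labels, toy_preds)
toy_auroc = auroc(toy_labels, toy_scores)
```

Accuracy, sensitivity, specificity, PPV, and NPV all depend on the chosen threshold, whereas AUROC is computed from the ranking of the scores alone, which is why the two kinds of metric are usually reported together.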
Results : Statistically determined feature sets had the best performance. The top models for short-, intermediate-, and long-term survival were all random forests. For short-term survival, the top model had AUROC = 0.937, accuracy = 86%, specificity = 88%, sensitivity = 85%, NPV = 85%, and PPV = 87%. For long-term survival, the top model had AUROC = 0.893, accuracy = 81%, specificity = 79%, sensitivity = 83%, NPV = 82%, and PPV = 79%. For intermediate-term survival, the top model had AUROC = 0.780, and its other metrics were all at least 70%.
Conclusions : Our ensemble models performed well, achieving AUROCs as high as 0.94, which highlights the importance of balancing, ensemble techniques, and statistical feature selection. After external validation, our models could potentially be used by clinicians.
Samara Kamel A, Al Aghbari Zaher, Abusafia Amani
Ensemble learning, Feature selection, Glioblastoma, Machine learning, Prognosis, SEER, Survival prediction