In Journal of personalized medicine
The early prediction of diabetes can facilitate interventions to prevent or delay it. This study proposes a diabetes prediction model based on machine learning (ML) to encourage individuals at risk of diabetes to employ healthy interventions. A total of 38,379 subjects were included. We trained the model on 80% of the subjects and verified its predictive performance on the remaining 20%. Furthermore, the performances of several algorithms were compared, including logistic regression, decision tree, random forest, eXtreme Gradient Boosting (XGBoost), Cox regression, and XGBoost Survival Embedding (XGBSE). The area under the receiver operating characteristic curve (AUROC) of the XGBoost model was the largest, followed by those of the decision tree, logistic regression, and random forest models. For the survival analysis, XGBSE yielded an AUROC exceeding 0.9 for the 2- to 9-year predictions and a C-index of 0.934, while the Cox regression achieved a C-index of 0.921. After lowering the threshold from 0.5 to 0.25, the sensitivity increased from 0.011 to 0.236 for the 2-year prediction model and from 0.607 to 0.994 for the 9-year prediction model, while the specificity showed negligible changes. We developed a high-performance diabetes prediction model that applied the XGBSE algorithm with threshold adjustment. We plan to use this prediction model in real clinical practice for diabetes prevention after simplifying and validating it externally.
Shin Juyoung, Lee Joonyub, Ko Taehoon, Lee Kanghyuck, Choi Yera, Kim Hun-Sung
2022-Nov-14
XGBoost Survival Embedding, diabetes prediction model, diabetes prevention, type 2 diabetes