This study aimed to develop a diabetes prediction model and to compare the performance of logistic regression with that of a Chi-square automatic interaction detection (CHAID) decision tree, a machine learning (ML) classification method, in predicting a diagnosis of diabetes mellitus (DM). The study sample was drawn from the Korean Genome and Epidemiology Study (KoGES)-8 data and comprised 3233 patients: 589 with diabetes and 2644 without. We performed statistical analysis using the CHAID method with the International Business Machines (IBM) statistical program SPSS®. The overall classification accuracy was 81.7% for logistic regression and 81.8% for the CHAID decision tree. The resulting diabetes diagnosis decision tree had seven terminal nodes and three depth levels. Blood pressure problems and hospital visits were the most decisive variables for classification, and two risk levels were created for diabetes diagnosis. Patients who visited the hospital because of blood pressure problems were more likely to develop diabetes than those under treatment for hyperlipidemia. The suggested method is a valuable tool for predicting diabetes. The prediction model can help doctors make decisions by detecting the possibility of diabetes early; however, diabetes cannot be diagnosed from the model alone, without a doctor's opinion.
Choi Hae-Young, Kim Eun-Yeob, Kim Jaeyoung
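The sample counts and accuracies reported in the abstract can be put in context with a short arithmetic sketch. The Python snippet below is illustrative only and is not part of the authors' SPSS analysis. It uses only the figures stated in the abstract (589, 2644, 81.7%, 81.8%). The variable names and the "majority-class baseline" comparison, meaning the accuracy obtained by always predicting "no diabetes", are assumptions introduced here for illustration.

```python
# Counts reported in the abstract (KoGES-8 study sample).
n_diabetes = 589
n_no_diabetes = 2644
n_total = n_diabetes + n_no_diabetes

# Share of patients with diabetes in the sample.
prevalence = n_diabetes / n_total

# Hypothetical reference point (not reported by the authors): the accuracy of
# a trivial classifier that labels every patient as "no diabetes".
majority_baseline = n_no_diabetes / n_total

# Overall classification accuracies as stated in the abstract.
reported_accuracy = {
    "logistic regression": 0.817,
    "CHAID decision tree": 0.818,
}


def summarize():
    """Return one printable line per quantity of interest."""
    lines = [
        f"total patients: {n_total}",
        f"diabetes prevalence: {prevalence:.1%}",
        f"majority-class baseline accuracy: {majority_baseline:.1%}",
    ]
    for model, accuracy in reported_accuracy.items():
        lines.append(f"{model} accuracy: {accuracy:.1%}")
    return lines


for line in summarize():
    print(line)
```

Because the two classes are imbalanced, overall accuracy by itself says little about how well patients with diabetes are identified. Class-specific measures such as sensitivity and specificity would be needed for that, and the abstract does not report them.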