Background : Hepatocellular carcinoma (HCC) among patients with type-2 diabetes (T2D) is an increasing burden on diabetes management. This study aims to develop machine learning (ML) classification models for predicting HCC in T2D patients and to select the best-performing model to support early HCC detection.
Methods : A case-control study was conducted utilising computerised medical records in two hepatobiliary centres. The predictors were chosen using multiple logistic regression. IBM SPSS Modeler® was used to assess the discriminative performance of support vector machine (SVM), logistic regression (LR), artificial neural network (ANN), chi-square automatic interaction detection (CHAID), and their ensembles.
Results : Subjects (N = 424) were split into 60% training (n = 248) and 40% testing (n = 176) groups. The independent predictors identified were race, viral hepatitis, abdominal pain/discomfort, unintentional weight loss, statins, alcohol consumption, non-alcoholic fatty liver, platelet <150 ×10³/μL, alkaline phosphatase >129 IU/L, and alanine transaminase ≥25 IU/L. Performance differed significantly across all models (Cochran's Q test, p = 0.001), but not between the ensemble model and the SVM model (McNemar test, p = 0.687). The SVM model was selected as the best model because of its simplicity, high accuracy (85.28%), and high AUC (0.914). A web-based application for HCC prediction was developed using the best model's algorithm.
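The pairwise comparison reported above relies on McNemar's test, which looks only at the test cases on which two classifiers disagree. The following is a minimal sketch of the exact (binomial) form of that test in plain Python. The function name `mcnemar_exact` and the toy labels are hypothetical illustrations, not the study's data, and the study itself ran its analysis in IBM SPSS Modeler, which may use a different variant (for example, the chi-square approximation).

```python
from math import comb


def mcnemar_exact(y_true, pred_a, pred_b):
    """Exact two-sided McNemar test for two classifiers scored on the same cases.

    Returns (b, c, p_value), where
      b = cases model A classified correctly and model B did not,
      c = cases model B classified correctly and model A did not.
    """
    b = sum(1 for t, a, m in zip(y_true, pred_a, pred_b) if a == t and m != t)
    c = sum(1 for t, a, m in zip(y_true, pred_a, pred_b) if a != t and m == t)
    n = b + c
    if n == 0:
        # The models never disagree, so there is no evidence of a difference.
        return b, c, 1.0
    # Under H0 the discordant cases split 50/50: b ~ Binomial(n, 0.5).
    k = min(b, c)
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return b, c, min(1.0, 2 * tail)


# Toy data for illustration only (1 = HCC, 0 = no HCC); not from the study.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
pred_a = [1, 1, 1, 0, 0, 0, 0, 1]
pred_b = [1, 1, 0, 0, 0, 0, 1, 1]

b, c, p = mcnemar_exact(y_true, pred_a, pred_b)
print(f"discordant pairs: b={b}, c={c}; exact p-value = {p:.3f}")
```

A large p-value from such a test, like the p = 0.687 reported for the ensemble versus SVM comparison, indicates that the observed disagreements do not favour either model, which is the basis for preferring the simpler one.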
Conclusions : If further validation studies confirm these results, application of the SVM model could augment early HCC detection in T2D patients.
Azit Noor Atika, Sahran Shahnorbanun, Leow Voon Meng, Subramaniam Manisekar, Mokhtar Suryati, Nawi Azmawati Mohammed
Diabetes, Hepatocellular carcinoma, Machine learning, Risk prediction, Support vector machine