In Scientific Reports; h5-index 158.0
Nonalcoholic fatty liver disease (NAFLD) is expected to become one of the major causes of end-stage liver disease in the coming decades, but it shows few symptoms until it progresses to cirrhosis. We aimed to develop machine learning classification models to screen for NAFLD among general adults. This study included 14,439 adults who underwent a health examination. We developed models to classify subjects as having or not having NAFLD using decision tree, random forest (RF), extreme gradient boosting (XGBoost), and support vector machine (SVM) algorithms. The SVM classifier showed the best performance, with the highest accuracy (0.801), positive predictive value (PPV) (0.795), F1 score (0.795), Kappa score (0.508), and area under the precision-recall curve (AUPRC) (0.712), and the second-highest area under the receiver operating characteristic curve (AUROC) (0.850). The second-best classifier was the RF model, which showed the highest AUROC (0.852) and the second-highest accuracy (0.789), PPV (0.782), F1 score (0.782), Kappa score (0.478), and AUPRC (0.708). In conclusion, the SVM classifier is the best for screening NAFLD in the general population based on results from physical examination and blood testing, followed by the RF classifier. These classifiers have the potential to help physicians and primary care doctors screen for NAFLD in the general population, which could benefit NAFLD patients through early diagnosis.
Qin Shenghua, Hou Xiaomin, Wen Yuan, Wang Chunqing, Tan Xiaxian, Tian Hao, Ao Qingqing, Li Jingze, Chu Shuyuan
2023-Mar-03
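
For readers less familiar with the metrics reported in the abstract, the following is a minimal sketch of how accuracy, PPV, F1 score, Cohen's Kappa, and AUROC can be computed for a binary screening classifier. It is not the authors' code. The function names (`screening_metrics`, `auroc`) and all input numbers are hypothetical, chosen only for illustration, and the abstract does not say whether its PPV and F1 values were averaged across classes, so this sketch treats NAFLD as the positive class.

```python
def screening_metrics(tp, fp, fn, tn):
    """Compute threshold-based metrics from confusion-matrix counts.

    tp/fp/fn/tn are counts of true positives, false positives,
    false negatives, and true negatives (positive = NAFLD here,
    which is an assumption of this sketch).
    """
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total
    ppv = tp / (tp + fp) if (tp + fp) else 0.0        # positive predictive value
    recall = tp / (tp + fn) if (tp + fn) else 0.0     # sensitivity
    f1 = 2 * ppv * recall / (ppv + recall) if (ppv + recall) else 0.0

    # Cohen's Kappa: observed agreement corrected for chance agreement.
    p_pos = ((tp + fp) / total) * ((tp + fn) / total)
    p_neg = ((fn + tn) / total) * ((fp + tn) / total)
    p_chance = p_pos + p_neg
    kappa = (accuracy - p_chance) / (1 - p_chance) if p_chance != 1 else 0.0

    return {"accuracy": accuracy, "ppv": ppv, "f1": f1, "kappa": kappa}


def auroc(pos_scores, neg_scores):
    """AUROC as the probability that a random positive case scores
    higher than a random negative case (ties count as one half)."""
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


# Hypothetical example: 100 subjects and four predicted risk scores.
metrics = screening_metrics(tp=40, fp=10, fn=10, tn=40)
area = auroc(pos_scores=[0.9, 0.8], neg_scores=[0.7, 0.85])
```

The pairwise AUROC loop is quadratic in the number of subjects, so for a cohort of the size described above a rank-based implementation from a statistics library would be the practical choice. The loop is used here only because it makes the definition explicit.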