In Computers in biology and medicine
The prevalence of non-alcoholic fatty liver disease (NAFLD) and NAFLD-associated hepatocellular carcinoma (HCC) has increased continuously in recent years. Machine learning is an effective method for screening the feature genes of a disease for prediction, prevention and personalized treatment. Here, we used the "limma" package and weighted gene co-expression network analysis (WGCNA) to identify 219 NAFLD-related genes, which were mainly enriched in inflammation-related pathways. Four feature genes (AXUD1, FOSB, GADD45B, and SOCS2) were then selected using two machine learning algorithms, LASSO regression and support vector machine-recursive feature elimination (SVM-RFE). Based on these genes, we constructed a clinical diagnostic model with an area under the curve (AUC) of 0.994, which outperformed other indicators of NAFLD. Feature gene expression correlated significantly with steatohepatitis histology or clinical variables. These findings were also validated in external datasets and a mouse model. Finally, we found that feature gene expression was significantly decreased in NAFLD-associated HCC and that SOCS2 may be a prognostic biomarker. Our findings may provide new insights into the diagnosis, prevention and treatment targets of NAFLD and NAFLD-associated HCC.
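The two-algorithm feature selection described in the abstract can be illustrated with a minimal sketch. It is not the authors' code: the abstract does not say how LASSO or SVM-RFE were implemented, so scikit-learn is an assumption here, with L1-penalized logistic regression standing in for LASSO. The function names (`select_feature_genes`, `cv_auc`), the gene labels, and the synthetic expression matrix are all hypothetical.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.feature_selection import RFE
from sklearn.linear_model import LogisticRegression, LogisticRegressionCV
from sklearn.model_selection import cross_val_score
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC


def select_feature_genes(X, y, gene_names, n_svm_features=8, seed=0):
    """Return (lasso_genes, svm_genes, consensus) as sets of gene names."""
    X_scaled = StandardScaler().fit_transform(X)

    # LASSO-style selection: L1-penalized logistic regression, with the
    # penalty strength chosen by 5-fold cross-validation. Genes whose
    # coefficient is shrunk to exactly zero are dropped.
    lasso = LogisticRegressionCV(
        penalty="l1", solver="liblinear", Cs=10, cv=5, random_state=seed
    )
    lasso.fit(X_scaled, y)
    lasso_genes = {g for g, c in zip(gene_names, lasso.coef_.ravel()) if c != 0}

    # SVM-RFE: repeatedly fit a linear SVM and eliminate the gene with the
    # smallest weight until n_svm_features genes remain.
    rfe = RFE(SVC(kernel="linear"), n_features_to_select=n_svm_features, step=1)
    rfe.fit(X_scaled, y)
    svm_genes = {g for g, keep in zip(gene_names, rfe.support_) if keep}

    # Consensus feature genes: those selected by both algorithms.
    return lasso_genes, svm_genes, lasso_genes & svm_genes


def cv_auc(X, y, gene_names, selected):
    """Mean 5-fold cross-validated ROC AUC of a logistic model on `selected`."""
    cols = [i for i, g in enumerate(gene_names) if g in selected]
    model = LogisticRegression(max_iter=1000)
    return cross_val_score(model, X[:, cols], y, cv=5, scoring="roc_auc").mean()


# Synthetic stand-in for an expression matrix (samples x candidate genes);
# these are NOT the study's data.
X, y = make_classification(
    n_samples=80, n_features=30, n_informative=5, n_redundant=0,
    shuffle=False, random_state=0,
)
gene_names = [f"gene_{i}" for i in range(X.shape[1])]

lasso_genes, svm_genes, consensus = select_feature_genes(X, y, gene_names)
auc = cv_auc(X, y, gene_names, svm_genes)
print("consensus genes:", sorted(consensus))
print("cross-validated AUC:", round(float(auc), 3))
```

In this sketch the AUC is computed on the same samples used for selection, which tends to be optimistic; validating on an external dataset, as the abstract reports, is the more reliable check.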
Zhang Zhaohui, Wang Shihao, Zhu Zhengwen, Nie Biao
2023-Mar-05
Bioinformatics analysis, Biomarkers, Machine learning, NAFLD-associated hepatocellular carcinoma, Non-alcoholic fatty liver disease (NAFLD)