Background : Existing dementia risk models are limited to known risk factors and traditional statistical methods. We aimed to use machine learning (ML) to develop a novel dementia prediction model by leveraging a phenotypically rich space of 366 variables covering multiple domains of health-related data.
Methods : In this longitudinal, population-based cohort study of the UK Biobank (UKB), 425,159 participants without dementia were enrolled at 22 recruitment centres across the UK between March 1, 2006 and October 31, 2010. We implemented a data-driven strategy to identify predictors from 366 candidate variables covering a comprehensive range of genetic and environmental factors, and developed an ML model to predict incident dementia and Alzheimer's disease (AD) within five years, within ten years, and over a longer horizon (median 11.9 years [interquartile range 11.2-12.5]).
Findings : During 5,023,337 person-years of follow-up, 5287 participants developed dementia and 2416 developed AD. We established a novel UKB dementia risk prediction (UKB-DRP) model comprising ten predictors: age, ApoE ε4, pairs matching time, leg fat percentage, number of medications taken, reaction time, peak expiratory flow, mother's age at death, long-standing illness, and mean corpuscular volume. The model was internally evaluated for discrimination and calibration using five-fold cross-validation, and was further compared with existing prediction scales. The UKB-DRP model achieved high discriminative accuracy for dementia (AUC 0.848 ± 0.007) and even higher accuracy for AD (AUC 0.862 ± 0.015). The model was well calibrated (Hosmer-Lemeshow goodness-of-fit p-value = 0.92), and its predictive power remained solid across incidence time groups. More importantly, our model clearly outperformed existing models: the Cardiovascular Risk Factors, Aging, and Incidence of Dementia Risk Score (AUC 0.705 ± 0.008), the Dementia Risk Score (AUC 0.752 ± 0.007), and the Australian National University Alzheimer's Disease Risk Index (AUC 0.584 ± 0.017). The model was internally validated in a general population of European ancestry and White ethnicity; further validation in independent datasets is therefore necessary to confirm these findings.
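As an illustration of how a fold-wise AUC summary of the form "mean ± spread" can be produced, the following is a minimal sketch and not the study's actual pipeline. It assumes the reported ± values are standard deviations across the five folds (the abstract does not say), that each participant already has an out-of-fold risk score, and it uses synthetic data. All function names (`auc`, `stratified_folds`, `cv_auc_summary`) are hypothetical.

```python
import random
import statistics


def auc(labels, scores):
    """Rank-based AUC: chance that a random case outscores a random non-case.

    Tied scores receive their average rank, so a tie counts as 0.5.
    """
    pairs = sorted(zip(scores, labels))
    n = len(pairs)
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and pairs[j + 1][0] == pairs[i][0]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # average 1-based rank of the tied block
        for k in range(i, j + 1):
            ranks[k] = avg_rank
        i = j + 1
    n_pos = sum(label for _, label in pairs)
    n_neg = n - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("AUC needs at least one case and one non-case")
    rank_sum_pos = sum(r for r, (_, label) in zip(ranks, pairs) if label == 1)
    return (rank_sum_pos - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)


def stratified_folds(labels, k=5, seed=0):
    """Split indices into k folds, keeping the case proportion similar in each."""
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    for cls in (0, 1):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        for pos, i in enumerate(idx):
            folds[pos % k].append(i)
    return folds


def cv_auc_summary(labels, scores, k=5, seed=0):
    """Return (mean, standard deviation) of the AUC over k held-out folds.

    Assumption: `scores` are out-of-fold predictions, i.e. each score came
    from a model that was fitted without that participant.
    """
    fold_aucs = [
        auc([labels[i] for i in fold], [scores[i] for i in fold])
        for fold in stratified_folds(labels, k, seed)
    ]
    return statistics.mean(fold_aucs), statistics.stdev(fold_aucs)


# Synthetic example: about 20% cases, with cases scoring higher on average.
rng = random.Random(42)
labels = [1 if rng.random() < 0.2 else 0 for _ in range(500)]
scores = [rng.gauss(1.0 if y else 0.0, 1.0) for y in labels]
mean_auc, sd_auc = cv_auc_summary(labels, scores)
print(f"AUC {mean_auc:.3f} ± {sd_auc:.3f}")
```

In a real analysis the model would be refitted on the remaining four folds before scoring each held-out fold; the sketch skips that step and only shows how the per-fold AUCs are computed and summarised.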
Interpretation : Our ML-based UKB-DRP model incorporates ten easily accessible predictors and shows solid predictive power for incident dementia and AD within five years, within ten years, and over a longer horizon. It can be used to identify individuals at high risk of dementia and AD in the general population.
Funding : This study was funded by grants from the Science and Technology Innovation 2030 Major Projects (2022ZD0211600), National Key R&D Program of China (2018YFC1312904, 2019YFA070950), National Natural Science Foundation of China (282071201, 81971032, 82071997), Shanghai Municipal Science and Technology Major Project (2018SHZDZX01), Research Start-up Fund of Huashan Hospital (2022QD002), Excellence 2025 Talent Cultivation Program at Fudan University (3030277001), Shanghai Rising-Star Program (21QA1408700), Medical Engineering Fund of Fudan University (yg2021-013), and the 111 Project (No. B18015).
You Jia, Zhang Ya-Ru, Wang Hui-Fu, Yang Ming, Feng Jian-Feng, Yu Jin-Tai, Cheng Wei
Keywords : Alzheimer's disease, Dementia, Machine learning, Prediction model, UK Biobank
Abbreviations : AD, Alzheimer's disease; ANU-ADRI, Australian National University Alzheimer's Disease Risk Index; AUC, area under the receiver operating characteristic curve; CAIDE, Cardiovascular Risk Factors, Aging, and Incidence of Dementia risk score; DRS, Dementia Risk Score; ML, machine learning; UKB-DRP, UK Biobank dementia risk prediction