
In BMJ Open

OBJECTIVES : The purpose of this study was to use easily obtained and directly observable clinical features to establish predictive models to identify patients at increased risk of stroke.

SETTING AND PARTICIPANTS : A total of 46 240 valid records were obtained from 8 research centres and 14 communities in Jiangxi province, China, between February and September 2018.

PRIMARY AND SECONDARY OUTCOME MEASURES : The area under the receiver operating characteristic curve (AUC), sensitivity, specificity and accuracy were calculated to test the performance of five models: logistic regression (LR), random forest (RF), decision tree (DT), extreme gradient boosting (XGBoost) and gradient boosting decision tree (GBDT). Calibration curves were used to assess how well predicted probabilities matched observed outcomes.
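
As a rough illustration of the outcome measures named above, the following sketch computes sensitivity, specificity, accuracy and AUC for a binary classifier in plain Python. It is not the study's code: the function names, the 0.5 decision threshold and the eight toy records are all assumptions made for the example, and the AUC is computed with the pairwise (Mann-Whitney) formulation rather than by whatever software the authors used.

```python
from typing import Dict, Sequence


def classification_metrics(
    y_true: Sequence[int], y_score: Sequence[float], threshold: float = 0.5
) -> Dict[str, float]:
    """Sensitivity, specificity and accuracy at a given probability threshold.

    The 0.5 default is an assumption for illustration only.
    """
    y_pred = [1 if s >= threshold else 0 for s in y_score]
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return {
        "sensitivity": tp / (tp + fn),  # true positive rate
        "specificity": tn / (tn + fp),  # true negative rate
        "accuracy": (tp + tn) / len(y_true),
    }


def auc(y_true: Sequence[int], y_score: Sequence[float]) -> float:
    """AUC as the probability that a random positive outranks a random negative.

    Ties count as half a win. This is O(n_pos * n_neg), fine for a toy example.
    """
    pos = [s for t, s in zip(y_true, y_score) if t == 1]
    neg = [s for t, s in zip(y_true, y_score) if t == 0]
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg
    )
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: 1 = stroke, 0 = no stroke; scores are model probabilities.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_score = [0.9, 0.7, 0.4, 0.6, 0.3, 0.2, 0.2, 0.1]

metrics = classification_metrics(y_true, y_score)
metrics["auc"] = auc(y_true, y_score)
print(metrics)
```

Unlike AUC, the three threshold-based measures depend on the chosen cut-off, so reported sensitivity and specificity are only comparable across models when the threshold rule is the same.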

RESULTS : The results indicated that XGBoost (AUC: 0.924, accuracy: 0.873, sensitivity: 0.776, specificity: 0.916) and RF (AUC: 0.924, accuracy: 0.872, sensitivity: 0.778, specificity: 0.913) demonstrated excellent performance in predicting stroke. Physical inactivity, hypertension, meat-based diet and high salt intake were important prediction features of stroke.

CONCLUSION : All five machine learning models showed good predictive and discriminatory performance for stroke. RF and XGBoost performed slightly better than LR, although LR was easier to interpret and less prone to overfitting. This work provides a rapid and accurate tool for stroke risk assessment, which can help to improve the efficiency of stroke screening services and the management of high-risk groups.

Qiu Yuexin, Cheng Shiqi, Wu Yuhang, Yan Wei, Hu Songbo, Chen Yiying, Xu Yan, Chen Xiaona, Yang Junsai, Chen Xiaoyun, Zheng Huilie

2023-Mar-01

epidemiology, statistics & research methods, stroke