In Journal of clinical epidemiology ; h5-index 60.0
OBJECTIVE : Clinical risk prediction models are generally assessed at the population level, and measures that evaluate how stable their predictions are for individual patients are lacking. This study evaluated ranking as a measure of individual-level stability between risk prediction models.
STUDY DESIGN AND SETTING : A large patient cohort (3.66 million patients with 0.11 million cardiovascular events) was extracted from the Clinical Practice Research Datalink, with cardiovascular disease risk prediction used as the exemplar.
RESULTS : Fifteen models (including machine learning and statistical models) had similar population-level performance (C statistics of about 0.88). For patients with high absolute risks, the models were more consistent in their ranking of risk predictions (interquartile range [IQR] of differences in rank percentiles -0.6 to 1.0) but inconsistent in absolute risk (IQR of differences in absolute risk -18.8 to 9.0). At low risk the reverse was true: ranking was inconsistent, but absolute risk was more consistent.
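The comparison of rank consistency with absolute-risk consistency can be illustrated with a minimal Python sketch. The abstract does not specify how rank percentiles or their differences were computed, so the approach below (average-rank percentiles, pairwise differences between two models, quartiles from the standard library) is an assumption, the function names `rank_percentiles` and `iqr_of_differences` are hypothetical, and the risk values are made-up numbers rather than study data.

```python
from statistics import quantiles


def rank_percentiles(risks):
    """Rank percentile (0-100] of each patient within one model's predictions.

    Tied predictions share the average of their ranks.
    """
    n = len(risks)
    order = sorted(range(n), key=lambda i: risks[i])
    pct = [0.0] * n
    i = 0
    while i < n:
        j = i
        # Extend j to cover every prediction tied with position i.
        while j + 1 < n and risks[order[j + 1]] == risks[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # 1-based average rank of the tie group
        for k in range(i, j + 1):
            pct[order[k]] = 100.0 * avg_rank / n
        i = j + 1
    return pct


def iqr_of_differences(a, b):
    """First and third quartiles of the per-patient differences a - b."""
    diffs = [x - y for x, y in zip(a, b)]
    q1, _, q3 = quantiles(diffs, n=4, method="inclusive")
    return q1, q3


# Hypothetical predicted risks (%) for the same five patients from two models.
model_a = [1.0, 2.0, 5.0, 20.0, 40.0]
model_b = [1.5, 1.8, 6.0, 30.0, 55.0]

rank_iqr = iqr_of_differences(rank_percentiles(model_a), rank_percentiles(model_b))
risk_iqr = iqr_of_differences(model_a, model_b)
print("IQR of rank percentile differences:", rank_iqr)
print("IQR of absolute risk differences:", risk_iqr)
```

In this toy input both models order the patients identically, so the rank percentile differences are all zero even though the absolute risks diverge at the high end, which is the kind of pattern the two measures are meant to separate.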
CONCLUSION : Consistency of ranking of individual risk predictions is a useful measure for assessing risk prediction models and provides information complementary to absolute risk stability. Model development guidelines, including "TRIPOD" and "PROBAST", should incorporate ranking to assess individual-level stability between risk prediction models.
Li Yan, Sperrin Matthew, Ashcroft Darren M, van Staa Tjeerd Pieter
Cardiovascular disease, Machine learning, PROBAST, QRISK3, Ranking, Risk prediction model, TRIPOD