In Clinical Kidney Journal
Background: Beyond classic logistic regression analysis, non-parametric machine learning methods such as random forest are now used to generate predictive models. The aim of this study was to evaluate random forest mortality prediction models in haemodialysis patients.
Methods: Data were acquired from incident haemodialysis patients between 1995 and 2015. Mortality at 6 months, 1 year and 2 years of haemodialysis was predicted using random forest, and the accuracy was compared with that of logistic regression. Baseline data were constructed from the information obtained during the initial period of regular haemodialysis. To improve the accuracy of each patient's baseline information, the data collection period was set at 30, 60 and 90 days after the first haemodialysis session.
Results: A total of 1571 incident haemodialysis patients were included. The mean age was 62.3 years and the average Charlson comorbidity index was 5.99. The mortality prediction models obtained by random forest appeared adequate in terms of accuracy [area under the curve (AUC) 0.68-0.73] and superior to the logistic regression models (ΔAUC 0.007-0.046). The results indicate that random forest and logistic regression build their mortality prediction models from different variables.
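As an illustration of how AUC and ΔAUC values of the kind reported above can be computed, the following minimal sketch uses the rank-based (Mann-Whitney) formulation of the AUC. The toy labels, the two score lists and the names `auc`, `risk_model_a` and `risk_model_b` are hypothetical and are not taken from the study; the study's own models and data are not reproduced here.

```python
from typing import Sequence


def auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Area under the ROC curve via the rank (Mann-Whitney) formulation.

    Equals the probability that a randomly chosen positive case (label 1)
    receives a higher score than a randomly chosen negative case (label 0),
    with ties counted as one half.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: 1 = died within the prediction horizon, 0 = survived.
died = [1, 1, 1, 0, 0, 0, 0, 0]
# Hypothetical predicted risks from two competing models (not study output).
risk_model_a = [0.9, 0.6, 0.4, 0.7, 0.3, 0.2, 0.5, 0.1]
risk_model_b = [0.8, 0.3, 0.4, 0.7, 0.5, 0.2, 0.6, 0.1]

auc_a = auc(died, risk_model_a)
auc_b = auc(died, risk_model_b)
delta_auc = auc_a - auc_b  # positive when model A discriminates better

print(f"AUC A = {auc_a:.3f}, AUC B = {auc_b:.3f}, delta AUC = {delta_auc:.3f}")
```

In practice both AUCs would be estimated on held-out patients rather than on the data used to fit the models, so that the difference reflects predictive performance and not overfitting.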
Conclusions: Random forest is an adequate method for generating mortality prediction models in haemodialysis patients and is superior to logistic regression.
Garcia-Montemayor Victoria, Martin-Malo Alejandro, Barbieri Carlo, Bellocchio Francesco, Soriano Sagrario, Pendon-Ruiz de Mier Victoria, Molina Ignacio R, Aljama Pedro, Rodriguez Mariano
haemodialysis, machine learning, mortality, predictive models, random forest