In IEEE Journal of Biomedical and Health Informatics
Over 34 million people in the US have diabetes, a major cause of blindness, renal failure, and amputations. Machine learning (ML) models can identify high-risk patients and help prevent adverse outcomes. Selecting the 'best' prediction model for a given disease, population, and clinical application is challenging because of the hundreds of health-related ML models in the literature and the growing availability of ML methodologies. To support this decision process, we developed the Selection of Machine-learning Algorithms with ReplicaTions (SMART) Framework, which integrates building and selecting ML models with decision theory. We build ML models and estimate their performance for multiple plausible future populations using a replicated nested cross-validation technique. We rank ML models by simulating decision-maker priorities, using a range of accuracy measures (e.g., AUC) and robustness metrics from decision theory (e.g., minimax regret). We present the SMART Framework through a case study on the microvascular complications of diabetes using data from the ACCORD clinical trial. We compare selections made by risk-averse, risk-neutral, and risk-seeking decision-makers, finding agreement in 80% of the risk-averse and risk-neutral selections, with the risk-averse selections remaining consistent for a given complication. We also found that the models that best predicted outcomes in the validation set were those with low performance variance on the testing set, indicating that a risk-averse approach to model selection is ideal when population features may be highly variable. The SMART Framework is a powerful, interactive tool that incorporates various ML algorithms and stakeholder preferences and is generalizable to new data and technological advancements.
Swan, Breanna Patrice; Mayorga, Maria E.; Ivy, Julie
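The ranking step described in the abstract (scoring models across replications that stand in for plausible future populations, then selecting according to a decision-maker's risk attitude) can be illustrated with a minimal sketch. It assumes a small table of hypothetical AUC values, one per model per replication, and assumes for illustration that a risk-neutral decision-maker maximizes mean AUC, a risk-seeking one maximizes best-case AUC (maximax), and a risk-averse one uses either maximin or minimax regret. The function names, model names, and numbers are invented for this example, and the SMART Framework's actual criteria may differ.

```python
from statistics import mean

# Hypothetical input: model name -> AUC in each replication, where each
# replication stands in for one plausible future population.
Scores = dict[str, list[float]]


def risk_neutral(scores: Scores) -> str:
    """Pick the model with the highest mean AUC across replications."""
    return max(scores, key=lambda m: mean(scores[m]))


def risk_seeking(scores: Scores) -> str:
    """Maximax: pick the model with the highest best-case AUC."""
    return max(scores, key=lambda m: max(scores[m]))


def maximin(scores: Scores) -> str:
    """Risk-averse: pick the model with the highest worst-case AUC."""
    return max(scores, key=lambda m: min(scores[m]))


def minimax_regret(scores: Scores) -> str:
    """Risk-averse: minimize the largest shortfall from the best model
    observed in any single replication."""
    n_reps = len(next(iter(scores.values())))
    # Best AUC achieved by any model in each replication.
    best = [max(aucs[j] for aucs in scores.values()) for j in range(n_reps)]
    # Each model's worst-case regret across all replications.
    regret = {
        m: max(best[j] - aucs[j] for j in range(n_reps))
        for m, aucs in scores.items()
    }
    return min(regret, key=regret.get)


if __name__ == "__main__":
    # Illustrative values only, not results from the ACCORD case study.
    auc = {
        "model_A": [0.88, 0.68, 0.93],  # highest mean, unstable
        "model_B": [0.82, 0.80, 0.86],  # slightly lower mean, low variance
        "model_C": [0.65, 0.75, 0.95],  # best single replication
    }
    print("risk-neutral:", risk_neutral(auc))      # model_A
    print("risk-seeking:", risk_seeking(auc))      # model_C
    print("maximin:", maximin(auc))                # model_B
    print("minimax regret:", minimax_regret(auc))  # model_B
```

With these made-up numbers, both risk-averse criteria choose the low-variance model_B even though model_A has a marginally higher mean AUC, which mirrors the kind of trade-off the abstract describes between average accuracy and robustness across populations.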