
Using machine-learning algorithms to improve imputation in the Medical Expenditure Panel Survey.

In Health services research

OBJECTIVE : To assess the feasibility of applying machine-learning (ML) methods to imputation in the Medical Expenditure Panel Survey (MEPS).

DATA SOURCES : All data come from the 2016-2017 MEPS.

STUDY DESIGN : Currently, expenditures for medical encounters in the MEPS are imputed with a predictive mean matching (PMM) algorithm in which a linear regression model is used to predict expenditures for events with (donors) and without (recipients) data. Recipient events and donor events are then matched based on the smallest distance between predicted expenditures, and the donor event's expenditures are used as the recipient event's imputation. We replace the linear regression algorithm in the PMM framework with ML methods to predict expenditures. We examine five alternatives to linear regression: Gradient Boosting, Random Forests, Extreme Random Forests, Deep Neural Networks, and a Stacked Ensemble approach. Additionally, we introduce an alternative matching scheme that matches on a vector of predicted expenditures by source of payment, instead of a single total expenditure prediction, to generate potentially superior matches.
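
The matching step described above can be sketched in a few lines of Python. This is an illustrative sketch only, not the MEPS production procedure or the authors' code: the function names (`fit_ols`, `predict_ols`, `pmm_impute`), the synthetic data, the choice of a single nearest donor, and the use of Euclidean distance for the vector case are all assumptions made for the example. The same function handles a single total-expenditure prediction (1-D target) and a vector of predictions by source of payment (2-D target).

```python
import numpy as np

def fit_ols(X, y):
    """Least-squares coefficients with an intercept; y may be 1-D or 2-D."""
    X1 = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(X1, y, rcond=None)
    return coef

def predict_ols(coef, X):
    """Predictions from the fitted coefficients (intercept added here)."""
    X1 = np.column_stack([np.ones(len(X)), X])
    return X1 @ coef

def pmm_impute(X_donor, y_donor, X_recipient):
    """Hypothetical PMM sketch: fit on donors, predict for donors and
    recipients, then give each recipient the observed expenditures of the
    donor whose prediction is closest (Euclidean distance, an assumption)."""
    coef = fit_ols(X_donor, y_donor)
    # Reshape to 2-D so scalar and vector matching share one code path.
    pred_d = predict_ols(coef, X_donor).reshape(len(X_donor), -1)
    pred_r = predict_ols(coef, X_recipient).reshape(len(X_recipient), -1)
    dist = np.linalg.norm(pred_r[:, None, :] - pred_d[None, :, :], axis=2)
    donor_idx = dist.argmin(axis=1)
    return y_donor[donor_idx], donor_idx

# Synthetic data, invented purely for illustration.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
total = 100 + X @ np.array([50.0, 20.0, -10.0]) + rng.normal(scale=5, size=200)
X_d, X_r = X[:150], X[150:]

# Matching on a single predicted total expenditure.
imputed_total, idx_total = pmm_impute(X_d, total[:150], X_r)

# Matching on a vector of predictions (two made-up payment sources).
by_source = np.column_stack([0.7 * total, 0.3 * total])
imputed_vec, idx_vec = pmm_impute(X_d, by_source[:150], X_r)
```

Because the imputed values are copied from real donor events, they always fall within the observed expenditure distribution. Swapping the OLS helpers for another predictor (for example a gradient-boosting or stacked model) would change only the prediction step; the matching logic stays the same.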

DATA COLLECTION : Study data are derived from a large federal survey.

PRINCIPAL FINDINGS : ML algorithms perform better at both prediction and matching imputation than Ordinary Least Squares (OLS), the most common prediction algorithm used in PMM. On average, the Stacked Ensemble approach that combines all the ML algorithms performs best, improving expenditure prediction R² by 108% (0.156 points) and final imputation R² by 227% (0.397 points). Matching on a prediction vector also improves alignment of sources of payments between donor and recipient events.
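
The relative and absolute gains reported above can be cross-checked with simple arithmetic. The sketch below assumes that each percentage is measured relative to the OLS baseline R²; the abstract does not state the baseline values, so the figures computed here are implied approximations from rounded inputs, not reported results. The helper name `implied_baseline` is hypothetical.

```python
def implied_baseline(abs_gain, rel_gain_pct):
    """Baseline value implied by an absolute gain and a relative gain (%),
    assuming rel_gain_pct = 100 * abs_gain / baseline."""
    return abs_gain / (rel_gain_pct / 100.0)

# Prediction R²: +0.156 points, reported as a 108% improvement.
pred_base = implied_baseline(0.156, 108)   # roughly 0.144
pred_new = pred_base + 0.156               # roughly 0.300

# Final imputation R²: +0.397 points, reported as a 227% improvement.
imp_base = implied_baseline(0.397, 227)    # roughly 0.175
imp_new = imp_base + 0.397                 # roughly 0.572
```

Under that assumption, the Stacked Ensemble would raise prediction R² from about 0.14 to about 0.30 and final imputation R² from about 0.17 to about 0.57.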

CONCLUSIONS : Machine learning algorithms and an alternative matching scheme improve the overall quality of expenditure PMM imputation in the MEPS. These methods may have additional value in other national surveys that currently rely on PMM or similar methods for imputation.

McClellan Chandler, Mitchell Emily, Anderson Jerrod, Zuvekas Samuel

2022-Dec-10

Imputation, MEPS, Machine Learning, Medical Expenditures, Predictive Mean Matching

11 Dec 2022
