In Health services research

OBJECTIVE : To develop easy-to-use, validated predictive models that identify beneficiaries experiencing homelessness from administrative data.

DATA SOURCES : We pooled enrollment and claims data from enrollees of the California Whole Person Care (WPC) Medicaid demonstration program that coordinated care of a subset of Medicaid beneficiaries identified as high utilizers in 26 California counties (25 WPC Pilots). We also used public directories of social service and health care facilities.

STUDY DESIGN : Using WPC Pilot-reported homelessness status, we trained seven supervised learning algorithms with different specifications to identify beneficiaries experiencing homelessness. The list of predictors included address- and claims-based indicators, demographics, health status, health care utilization, and county-level homelessness rate. We then assessed model performance using measures of balanced accuracy (BA), sensitivity, specificity, positive predictive value, negative predictive value, and area under the receiver operating characteristic curve (AUC).
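The threshold-based measures listed above all derive from the four cells of a confusion matrix. A minimal sketch of those definitions follows; the function name `classification_metrics` and the example counts are hypothetical illustrations, not the study's code or data.

```python
def classification_metrics(tp, fp, tn, fn):
    """Compute threshold-based performance measures from confusion-matrix counts.

    tp/fn: beneficiaries reported as homeless who were / were not flagged by the model.
    tn/fp: beneficiaries reported as not homeless who were not / were flagged.
    """
    sensitivity = tp / (tp + fn)   # true positive rate
    specificity = tn / (tn + fp)   # true negative rate
    ppv = tp / (tp + fp)           # positive predictive value
    npv = tn / (tn + fn)           # negative predictive value
    # Balanced accuracy is the mean of sensitivity and specificity,
    # which keeps it informative when the classes are imbalanced.
    balanced_accuracy = (sensitivity + specificity) / 2
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "ppv": ppv,
        "npv": npv,
        "balanced_accuracy": balanced_accuracy,
    }


# Hypothetical counts chosen only to illustrate the arithmetic.
metrics = classification_metrics(tp=80, fp=30, tn=170, fn=20)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

AUC is not included in this sketch because it depends on the ranking of predicted probabilities across all thresholds rather than on a single confusion matrix.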

DATA COLLECTION/EXTRACTION METHODS : We included 93,656 WPC enrollees from 2017 and 2018, 37,441 of whom had a WPC Pilot-reported homelessness indicator.

PRINCIPAL FINDINGS : The random forest algorithm with all available indicators had the best performance (87% BA and 0.95 AUC), but a simpler Generalized Linear Model (GLM) also performed well (74% BA and 0.83 AUC). Reducing predictors to the top 20 and top five most important indicators in a GLM yielded only slightly lower performance (86% BA and 0.94 AUC for top 20 and 86% BA and 0.91 AUC for top five).

CONCLUSIONS : Large samples can be used to accurately predict homelessness in Medicaid administrative data if a validated homelessness indicator for a small subset can be obtained. In the absence of a validated indicator, the likelihood of homelessness can be calculated from the county homelessness rate, address- and claims-based indicators, and beneficiary age, using a prediction model presented here. These approaches are needed given the rising prevalence of homelessness and the focus of Medicaid and other payers on addressing homelessness and its outcomes.

Pourat Nadereh, Yue Dahai, Chen Xiao, Zhou Weihao, O’Masta Brenna

2023-Feb-08

Medicaid, administrative data, homelessness, machine learning algorithms