
In Advances in Therapy

INTRODUCTION: Administrative claims data are an important source for real-world evidence (RWE) generation, but incomplete reporting of variables such as body mass index (BMI) limits the sample sizes available to address certain research questions. The objective of this study was to build machine-learning (ML) models that predict BMI classifications (≥ 30, ≥ 35, and ≥ 40 kg/m²) in administrative healthcare claims databases, and to validate them internally and externally.
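As a minimal illustration of the prediction targets, the sketch below turns a recorded BMI value into the three binary labels named above, using the standard formula BMI = weight (kg) / height (m)². The function and constant names are hypothetical, and treating each cut-off as its own (nested) binary outcome is an assumption consistent with building one model per BMI classification.

```python
# Minimal sketch: derive the three binary BMI targets named in the abstract.
# Names are hypothetical; each cut-off is treated as its own binary outcome.
from typing import Dict, Optional

BMI_CUTOFFS = (30.0, 35.0, 40.0)  # kg/m^2


def bmi(weight_kg: float, height_m: float) -> float:
    """Standard BMI formula: weight in kilograms divided by height in metres squared."""
    if height_m <= 0:
        raise ValueError("height must be positive")
    return weight_kg / height_m ** 2


def bmi_labels(value: Optional[float]) -> Optional[Dict[float, int]]:
    """Map a BMI value to {cut-off: 1 if BMI >= cut-off else 0}.

    Returns None when BMI is missing, i.e. the records a prediction model
    would be used to fill in.
    """
    if value is None:
        return None
    return {cutoff: int(value >= cutoff) for cutoff in BMI_CUTOFFS}
```

For example, a BMI of 36.2 is positive for ≥ 30 and ≥ 35 but negative for ≥ 40, while a missing BMI yields None.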

METHODS: Five advanced ML algorithms were implemented for each BMI classification on a random sample of BMI readings from the Optum PanTher Electronic Health Record database (2%) and the Optum Clinformatics Date of Death database (20%), using baseline demographic and clinical characteristics as predictors. Sensitivity analyses were conducted using different oversampling ratios. Model performance was validated internally and externally.
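The abstract does not say which oversampling method was used. Assuming simple random oversampling of the minority (positive) class, a sensitivity analysis over oversampling ratios might resample the training data as in this sketch; the function name, the ratio definition (target minority count divided by majority count), and the seed parameter are illustrative assumptions, not details from the study.

```python
# Minimal sketch of random oversampling for a binary BMI label (1 = minority class).
# The abstract does not specify the method; random duplication with replacement,
# the ratio definition and all names here are assumptions.
import random
from typing import List, Sequence, Tuple, TypeVar

Row = TypeVar("Row")


def random_oversample(
    rows: Sequence[Row],
    labels: Sequence[int],
    ratio: float,
    seed: int = 0,
) -> Tuple[List[Row], List[int]]:
    """Append randomly drawn copies of minority-class rows (label 1) until the
    minority count reaches round(ratio * majority count). Never removes rows."""
    if len(rows) != len(labels):
        raise ValueError("rows and labels must have the same length")
    if not 0 < ratio <= 1:
        raise ValueError("ratio must be in (0, 1]")
    minority = [r for r, y in zip(rows, labels) if y == 1]
    if not minority:
        raise ValueError("no minority-class rows to oversample")
    n_majority = sum(1 for y in labels if y == 0)
    # How many duplicates are needed; zero if the ratio is already met.
    n_extra = max(0, round(ratio * n_majority) - len(minority))
    rng = random.Random(seed)  # seeded so each sensitivity run is reproducible
    extra = [rng.choice(minority) for _ in range(n_extra)]
    return list(rows) + extra, list(labels) + [1] * n_extra
```

With 2 positive and 8 negative rows, ratio=0.5 adds 2 duplicated positives (4 positives against 8 negatives). In a setup like this, oversampling would normally be applied only to the training split, so that validation metrics reflect the original class balance.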

RESULTS: Models built with the Super Learner algorithm (SLA) yielded the best predictive performance for BMI classification. SLA model 1 used sociodemographic and clinical characteristics, including baseline BMI values; the area under the receiver operating characteristic curve (ROC AUC) for predicting BMI classifications of ≥ 30, ≥ 35, and ≥ 40 kg/m² was approximately 88% (internal validation), while accuracy ranged from 87.9% to 92.8% and specificity from 91.8% to 94.7%. SLA model 2 used sociodemographic and clinical characteristics, excluding baseline BMI values; ROC AUC for the same three classifications was approximately 73% (internal validation), while accuracy ranged from 73.6% to 80.0% and specificity from 71.6% to 85.9%. External validation on the MarketScan Commercial Claims and Encounters database yielded relatively consistent results, with slightly lower performance.
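The metrics reported above can be computed from a model's predicted probabilities roughly as in the following sketch. ROC AUC uses the Mann-Whitney formulation (the probability that a randomly chosen positive case scores higher than a randomly chosen negative case), and accuracy, sensitivity, and specificity are computed at a fixed probability threshold. The function names and the 0.5 threshold are assumptions, since the abstract does not state the operating threshold; the study reports AUC as a percentage, whereas this sketch returns a fraction between 0 and 1.

```python
# Minimal sketch of the evaluation metrics named in the abstract.
# Function names and the default 0.5 threshold are illustrative assumptions.
from typing import Dict, Sequence


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """ROC AUC via the Mann-Whitney formulation: the share of (positive, negative)
    pairs in which the positive case gets the higher score; ties count 0.5.
    Uses an O(P * N) double loop for clarity rather than speed."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def threshold_metrics(
    y_true: Sequence[int], scores: Sequence[float], threshold: float = 0.5
) -> Dict[str, float]:
    """Accuracy, sensitivity and specificity after labelling scores >= threshold as 1."""
    y_pred = [int(s >= threshold) for s in scores]
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for y, p in pairs if y == 1 and p == 1)
    tn = sum(1 for y, p in pairs if y == 0 and p == 0)
    fp = sum(1 for y, p in pairs if y == 0 and p == 1)
    fn = sum(1 for y, p in pairs if y == 1 and p == 0)
    return {
        "accuracy": (tp + tn) / len(pairs),
        "sensitivity": tp / (tp + fn) if tp + fn else float("nan"),
        "specificity": tn / (tn + fp) if tn + fp else float("nan"),
    }
```

For labels [1, 1, 0, 0, 0] and scores [0.9, 0.4, 0.6, 0.2, 0.1], roc_auc returns 5/6 (about 0.833), and at the 0.5 threshold accuracy is 0.6, sensitivity 0.5, and specificity 2/3.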

CONCLUSION: This study demonstrated the feasibility and validity of using ML algorithms to predict BMI classifications in administrative healthcare claims data, expanding the utility of these data for RWE generation.

Wu Bingcao, Chow Wing, Sakthivel Monish, Kakade Onkar, Gupta Kartikeya, Israel Debra, Chen Yen-Wen, Kuruvilla Aarti Susan


Administrative healthcare claims databases, BMI classification, Body mass index, Machine learning, Predictive models, Real-world evidence generation