In International Journal of Cardiology; h5-index 68.0
BACKGROUND : This study aimed to develop a machine learning (ML) model to identify patients who are likely to have pulmonary hypertension (PH), using a large patient-level US-based electronic health record (EHR) database.
METHODS : A gradient boosting model, XGBoost, was developed using data from Optum's US-based de-identified EHR dataset (2007-2019). Adult PH patients and adult disease controls were identified using diagnostic, treatment and procedure codes and were randomly split into a training set (90%) and a test set (10%). Model features included patient demographics, physician visits, diagnoses, procedures, prescriptions, and laboratory test results. SHapley Additive exPlanations (SHAP) values were used to determine feature importance.
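As a rough illustration of the idea behind Shapley-based feature importance, the sketch below computes exact Shapley values for a toy risk score by averaging each feature's marginal contribution over all feature orderings. The scoring function, its weights, the all-zero baseline, and the helper name `exact_shapley` are hypothetical and are not taken from the study; brute-force enumeration is only feasible for a handful of features, so this is not how values would be computed for a 165-feature gradient boosting model.

```python
from itertools import permutations


def toy_risk_score(features):
    """Hypothetical risk score (NOT the study's model): three binary
    diagnosis flags with made-up weights and one interaction term."""
    score = 2.0 * features["heart_failure"]
    score += 1.0 * features["shortness_of_breath"]
    score += 0.5 * features["atrial_fibrillation"]
    # Made-up interaction between two diagnoses.
    score += 1.0 * features["heart_failure"] * features["atrial_fibrillation"]
    return score


def exact_shapley(model, patient, baseline):
    """Exact Shapley values by enumerating every feature ordering.

    Starting from the baseline, features are switched to the patient's
    values one at a time; the change in model output at each step is that
    feature's marginal contribution for the ordering.
    """
    names = list(patient)
    orderings = list(permutations(names))
    totals = {name: 0.0 for name in names}
    for order in orderings:
        current = dict(baseline)
        previous_output = model(current)
        for name in order:
            current[name] = patient[name]
            new_output = model(current)
            totals[name] += new_output - previous_output
            previous_output = new_output
    return {name: total / len(orderings) for name, total in totals.items()}


# Hypothetical patient with all three diagnoses, compared with a
# baseline patient who has none of them.
patient = {"heart_failure": 1, "shortness_of_breath": 1, "atrial_fibrillation": 1}
baseline = {"heart_failure": 0, "shortness_of_breath": 0, "atrial_fibrillation": 0}
shapley_values = exact_shapley(toy_risk_score, patient, baseline)
```

In this toy case the interaction term is shared equally between heart failure and atrial fibrillation, and the per-feature values sum to the difference between the patient's score and the baseline score.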
RESULTS : We identified 11,279,478 control and 115,822 PH patients (mean age 62 and 68 years, respectively; 53% female in both groups). The final model used 165 features; the most important predictive features included diagnoses of heart failure, shortness of breath and atrial fibrillation. The model predicted PH with an area under the receiver operating characteristic curve (AUROC) of 0.92. AUROC remained above 0.80 for the prediction of PH up to and beyond 18 months before diagnosis. Among the PH patients, we also identified 955 pulmonary arterial hypertension (PAH) and 1,432 chronic thromboembolic pulmonary hypertension (CTEPH) patients; the AUROC ranges obtained for these cohorts were 0.79-0.90 and 0.87-0.96, respectively.
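For readers less familiar with the metric, AUROC can be read as the probability that a randomly chosen case receives a higher model score than a randomly chosen control, with ties counted as one half. The sketch below computes it directly from that pairwise definition. The labels and scores are invented for illustration and are not study data, and the quadratic loop is only practical for small samples.

```python
def auroc(labels, scores):
    """AUROC from its pairwise definition: the fraction of
    (case, control) pairs in which the case has the higher score,
    counting tied scores as half a win."""
    case_scores = [s for y, s in zip(labels, scores) if y == 1]
    control_scores = [s for y, s in zip(labels, scores) if y == 0]
    if not case_scores or not control_scores:
        raise ValueError("AUROC needs at least one case and one control")
    wins = 0.0
    for case in case_scores:
        for control in control_scores:
            if case > control:
                wins += 1.0
            elif case == control:
                wins += 0.5
    return wins / (len(case_scores) * len(control_scores))


# Invented example: 3 cases (label 1) and 4 controls (label 0).
labels = [1, 1, 1, 0, 0, 0, 0]
scores = [0.9, 0.7, 0.4, 0.8, 0.3, 0.2, 0.1]
example_auroc = auroc(labels, scores)  # 10 of the 12 pairs are ranked correctly
```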
CONCLUSIONS : Detecting PH with a model based on patients' EHRs is viable, and the model performs well in the PAH and CTEPH subgroups. This approach has the potential to improve patient outcomes by reducing diagnostic delay in PH.
Kogan Emily, Didden Eva-Maria, Lee Eileen, Nnewihe Anderson, Stamatiadis Dimitri, Mataraso Samson, Quinn Deborah, Rosenberg Daniel, Chehoud Christel, Bridges Charles
2022-Dec-14
Artificial intelligence, Diagnostic delay, Early diagnosis, Electronic health record, Machine learning, Pulmonary hypertension