In Journal of Medical Internet Research
BACKGROUND : In the clinical care of well-established diseases, randomized trials, the literature, and research are supplemented by clinical judgment to understand disease prognosis and inform treatment choices. In the void created by a lack of clinical experience with COVID-19, artificial intelligence (AI) may be an important tool to bolster clinical judgment and decision making. However, the lack of clinical data restricts the design and development of such AI tools, particularly in preparation for an impending crisis or pandemic.
OBJECTIVE : This study aimed to develop and test the feasibility of a 'patients-like-me' framework to predict COVID-19 patient deterioration using a retrospective cohort of patients with similar respiratory diseases.
METHODS : Our framework used COVID-like cohorts to design and train AI models that were then validated on the COVID-19 population. The COVID-like cohorts included patients diagnosed with bacterial pneumonia, viral pneumonia, unspecified pneumonia, influenza, and acute respiratory distress syndrome (ARDS) at an academic medical center from 2008 to 2019. For exploratory purposes, fifteen training cohorts were created from different combinations of the COVID-like cohorts with the ARDS cohort. Two machine learning (ML) models were developed: one to predict invasive mechanical ventilation (IMV) within 48 hours for each hospital day, and one to predict all-cause mortality at the time of admission. Model performance was assessed using the area under the receiver operating characteristic curve (AUC), sensitivity, specificity, positive predictive value (PPV), and negative predictive value (NPV). We established model interpretability by calculating SHapley Additive exPlanations (SHAP) scores to identify important features.
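The evaluation metrics listed above can be illustrated with a short, dependency-free sketch. This is a hypothetical example, not the authors' evaluation code: the function names `auc` and `threshold_metrics` and the 0.5 decision threshold are assumptions. AUC is computed here as the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case, which is equivalent to the area under the ROC curve; the other four metrics come from the confusion matrix at a fixed threshold.

```python
from typing import Dict, Sequence


def auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as P(score of a random positive > score of a random negative); ties count 0.5."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def threshold_metrics(
    y_true: Sequence[int], scores: Sequence[float], threshold: float = 0.5
) -> Dict[str, float]:
    """Confusion-matrix metrics after labelling score >= threshold as positive (assumed threshold)."""
    tp = fp = tn = fn = 0
    for y, s in zip(y_true, scores):
        predicted_positive = s >= threshold
        if predicted_positive and y == 1:
            tp += 1
        elif predicted_positive and y == 0:
            fp += 1
        elif not predicted_positive and y == 0:
            tn += 1
        else:
            fn += 1

    def ratio(num: int, den: int) -> float:
        # Undefined metrics (empty denominator) are reported as NaN.
        return num / den if den else float("nan")

    return {
        "sensitivity": ratio(tp, tp + fn),
        "specificity": ratio(tn, tn + fp),
        "ppv": ratio(tp, tp + fp),
        "npv": ratio(tn, tn + fn),
    }


# Toy labels and model scores, invented for illustration only.
y = [1, 0, 1, 0, 0, 0]
s = [0.9, 0.8, 0.7, 0.3, 0.2, 0.1]
print(auc(y, s))                # 0.875
print(threshold_metrics(y, s))  # sensitivity 1.0, specificity 0.75, ppv ~0.667, npv 1.0
```

The pairwise AUC loop is quadratic in the number of cases and is meant only to make the definition concrete; for real cohorts a library routine such as scikit-learn's `roc_auc_score` would be the usual choice.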
RESULTS : Compared with the COVID-like cohorts (n=16,509), the COVID-19 hospitalized patients (n=159) were significantly younger, with a higher proportion of Hispanic ethnicity, a lower proportion of smoking history, and fewer comorbidities (P<0.001). COVID-19 patients had a lower IMV rate (15.1 vs 23.2, P=0.016) and a shorter time to IMV (2.9 vs 4.1, P<0.001) than the COVID-like patients. In the COVID-like training data, the top models achieved excellent performance (AUC > 0.90). On validation in the COVID-19 cohort, the best-performing model for predicting IMV was the XGBoost model (AUC: 0.826) trained on the viral pneumonia cohort. Similarly, the XGBoost model trained on all four COVID-like cohorts without ARDS achieved the best performance (AUC: 0.928) in predicting mortality. Important predictors included demographic information (age), vital signs (oxygen saturation), and laboratory values (white blood cell count, cardiac troponin, albumin, etc.). Our models suffered from class imbalance, which resulted in high negative predictive values and low positive predictive values.
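The link between class imbalance and the high NPV / low PPV pattern follows from Bayes' rule: for a fixed sensitivity and specificity, PPV falls and NPV rises as the outcome becomes rarer. The sketch below is illustrative only; the sensitivity, specificity, and prevalence values are hypothetical and are not taken from the study.

```python
def ppv(sensitivity: float, specificity: float, prevalence: float) -> float:
    """Positive predictive value via Bayes' rule."""
    true_pos = sensitivity * prevalence
    false_pos = (1.0 - specificity) * (1.0 - prevalence)
    return true_pos / (true_pos + false_pos)


def npv(sensitivity: float, specificity: float, prevalence: float) -> float:
    """Negative predictive value via Bayes' rule."""
    true_neg = specificity * (1.0 - prevalence)
    false_neg = (1.0 - sensitivity) * prevalence
    return true_neg / (true_neg + false_neg)


# Hypothetical classifier with 80% sensitivity and 80% specificity.
for prev in (0.5, 0.05):
    print(prev, round(ppv(0.8, 0.8, prev), 3), round(npv(0.8, 0.8, prev), 3))
# 0.5  -> PPV 0.8,   NPV 0.8
# 0.05 -> PPV 0.174, NPV 0.987
```

With a balanced outcome the two values match, but at 5% prevalence the same classifier yields roughly 4 false alarms for every true positive while almost every negative prediction is correct, which is the pattern the abstract reports.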
CONCLUSIONS : We provide a feasible framework for modeling patient deterioration that uses existing data and AI technology to address data limitations at the onset of a novel, rapidly changing pandemic.
Sang Shengtian, Sun Ran, Coquet Jean, Carmichael Haris, Seto Tina, Hernandez-Boussard Tina