In JMIR medical informatics ; h5-index 23.0
BACKGROUND : Machine learning (ML) models require large datasets, which may be siloed across different healthcare institutions. Current ML studies focusing on coronavirus disease 2019 (COVID-19) are limited to single-hospital data, which limits model generalizability.
OBJECTIVE : Using federated learning, an ML technique that avoids locally aggregating raw clinical data across multiple institutions, we predict mortality within seven days in hospitalized COVID-19 patients.
METHODS : Patient data were collected from Electronic Health Records (EHRs) at five hospitals within the Mount Sinai Health System (MSHS). Logistic Regression with L1 regularization (LASSO) and Multilayer Perceptron (MLP) models were each trained in three ways: as local models using only the data at each site, as a pooled model using combined data from all five sites, and as a federated model that shared only parameters with a central aggregator.
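The federated setup described above can be illustrated with a minimal sketch. It assumes a sample-size-weighted parameter average (FedAvg-style) at the aggregator, which the abstract does not specify, and it uses synthetic data and plain logistic regression without the L1 penalty. All function names, site sizes, and hyperparameters are hypothetical and are not taken from the study.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def local_update(weights, data, lr=0.1, epochs=5):
    """Run a few epochs of gradient descent on one site's data.

    Only the resulting weights leave the site; the raw records do not.
    """
    w = list(weights)
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for x, y in data:
            p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)))
            for j, xj in enumerate(x):
                grad[j] += (p - y) * xj
        w = [wi - lr * g / len(data) for wi, g in zip(w, grad)]
    return w

def federated_average(site_weights, site_sizes):
    """Aggregator step (assumed): average parameters weighted by site size."""
    total = sum(site_sizes)
    n_params = len(site_weights[0])
    return [
        sum(w[j] * s for w, s in zip(site_weights, site_sizes)) / total
        for j in range(n_params)
    ]

def make_site(n, rng):
    """Synthetic stand-in for one hospital: x = [bias, feature], y in {0, 1}."""
    data = []
    for _ in range(n):
        f = rng.uniform(-2, 2)
        y = 1 if f + rng.gauss(0, 0.5) > 0 else 0
        data.append(([1.0, f], y))
    return data

rng = random.Random(0)
sites = [make_site(n, rng) for n in (50, 80, 120, 60, 90)]  # five hypothetical sites

global_w = [0.0, 0.0]
for _ in range(20):  # communication rounds
    updates = [local_update(global_w, d) for d in sites]
    global_w = federated_average(updates, [len(d) for d in sites])
```

In this sketch the aggregator sees only the parameter lists in `updates`, never the per-site records. A local model corresponds to calling `local_update` on a single site without averaging, and a pooled model to training once on the concatenation of all sites.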
RESULTS : As measured by area under the receiver operating characteristic curve (AUC-ROC), LASSO-federated outperformed LASSO-local at three hospitals, and MLP-federated outperformed MLP-local at all five hospitals. LASSO-pooled outperformed LASSO-federated at all hospitals, and MLP-federated outperformed MLP-pooled at two hospitals.
CONCLUSIONS : Federated learning shows promise for developing robust predictive models from COVID-19 EHR data without compromising patient privacy.
Vaid Akhil, Jaladanki Suraj K, Xu Jie, Teng Shelly, Kumar Arvind, Lee Samuel, Somani Sulaiman, Paranjpe Ishan, De Freitas Jessica K, Wanyan Tingyi, Johnson Kipp W, Bicak Mesude, Klang Eyal, Kwon Young Joon, Costa Anthony, Zhao Shan, Miotto Riccardo, Charney Alexander W, Böttinger Erwin, Fayad Zahi A, Nadkarni Girish N, Wang Fei, Glicksberg Benjamin S