In Computer Methods and Programs in Biomedicine
BACKGROUND AND OBJECTIVE: The novel coronavirus (2019-nCoV) epidemic spread rapidly, causing more than 250 thousand deaths worldwide. The virus, which first presented as pneumonia, was later named severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), and the World Health Organization named the disease it causes COVID-19. SARS-CoV-2 enters host cells by binding to angiotensin-converting enzyme 2 (ACE2), which is vital to cardiovascular and immune function and is particularly relevant in conditions such as cerebrovascular disease, hypertension, and diabetes. This study evaluates how well data mining methods predict death status from demographic and clinical factors, including COVID-19 severity.
METHODS: The dataset, obtained from an open-access online source, comprises 1603 SARS-CoV-2 patients and 13 variables. These variables include age, gender, chronic diseases (hypertension, diabetes, renal disease, cardiovascular disease, etc.), use of ACE inhibitors and angiotensin II receptor blockers, and COVID-19 severity. They were used to predict death status with deep learning and with machine learning approaches (random forest [RF], k-nearest neighbor [k-NN], and extreme gradient boosting [XGBoost]). A grid search algorithm tuned the hyperparameters of each model, and predictions were assessed with performance metrics. The steps of knowledge discovery in databases (KDD) were applied to extract the relevant information.
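The tuning step pairs a grid search with a cross-validated accuracy score. The following is a minimal, dependency-free Python sketch of that idea for k-NN only. It assumes 5-fold cross-validation, a hypothetical candidate grid for k, and a hypothetical toy dataset (`make_toy_data`, with two scaled features standing in for age and severity). It is not the authors' code, data, or hyperparameter grid, and it does not reproduce the reported results. A real analysis would more likely use a library routine such as scikit-learn's `GridSearchCV`.

```python
import math
import random
from collections import Counter


def make_toy_data(n=200, seed=0):
    """Hypothetical toy data (NOT the study's dataset): scaled (age, severity) -> died (1) / survived (0)."""
    rng = random.Random(seed)
    X, y = [], []
    for _ in range(n):
        age = rng.uniform(20, 90)
        severity = rng.randint(1, 4)  # hypothetical ordinal severity score
        risk = 0.03 * (age - 20) + (severity - 1)  # hypothetical risk rule
        X.append((age / 90, severity / 4))  # crude scaling so distances weigh both features
        y.append(1 if risk + rng.gauss(0, 0.5) > 3.0 else 0)
    return X, y


def knn_predict(train_X, train_y, x, k):
    """Majority label among the k training rows closest to x (Euclidean distance)."""
    nearest = sorted((math.dist(row, x), label) for row, label in zip(train_X, train_y))
    votes = Counter(label for _, label in nearest[:k])
    return votes.most_common(1)[0][0]


def cv_accuracy(X, y, k, folds=5):
    """Mean k-NN accuracy over `folds` contiguous cross-validation folds."""
    size = len(X) // folds
    scores = []
    for f in range(folds):
        lo, hi = f * size, (f + 1) * size
        train_X, train_y = X[:lo] + X[hi:], y[:lo] + y[hi:]
        correct = sum(knn_predict(train_X, train_y, X[i], k) == y[i] for i in range(lo, hi))
        scores.append(correct / size)
    return sum(scores) / folds


def grid_search(X, y, grid):
    """Evaluate every candidate k and return (best_k, best_mean_accuracy)."""
    results = {k: cv_accuracy(X, y, k) for k in grid}
    best_k = max(results, key=results.get)
    return best_k, results[best_k]


if __name__ == "__main__":
    X, y = make_toy_data()
    best_k, best_score = grid_search(X, y, grid=[1, 3, 5, 7, 9])
    print(f"best k = {best_k}, mean CV accuracy = {best_score:.3f}")
```

Contiguous folds assume the rows are already in random order. Feature scaling matters here because k-NN is distance-based. The same loop extends to RF or XGBoost by iterating over combinations of their hyperparameters instead of a single k.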
RESULTS: Deep learning achieved higher accuracy (97.15%) than the classical machine learning methods (92.15% for RF and 93.4% for k-NN), but the XGBoost ensemble classifier gave the highest accuracy (99.7%). XGBoost identified COVID-19 severity and age as the two most important factors associated with death status, whereas deep learning identified COVID-19 severity and hypertension as the most determining variables.
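Each model produces its own ranking of variable importance. One model-agnostic way to produce such a ranking is permutation importance, which shuffles one feature column and measures how much accuracy drops. The sketch below is a swapped-in technique for illustration only. It is not XGBoost's built-in gain importance and not necessarily the procedure used in the study. It assumes a hypothetical fixed classifier (`toy_model`) and synthetic rows (`make_toy_data`) with hypothetical severity, age, and hypertension columns.

```python
import random


def accuracy(y_true, y_pred):
    """Fraction of predictions equal to the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def toy_model(row):
    """Hypothetical stand-in for a fitted classifier; it reads severity and age only."""
    severity, age, _hypertension = row
    return 1 if severity == 4 or (severity == 3 and age > 60) else 0


def make_toy_data(n=500, seed=1):
    """Hypothetical rows (severity 1-4, age, hypertension 0/1); NOT the study's data."""
    rng = random.Random(seed)
    X, y = [], []
    for _ in range(n):
        row = (rng.randint(1, 4), rng.randint(20, 90), rng.randint(0, 1))
        label = toy_model(row)
        if rng.random() < 0.05:  # 5% label noise so the toy model is not perfect
            label = 1 - label
        X.append(row)
        y.append(label)
    return X, y


def permutation_importance(predict, X, y, n_repeats=10, seed=0):
    """Return (baseline accuracy, mean accuracy drop per shuffled feature column)."""
    rng = random.Random(seed)
    baseline = accuracy(y, [predict(row) for row in X])
    importances = []
    for j in range(len(X[0])):
        drops = []
        for _ in range(n_repeats):
            col = [row[j] for row in X]
            rng.shuffle(col)  # break the link between feature j and the label
            X_perm = [row[:j] + (v,) + row[j + 1:] for row, v in zip(X, col)]
            drops.append(baseline - accuracy(y, [predict(row) for row in X_perm]))
        importances.append(sum(drops) / n_repeats)
    return baseline, importances


if __name__ == "__main__":
    X, y = make_toy_data()
    baseline, imps = permutation_importance(toy_model, X, y)
    print(f"baseline accuracy = {baseline:.3f}")
    for name, imp in sorted(zip(["severity", "age", "hypertension"], imps), key=lambda p: -p[1]):
        print(f"{name:>12}: {imp:.3f}")
```

Because `toy_model` never reads the hypertension column, its importance comes out as exactly zero. For a real model, a low score means that the fitted model does not rely on the variable. It does not mean the variable is clinically irrelevant.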
CONCLUSIONS: The proposed XGBoost model predicted death status from these factors better than the other algorithms. The results of this study can guide patients with certain risk factors to take early measures and access preventive health care services before they become infected with the virus.
Kivrak Mehmet, Guldogan Emek, Colak Cemil
Data Mining, Deep Learning, Extreme Gradient Boosting, Machine Learning, SARS-COV-2