In The Science of the Total Environment
Coronavirus disease 2019 (COVID-19), caused by the novel severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), has become a global health concern due to its unpredictable nature and the lack of adequate medicines. Machine learning (ML) models could be effective in identifying the most critical factors responsible for the overall fatalities caused by COVID-19. The capabilities of ML models in epidemiological research, especially for COVID-19, have not been substantially explored. To bridge this gap, this study adopted two advanced ML models, Random Forest (RF) and Gradient Boosted Machine (GBM), to perform regression modelling and provide subsequent interpretation. The analysis followed five successive steps: (1) identifying relevant key explanatory variables; (2) applying dimensionality reduction to eliminate redundant information; (3) using the ML models to measure the relative influence (RI) of the explanatory variables; (4) evaluating the interconnections among the key explanatory variables and COVID-19 case and death counts; and (5) performing time series analysis to examine the incidence rates of COVID-19 cases and deaths. Among the explanatory variables considered, air pollution, migration, economy, and demographic factors were found to be the most significant controlling factors. Since very limited research discusses the superiority of ML models for identifying the key determinants of COVID-19, this study could serve as a reference for future public health research. Additionally, all the models and data used in this study are open source and freely available, so the analysis can be readily reproduced and replicated.
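Step (3) of the abstract, measuring relative influence with RF and GBM, can be illustrated with a minimal sketch. It is not the authors' code: it assumes scikit-learn's `RandomForestRegressor` and `GradientBoostingRegressor`, takes RI to be the impurity-based feature importance rescaled to sum to 100, and runs on synthetic data. The variable names in `FEATURES` and the helper `relative_influence` are hypothetical.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor, RandomForestRegressor

# Hypothetical variable names; the data below are synthetic, not the study's.
FEATURES = ["air_pollution", "migration", "income", "population_density"]


def relative_influence(model, X, y):
    """Fit a tree ensemble and return its importances scaled to sum to 100."""
    model.fit(X, y)
    importances = model.feature_importances_
    return 100.0 * importances / importances.sum()


rng = np.random.default_rng(0)
X = rng.normal(size=(300, len(FEATURES)))
# Synthetic response: driven mostly by the first column, weakly by the second.
y = 5.0 * X[:, 0] + 0.5 * X[:, 1] + rng.normal(scale=0.1, size=300)

models = {
    "RF": RandomForestRegressor(n_estimators=200, random_state=0),
    "GBM": GradientBoostingRegressor(random_state=0),
}
ri = {name: relative_influence(model, X, y) for name, model in models.items()}

# Print the variables ranked by relative influence for each model.
for name, scores in ri.items():
    ranked = sorted(zip(FEATURES, scores), key=lambda pair: pair[1], reverse=True)
    print(name, [(feature, round(float(score), 1)) for feature, score in ranked])
```

Because the synthetic response depends almost entirely on the first column, both models should rank that variable first. On real data, comparing the RF and GBM rankings gives a rough check on how stable the identified determinants are.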
Chakraborti Suman, Maiti Arabinda, Pramanik Suvamoy, Sannigrahi Srikanta, Pilla Francesco, Banerjee Anushna, Das Dipendra Nath
Air pollution, COVID-19, Machine learning, Pandemic, Relative importance, Socioeconomic