In Global Epidemiology
In the first half of 2020, the discovery that fine particulate matter (PM2.5) concentrations and COVID-19 mortality rates are statistically significantly positively associated in some regression models generated much excitement in news media and in some peer-reviewed scientific articles. This article points out that the two are non-significantly negatively associated in other regression models once omitted confounders (such as latitude and longitude) are included. More importantly, because of model specification errors, positive regression coefficients can and do arise when (generalized) linear regression models are applied to data with strong nonlinearities, including data on PM2.5, population density, and COVID-19 mortality rates. In general, statistical modeling accompanied by judgments about causal interpretations of statistical associations and regression coefficients, which is the weight-of-evidence (WoE) approach favored in much current regulatory risk analysis for air pollutants, is not a valid basis for determining whether, or to what extent, reducing exposure would reduce the risk of harm to human health. The traditional scientific method, based on testing predictive generalizations against data, remains a more reliable paradigm for risk analysis and risk management.
Cox Louis Anthony, Popken Douglas A
Air pollution, Bayesian networks, CART trees, COVID-19 mortality risk, Causation, Health effects, Machine learning, Model specification error, PM2.5, Random forest, Regression, Scientific method
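
The abstract's point about model specification error can be illustrated with a small synthetic sketch. This is not the authors' data or analysis: the variable names, the quadratic functional form, the noise levels, and the helper `simulate_pm25_coefficient` are all hypothetical and chosen only for illustration. In the simulation, mortality depends only on a nonlinear function of population density, and PM2.5 has no effect on it at all. A linear model that omits the nonlinearity still assigns PM2.5 a clearly positive coefficient, while a model that includes the nonlinear density term does not.

```python
import numpy as np


def simulate_pm25_coefficient(n=5000, seed=0):
    """Return the PM2.5 coefficient under a misspecified and a better-specified model.

    All quantities are synthetic and hypothetical; PM2.5 has zero true effect.
    """
    rng = np.random.default_rng(seed)

    # Hypothetical standardized population density.
    density = rng.normal(size=n)

    # PM2.5 tracks a nonlinear (quadratic) function of density plus noise.
    pm25 = density**2 + rng.normal(scale=0.5, size=n)

    # Mortality depends on the same nonlinear function of density,
    # and NOT on PM2.5 (independent noise term).
    mortality = density**2 + rng.normal(scale=0.5, size=n)

    # Misspecified model: mortality ~ intercept + density + pm25.
    X_lin = np.column_stack([np.ones(n), density, pm25])
    beta_lin, *_ = np.linalg.lstsq(X_lin, mortality, rcond=None)

    # Better-specified model: also includes the density**2 term.
    X_full = np.column_stack([np.ones(n), density, density**2, pm25])
    beta_full, *_ = np.linalg.lstsq(X_full, mortality, rcond=None)

    # PM2.5 is the last column in both design matrices.
    return beta_lin[-1], beta_full[-1]


misspecified, well_specified = simulate_pm25_coefficient()
print(f"PM2.5 coefficient, linear-only model:      {misspecified:.3f}")
print(f"PM2.5 coefficient, with density**2 term:   {well_specified:.3f}")
```

In this sketch the positive coefficient in the first model arises because PM2.5 acts as a proxy for the omitted nonlinear density term, not because of any causal effect. Real data would call for the nonparametric tools listed in the keywords rather than a hand-picked quadratic.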