In Environmental Science & Technology; h5-index 132.0
NO2 is a combustion byproduct that has been associated with multiple adverse health outcomes. To estimate NO2 levels with high accuracy, we propose an ensemble model that integrates multiple machine learning algorithms, including a neural network, random forest, and gradient boosting, with a variety of predictor variables, including output from chemical transport models. The model covers the entire contiguous U.S. with daily predictions on 1-km grid cells from 2000 to 2016. The ensemble produced a cross-validated R² of 0.788 overall, a spatial R² of 0.844, and a temporal R² of 0.729. The relationship between daily monitored and predicted NO2 is almost linear. We also estimated the associated monthly uncertainty level for the predictions and for address-specific NO2 levels. These NO2 estimates have very high spatiotemporal resolution and allow the examination of health effects of NO2 in unmonitored areas. We found the highest NO2 levels along highways and in cities. We also observed that nationwide NO2 levels declined in the early years and stagnated after 2007, in contrast to the trend at monitoring sites in urban areas, where the decline continued. Our research indicates that integrating different predictor variables and fitting algorithms can yield an improved air pollution modeling framework.
Di Qian, Amini Heresh, Shi Liuhua, Kloog Itai, Silvern Rachel Faye, Kelly James T, Sabath M Benjamin, Choirat Christine, Koutrakis Petros, Lyapustin Alexei, Wang Yujie, Mickley Loretta J, Schwartz Joel
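The abstract above reports overall, spatial, and temporal cross-validated R² values without defining them. The sketch below shows one common way such a decomposition can be computed from held-out monitor data. It is an illustration under stated assumptions, not the authors' code: the function names, the toy numbers, and the choice of whole-period site means (rather than, for example, annual means) are all hypothetical. Overall R² compares daily observations with daily predictions, spatial R² compares per-site means, and temporal R² compares each day's deviation from its site mean.

```python
import numpy as np


def r_squared(observed, predicted):
    """Squared Pearson correlation, i.e. the R² of a simple linear
    regression of observed values on predicted values."""
    observed = np.asarray(observed, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    return float(np.corrcoef(observed, predicted)[0, 1] ** 2)


def site_means(site_ids, values):
    """Return (mean of each row's site repeated per row, one mean per site)."""
    values = np.asarray(values, dtype=float)
    _, inverse = np.unique(np.asarray(site_ids), return_inverse=True)
    inverse = inverse.ravel()
    per_site = np.bincount(inverse, weights=values) / np.bincount(inverse)
    return per_site[inverse], per_site


def validation_r2(site_ids, observed, predicted):
    """Overall, spatial, and temporal R² for held-out predictions.

    Assumption: site means are taken over the whole period supplied.
    """
    observed = np.asarray(observed, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    obs_row_mean, obs_site_mean = site_means(site_ids, observed)
    pred_row_mean, pred_site_mean = site_means(site_ids, predicted)
    return {
        # Daily observations against daily predictions.
        "overall": r_squared(observed, predicted),
        # Between-site contrast: one mean per monitoring site.
        "spatial": r_squared(obs_site_mean, pred_site_mean),
        # Within-site contrast: daily deviation from the site mean.
        "temporal": r_squared(observed - obs_row_mean,
                              predicted - pred_row_mean),
    }


if __name__ == "__main__":
    # Made-up values for three hypothetical monitors, three days each.
    sites = ["a", "a", "a", "b", "b", "b", "c", "c", "c"]
    obs = [10, 12, 14, 20, 22, 21, 30, 29, 34]
    pred = [11, 11, 14, 20, 23, 20, 29, 29, 35]
    print(validation_r2(sites, obs, pred))
```

Separating the two components shows whether a model's skill comes mainly from ranking locations correctly or from tracking day-to-day changes at a given location, which is the distinction the spatial and temporal figures in the abstract draw.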