
In Environmental science and pollution research international

Globally, urban areas have been the major contributors to greenhouse gas (GHG) emissions and thus play an increasingly important role in efforts to reduce CO2 emissions. However, quantifying city-level CO2 emissions is generally difficult because energy-related statistical data are lacking or of low quality, especially for underdeveloped areas. To address this issue, this study used a set of open-access data and machine learning methods to estimate and predict city-level CO2 emissions across China. Two feature selection techniques, Recursive Feature Elimination and Boruta, were used to extract the critical variables and input parameters for modeling CO2 emissions. Finally, 18 out of 31 predictor variables were selected to establish prediction models of CO2 emissions. We found that statistical indicators of urban environmental pollution (such as industrial SO2 and dust emissions per capita) are the most important variables for predicting city-level CO2 emissions in China. The XGBoost models obtained the highest estimation accuracy, with R2 > 0.98 and a lower relative error (about 0.8%) than the other methods. The predictive accuracy for CO2 emissions can be improved modestly by incorporating geospatial and interpolated meteorological predictor variables (e.g., DEM, annual average precipitation, and air temperature). We also observed an S-shaped relationship between urban CO2 emissions per capita and urban economic growth when the remaining variables were held constant, rather than a U-shaped one. The findings presented herein provide a first proof of concept that easily available socioeconomic statistical records and geospatial data for urban areas have the potential to accurately predict city-level CO2 emissions with the aid of machine learning algorithms.
Our approach can be used to generate carbon footprint maps frequently for underdeveloped regions where detailed energy-related statistical data are scarce, to assist policy-makers in designing specific measures for reducing carbon emissions and allocating emission reduction goals.
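
As a rough illustration of the Recursive Feature Elimination idea named in the abstract, the sketch below repeatedly fits a model, drops the predictor with the smallest standardized coefficient, and refits until a target number of predictors remains. It is not the authors' pipeline: the data are synthetic, an ordinary least-squares fit stands in for the XGBoost models, and all function and variable names (`fit_ols`, `recursive_feature_elimination`, `n_keep`, and so on) are hypothetical.

```python
import random


def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Work on an augmented copy so the inputs are left untouched.
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def fit_ols(rows, y):
    """Least-squares coefficients via the normal equations (inputs centred, no intercept)."""
    p = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * t for r, t in zip(rows, y)) for i in range(p)]
    return solve_linear_system(xtx, xty)


def standardize(rows):
    """Scale each column to zero mean and unit variance so coefficients are comparable."""
    n, p = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(p)]
    sds = [(sum((r[j] - means[j]) ** 2 for r in rows) / n) ** 0.5 for j in range(p)]
    return [[(r[j] - means[j]) / sds[j] for j in range(p)] for r in rows]


def recursive_feature_elimination(rows, y, n_keep):
    """Drop the weakest predictor one at a time until n_keep column indices remain."""
    active = list(range(len(rows[0])))
    while len(active) > n_keep:
        subset = [[r[j] for j in active] for r in rows]
        coefs = fit_ols(subset, y)
        weakest = min(range(len(active)), key=lambda k: abs(coefs[k]))
        del active[weakest]
    return active


# Synthetic data (assumption): only columns 0 and 2 actually drive the response.
random.seed(0)
n_samples = 200
X = [[random.gauss(0, 1) for _ in range(5)] for _ in range(n_samples)]
y = [3.0 * r[0] - 2.0 * r[2] + random.gauss(0, 0.1) for r in X]

X_std = standardize(X)
y_mean = sum(y) / n_samples
y_centred = [t - y_mean for t in y]

selected = recursive_feature_elimination(X_std, y_centred, n_keep=2)
print("Selected predictor columns:", selected)
```

In practice the ranking step would use the importance scores of whichever model is being tuned (for example, a tree ensemble) rather than linear coefficients; the elimination loop itself stays the same.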

Li Ying, Sun Yanwei


City-level CO2 emissions, Geospatial dataset, Machine learning algorithms, Socioeconomic statistical information