Countries worldwide encounter air pollution problems along their development path. As a significant indicator of air quality, PM2.5 concentration has long been shown to affect population mortality. Machine learning algorithms, which have been shown to outperform traditional statistical approaches, are widely used in air pollution prediction. However, research on model selection and on the environmental interpretation of model predictions remains scarce, and it is urgently needed to guide policy making on air pollution control. Our research compared four machine learning algorithms (LinearSVR, K-Nearest Neighbor, Lasso regression, and Gradient Boosting) by examining their performance in predicting PM2.5 concentrations across different cities and seasons. The results show that the machine learning models can forecast the next-day PM2.5 concentration from the previous five days' data with good accuracy. The comparative experiments show that, at the city level, the Gradient Boosting model performs best, with a mean absolute error (MAE) of 9 µg/m³ and a root mean square error (RMSE) of 10.25 to 16.76 µg/m³, both lower than those of the other three models. At the season level, all four models perform best in winter and worst in summer. More importantly, the differences in model performance across cities and seasons have significant implications for environmental policy.
Ma Xin, Chen Tengfei, Ge Rubing, Cui Caocao, Xu Fan, Lv Qi
Gradient boosting, Jing-Jin-Ji city group, K-Nearest Neighbor, Lasso regression, Linear SVR, PM2.5 prediction
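The forecasting setup described in the abstract (predicting the next day's PM2.5 from the previous five days, scored by MAE and RMSE) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the helper names `make_windows`, `mae`, and `rmse` are hypothetical, the PM2.5 values are made up and not taken from the study, and the window-mean predictor is only a placeholder, not one of the four models compared.

```python
import math

def make_windows(series, n_lags=5):
    """Pair each day's value with the n_lags values that precede it."""
    features, targets = [], []
    for i in range(n_lags, len(series)):
        features.append(series[i - n_lags:i])  # previous n_lags days
        targets.append(series[i])              # next-day value to predict
    return features, targets

def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def rmse(y_true, y_pred):
    """Root mean square error."""
    squared = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    return math.sqrt(squared / len(y_true))

# Made-up daily PM2.5 values in ug/m3, for illustration only (not study data).
pm25 = [35.0, 42.0, 58.0, 61.0, 47.0, 39.0, 52.0, 70.0]

X, y = make_windows(pm25)

# Placeholder predictor (an assumption): the mean of each five-day window.
# A fitted regression model would take its place in a real comparison.
y_pred = [sum(window) / len(window) for window in X]

print(f"MAE:  {mae(y, y_pred):.2f} ug/m3")
print(f"RMSE: {rmse(y, y_pred):.2f} ug/m3")
```

In a fuller comparison, the placeholder would presumably be replaced by each fitted regressor in turn, with the same windows and the same two metrics computed separately per city and per season.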