In Environmental pollution (Barking, Essex : 1987)

Pollutants in the soil of industrial sites are often highly heterogeneously distributed, which makes it challenging to accurately predict their three-dimensional (3D) spatial distributions. Here we attempt to create effective 3D prediction models using machine learning (ML) and readily attainable multisource auxiliary data to improve the prediction accuracy of highly heterogeneous Zn in the soil of a small industrial site. Using raw covariates from the functional area layout, stratigraphic succession, and electrical resistivity tomography, together with covariates derived from these raw covariates, as predictors, we created 6 individual and 2 ensemble models for Zn, based on ML algorithms such as k-nearest neighbors, random forest, and extreme gradient boosting, and on the stacking approach in ensemble ML. Results showed that the overall 3D spatial patterns of Zn predicted by the individual and ensemble ML models, inverse distance weighting (IDW), and ordinary Kriging (OK) were similar, but their predictive performances differed significantly. The ensemble model with raw and derived covariates had the highest accuracy in representing the complex 3D spatial patterns of Zn (R² = 0.45, RMSE = 344.80 mg kg⁻¹), compared with the individual ML models (R² = 0.27-0.44, RMSE = 396.75-348.56 mg kg⁻¹), OK (R² = 0.33, RMSE = 381.12 mg kg⁻¹), and IDW interpolation (R² = 0.25, RMSE = 402.94 mg kg⁻¹). Moreover, the accuracy gains from incorporating derived covariates were larger than those from adopting ensemble ML instead of a single ML algorithm. These results highlight the importance of developing derived covariates when adopting ML to predict the 3D distribution of highly heterogeneous pollutants in the soil of small industrial sites.
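The IDW baseline and the two accuracy metrics reported above can be illustrated with a minimal sketch. It is not the authors' implementation: the distance power of 2, the function names, and the sample coordinates and Zn values are all assumptions made for illustration, and R² is computed here as the coefficient of determination, which may differ from the definition used in the study.

```python
import math

def idw_predict(known_points, known_values, query, power=2.0):
    """Inverse distance weighting at one 3D location (x, y, depth).

    power=2.0 is an assumed default, not a value taken from the study.
    """
    weighted = []
    for point, value in zip(known_points, known_values):
        dist = math.dist(point, query)
        if dist == 0.0:
            return value  # query coincides with a sampled location
        weighted.append((1.0 / dist ** power, value))
    total_weight = sum(w for w, _ in weighted)
    return sum(w * v for w, v in weighted) / total_weight

def rmse(observed, predicted):
    """Root mean squared error, in the units of the observations."""
    squared = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared) / len(squared))

def r2(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot

# Hypothetical samples: (x, y, depth) in metres and Zn in mg/kg.
points = [(0.0, 0.0, 0.5), (10.0, 0.0, 0.5), (0.0, 10.0, 1.5), (10.0, 10.0, 1.5)]
zn = [120.0, 850.0, 95.0, 400.0]

# Leave-one-out check: predict each sample from the remaining ones.
loo = [
    idw_predict(points[:i] + points[i + 1:], zn[:i] + zn[i + 1:], points[i])
    for i in range(len(points))
]
print("RMSE:", rmse(zn, loo), "R2:", r2(zn, loo))
```

Because IDW uses only sample coordinates, it cannot draw on auxiliary covariates such as site layout or resistivity, which is the kind of information the ML models in the study are given.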

Peng Yuxuan, Chen Jian, Xie Enze, Zhang Xiu, Yan Guojing, Zhao Yongcun

2022-Dec-21

Covariate development, Industrial site, Machine learning, Multisource auxiliary data, Three-dimensional prediction