In The Science of the Total Environment
Accurate identification of pollution sources is essential for preventing and controlling pollution from soil heavy metals (SHMs). Although the positive matrix factorization (PMF) model is widely used as a conventional method for pollution source apportionment, classifying its results relies mainly on existing research and expert experience, which can make source interpretation highly subjective. To address this limitation, a comprehensive source apportionment framework was developed that combines a self-organizing map (SOM) and PMF with a gradient boosting decision tree (GBDT) model. Analysis of Cd, Pb, Zn, Cu, Cr, and Ni in 272 topsoil samples showed that the average contents of the six heavy metals were 1.72-13.79 times greater than the corresponding background values. Cd pollution was the most serious, with 66.91 % of the sites exceeding the specified soil risk screening value. The PMF results revealed that 79.43 % of Pb was related to vehicle emissions and atmospheric deposition, 79.32 % of Cd and 38.84 % of Zn were related to sewage irrigation, and 85.97 % of Cr and 85.50 % of Ni were from natural sources. Moreover, the GBDT model identified industrial network density, water network density, and Fe2O3 content as the major drivers of the respective pollution sources. Overall, the novelty of this study lies in an improved framework, based on advanced machine learning techniques, that identified the sources of SHM pollution more accurately and can give environmental protection departments more detailed support for proposing targeted soil pollution control measures.
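The factorization and driver-ranking steps described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: scikit-learn's `NMF` stands in for PMF (it omits the per-sample uncertainty weighting that PMF uses), the SOM step is left out, the function names and the three-factor choice are hypothetical, and the input data are randomly generated placeholders rather than the study's measurements.

```python
import numpy as np
from sklearn.decomposition import NMF
from sklearn.ensemble import GradientBoostingRegressor

METALS = ["Cd", "Pb", "Zn", "Cu", "Cr", "Ni"]
# Driver names follow the abstract; the column labels themselves are hypothetical.
DRIVERS = ["industrial_network_density", "water_network_density", "fe2o3_content"]


def apportion_sources(conc, n_sources=3, seed=0):
    """Factorize a (sites x metals) matrix into contributions G and profiles F.

    Assumption: plain NMF is used as an unweighted stand-in for PMF.
    """
    model = NMF(n_components=n_sources, init="nndsvda",
                max_iter=2000, random_state=seed)
    G = model.fit_transform(conc)   # (sites x sources) contributions
    F = model.components_           # (sources x metals) profiles
    return G, F


def percent_contribution(G, F):
    """Percentage of each metal's reconstructed mass assigned to each source."""
    mass = G.sum(axis=0)[:, None] * F            # (sources x metals)
    return 100.0 * mass / mass.sum(axis=0, keepdims=True)


def rank_drivers(contribution, covariates, seed=0):
    """Fit a GBDT to one source's contributions; return feature importances."""
    gbdt = GradientBoostingRegressor(n_estimators=200, max_depth=3,
                                     random_state=seed)
    gbdt.fit(covariates, contribution)
    return gbdt.feature_importances_


# Synthetic placeholder data (NOT the study's data): 272 sites, 6 metals.
rng = np.random.default_rng(0)
covariates = rng.uniform(0.0, 1.0, size=(272, len(DRIVERS)))
true_profiles = rng.uniform(0.1, 1.0, size=(3, len(METALS)))
noise = rng.uniform(0.0, 0.01, size=(272, len(METALS)))
conc = (covariates + 0.05) @ true_profiles + noise   # strictly positive

G, F = apportion_sources(conc)
share = percent_contribution(G, F)
importances = rank_drivers(G[:, 0], covariates)

for k, row in enumerate(share):
    print(f"factor {k}:", dict(zip(METALS, np.round(row, 1))))
print("drivers of factor 0:", dict(zip(DRIVERS, np.round(importances, 3))))
```

Each column of `share` sums to 100, which is the form in which figures such as "79.43 % of Pb" are reported. Assigning a real-world label (traffic, sewage irrigation, natural) to a factor is the interpretive step that the GBDT importances are meant to support.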
Zheng Jiatong, Wang Peng, Shi Hangyuan, Zhuang Changwei, Deng Yirong, Yang Xiaojun, Huang Fei, Xiao Rongbo
2023-Feb-22
Heavy metals, Machine learning, Positive matrix factorization, Soil pollution, Source apportionment