In The Science of the total environment
Timely monitoring and rapid water quality assessment are needed to control the algal blooms that often occur in eutrophic lakes. Algal cell density (ACD) is a critical indicator of algal growth, but measuring it in the field is laborious and time-consuming, so rapid ACD-based assessment of algal blooms is often not possible. To address the limitations of conventional ACD detection, we proposed a soft sensor approach that uses surrogate indicators to simulate ACD with machine learning models. We conducted a case study using monitoring data from Chaohu Lake collected between 2016 and 2019. A comparison of various machine learning algorithms showed that ensemble learning models, especially extreme gradient boosting (XGBoost), outperformed traditional algorithms. Because input variable selection influences model performance, we combined the results of different filter methods in a multi-stage variable selection process. This process reduced the 43 initial candidate variables to seven key variables: dissolved oxygen (DO), chlorophyll-a (Chl-a), Secchi disk depth (SD), pH, permanganate index (CODMn), week of the year (WOY), and wind velocity (WV). Restricting the inputs to these variables substantially improved data accessibility and supported the development of a rapid simulation model. The final model generalized reliably across space and time, with an overall R² of 0.761. On the theoretical side, this study is a new attempt to simulate ACD values in a eutrophic lake. On the practical side, the soft sensor can facilitate rapid assessment of bloom conditions, which helps the local administration with emergency prevention and control.
Rao Wenxin, Qian Xin, Fan Yifan, Liu Tong
2023-Jan-11
Algal blooms, Ensemble learning, Machine learning, Soft sensor, Variable selection, Water quality
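
The abstract mentions combining the results of different filter methods during variable selection but does not name the methods or the combination rule. The following is a minimal sketch of one way such a combination could work, assuming two stand-in filters (absolute Pearson and Spearman correlation with the target) and mean-rank aggregation. The function names, the synthetic data, and the "noise" column are hypothetical and are not taken from the study.

```python
import random


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / (vx * vy) ** 0.5


def ranks(values):
    """1-based ranks in ascending order; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average
        i = j + 1
    return result


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


def combined_filter_ranking(features, target, scorers):
    """Order feature names by their mean rank across several filter scores.

    Assumption: each scorer returns an association strength whose absolute
    value is comparable across features; rank 1 is the strongest.
    """
    names = list(features)
    mean_rank = {name: 0.0 for name in names}
    for scorer in scorers:
        scores = [abs(scorer(features[name], target)) for name in names]
        # Negate so that the largest score receives rank 1.
        for name, rank in zip(names, ranks([-s for s in scores])):
            mean_rank[name] += rank / len(scorers)
    return sorted(names, key=lambda name: mean_rank[name])


# Synthetic data only: the target depends strongly on "Chl-a", moderately on
# "DO", and not at all on the hypothetical "noise" column.
rng = random.Random(0)
n_samples = 500
features = {
    "noise": [rng.gauss(0, 1) for _ in range(n_samples)],
    "DO": [rng.gauss(0, 1) for _ in range(n_samples)],
    "Chl-a": [rng.gauss(0, 1) for _ in range(n_samples)],
}
target = [
    3.0 * c + 1.5 * d + rng.gauss(0, 0.5)
    for c, d in zip(features["Chl-a"], features["DO"])
]

ranking = combined_filter_ranking(features, target, [pearson, spearman])
print(ranking)
```

In a workflow like the one the abstract describes, the top-ranked variables from such a step would then be passed to a model such as XGBoost. Mean-rank aggregation is used here only because it makes scores from different filters comparable without rescaling.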