In Water Research
The recent outbreaks of harmful algal blooms (HABs) in the western Lake Erie Basin (WLEB) have drawn tremendous attention to bloom prediction for better control and management. Many weekly to annual bloom prediction models have been reported, but they rely on small datasets, use a limited range of input features, are restricted to linear regression or probabilistic models, or require complex process-based computations. To address these limitations, we conducted a comprehensive literature review, compiled a large dataset with the chlorophyll-a index (from 2002 to 2019) as the output and a novel combination of riverine (the Maumee and Detroit Rivers) and meteorological (WLEB) features as the input, and built machine learning-based classification and regression models for bloom prediction at a 10-day scale. By analyzing feature importance, we identified the 8 most important features for HAB control, including nitrogen loads, time, water levels, soluble reactive phosphorus load, and solar irradiance. Here, both long- and short-term nitrogen loads were considered for the first time in HAB models for Lake Erie. Based on these features, the 2-, 3-, and 4-level random forest classification models achieved accuracies of 89.6%, 77.0%, and 66.7%, respectively, and the regression model achieved an R² value of 0.69. In addition, a long short-term memory (LSTM) model was implemented to predict the temporal trends of four short-term features (N, solar irradiance, and two water levels) and achieved Nash-Sutcliffe efficiencies of 0.12-0.97. Feeding the LSTM predictions for these features into the 2-level classification model yielded an accuracy of 86.0% for predicting HABs in 2017-2018, suggesting that short-term HAB forecasts can be provided even when the feature values are not available.
Ai Haiping, Zhang Kai, Sun Jiachun, Zhang Huichun
2023-Feb-05
Bloom forecast, Feature selection, Long-short term memory, Machine learning, Random forest, Time series modeling
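
The abstract reports Nash-Sutcliffe efficiencies for the LSTM feature forecasts without restating the metric. As a minimal sketch, assuming the standard definition (one minus the ratio of the squared forecast error to the squared deviation of the observations from their mean), it could be computed as below. The function name and the sample values are hypothetical and are not taken from the paper, which does not describe its implementation here.

```python
def nash_sutcliffe_efficiency(observed, simulated):
    """Return NSE = 1 - sum((o - s)^2) / sum((o - mean(o))^2)."""
    if len(observed) != len(simulated):
        raise ValueError("observed and simulated must have the same length")
    if not observed:
        raise ValueError("series must be non-empty")
    mean_obs = sum(observed) / len(observed)
    # Squared error of the forecast against the observations.
    residual_ss = sum((o - s) ** 2 for o, s in zip(observed, simulated))
    # Squared deviation of the observations from their own mean.
    variance_ss = sum((o - mean_obs) ** 2 for o in observed)
    if variance_ss == 0:
        raise ValueError("NSE is undefined when the observations are constant")
    return 1.0 - residual_ss / variance_ss


# Hypothetical values for illustration only (not data from the study).
observed = [1.0, 2.0, 3.0, 4.0]
perfect_forecast = [1.0, 2.0, 3.0, 4.0]
mean_forecast = [2.5, 2.5, 2.5, 2.5]

nse_perfect = nash_sutcliffe_efficiency(observed, perfect_forecast)
nse_mean = nash_sutcliffe_efficiency(observed, mean_forecast)
```

Under this definition a value of 1 indicates a perfect forecast and a value of 0 indicates a forecast no better than always predicting the observed mean, which gives some context for the reported range of 0.12-0.97.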