In The Science of the total environment
Data-driven model (DDM) prediction of aquatic ecological responses, such as cyanobacterial harmful algal blooms (CyanoHABs), is critically influenced by the choice of training dataset. However, a systematic method for choosing the optimal training dataset in light of data history has not yet been developed. A comprehensive procedure built around an algorithm that selects the optimal training dataset on its own would allow DDM performance to improve automatically. In this study, a novel algorithm was developed that self-generates candidate training datasets from the available input and output variable data and self-chooses the training dataset that maximizes CyanoHAB forecasting performance. Nine years of meteorological and water quality data (input) and CyanoHAB data (output) from a site on the Nakdong River, South Korea, were acquired and pretreated via an automated process. An artificial neural network (ANN) was chosen from among the DDM candidates by first-cut training and validation using the entire collected dataset. Optimal training datasets for the ANN were then self-selected from the self-generated candidates by systematically simulating performance across 46 periods and 40 sizes (number of data elements) of the generated training datasets. The best-performing models were screened to identify candidate models. The best performance corresponded to 6-7 years of training data (∼18 % lower error) for forecasting 1-28 d ahead, i.e., a forecasting lead time (FLT) of 1-28 d. After the hyperparameters of the screened candidates were fine-tuned, the best-performing model (7 years of data with a 14 d FLT) was self-determined by comparing its forecasts with unseen CyanoHAB events. The self-determined model could reasonably predict CyanoHABs occurring in Korean waters (cyanobacteria cells/mL ≥ 1000). Thus, our proposed method of self-optimizing the training dataset effectively improved the predictive accuracy and operational efficiency of DDM-based CyanoHAB prediction.
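The candidate-generation and selection loop described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: a one-variable least-squares line stands in for the ANN, and the function names (candidate_windows, fit_line, rmse, select_training_window) and the parameters lead, n_valid, sizes and step are all hypothetical. The sketch enumerates training windows of several sizes and positions, fits the stand-in model on each, and keeps the window with the lowest error on a held-out validation tail.

```python
from statistics import mean


def candidate_windows(n_available, sizes, step):
    """Enumerate (start, stop) index windows of each size, sliding by `step`."""
    for size in sizes:
        for start in range(0, n_available - size + 1, step):
            yield (start, start + size)


def fit_line(xs, ys):
    """Least-squares line y = slope * x + intercept (stand-in for the ANN)."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    if sxx == 0:
        return 0.0, my
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return slope, my - slope * mx


def rmse(model, xs, ys):
    """Root-mean-square error of the fitted line on (xs, ys)."""
    slope, intercept = model
    return mean((slope * x + intercept - y) ** 2 for x, y in zip(xs, ys)) ** 0.5


def select_training_window(series_x, series_y, lead, n_valid, sizes, step):
    """Return (error, (start, stop), model) for the best training window.

    The input at time t is paired with the output at time t + lead, the
    last `n_valid` pairs are held out for validation, and every candidate
    window is drawn from the remaining pairs.
    """
    xs = series_x[:len(series_x) - lead]
    ys = series_y[lead:]
    n_train = len(xs) - n_valid
    x_val, y_val = xs[n_train:], ys[n_train:]
    best = None
    for start, stop in candidate_windows(n_train, sizes, step):
        model = fit_line(xs[start:stop], ys[start:stop])
        err = rmse(model, x_val, y_val)
        if best is None or err < best[0]:
            best = (err, (start, stop), model)
    return best
```

Calling select_training_window once per forecasting lead time and comparing the returned errors would mimic, in a very reduced form, the screening over periods, sizes and lead times reported in the abstract.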
Kim Jayun, Jung Woosik, An Jusuk, Oh Hyun Je, Park Joonhong
2023-Jan-05
Algal bloom, Data history, Neural network, Predict, Programming, Training data