In Journal of Environmental Management

Properly selecting new contaminants for regulation or monitoring before implementation is an important issue for regulators and water supply utilities. Herein, we constructed and evaluated machine learning models for predicting the detectability (detection/non-detection) of pesticides in surface water used as a drinking water source. Classification and regression models were constructed using Random Forest, XGBoost, and LightGBM; of these, the LightGBM classification model had the highest prediction accuracy. Furthermore, it outperformed the detectability index method, which is based on runoff models from previous studies, in Recall, Precision, and F-measure. Regardless of the type of machine learning model, the most important features (explanatory variables) were the number of annual measurements, the sales quantity of pesticides for rice-paddy fields, and water quality guideline values. Analysis of the impact of the features suggested the presence of a threshold (or range) above which detectability increased. In addition, when a feature (e.g., quantity of pesticide sales) increased the likelihood of detection beyond a threshold value, other features also affected detectability synergistically. The proportions of false positives and false negatives varied depending on the features used. The advantage of the machine learning models lies in their ability to represent nonlinear and complex relationships between features and pesticide detectability that existing risk scoring methods cannot represent.
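The abstract compares models by Recall, Precision, and F-measure, treating "detected" as the positive class. As a hedged illustration only, the sketch below applies the standard definitions of these metrics to detection/non-detection labels. The function name `detection_metrics` and the eight labels are hypothetical and are not the authors' code or data.

```python
def detection_metrics(y_true, y_pred):
    """Return (recall, precision, f_measure), treating 1 = detected as the positive class."""
    # Count true positives, false positives, and false negatives.
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    # Recall: share of actually detected pesticides that the model flags.
    recall = tp / (tp + fn) if tp + fn else 0.0
    # Precision: share of flagged pesticides that were actually detected.
    precision = tp / (tp + fp) if tp + fp else 0.0
    # F-measure: harmonic mean of precision and recall.
    f_measure = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return recall, precision, f_measure


# Hypothetical observed and predicted labels for eight pesticides (1 = detected, 0 = not detected).
observed = [1, 1, 1, 0, 0, 1, 0, 0]
predicted = [1, 1, 0, 0, 1, 1, 0, 0]
recall, precision, f_measure = detection_metrics(observed, predicted)
print(f"Recall={recall:.2f} Precision={precision:.2f} F={f_measure:.2f}")
```

With these made-up labels there are three true positives, one false positive, and one false negative, so all three metrics equal 0.75. Recall penalizes missed detections (false negatives), while Precision penalizes false alarms (false positives). This is why the abstract reports both alongside their harmonic mean.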

Narita Kentaro, Matsui Yoshihiko, Matsushita Taku, Shirasaki Nobutaka

2022-Nov-11

Drinking water source, Guideline, Monitoring, Pesticide concentration, Prediction, Screening