In Environmental Science & Technology; h5-index 132.0
Predictive models are useful tools for aqueous adsorption research; existing models such as multilinear regression (MLR), however, can only predict adsorption at specific equilibrium concentrations or for certain adsorption isotherm models. Moreover, few studies have examined data processing, as opposed to simply applying different modeling algorithms, as a way to improve prediction accuracy. In this research, we employed a cosine similarity approach that mines the available data before model development: it selects the data most relevant to the prediction target for building models and was found to considerably improve prediction accuracy. We then built a machine learning modeling process that combines neural networks (NN), a group-selection data-splitting strategy for adsorption data grouped by adsorbent-adsorbate pair across different equilibrium concentrations, and poly-parameter linear free energy relationships (pp-LFERs) for the aqueous adsorption of 165 organic compounds onto 50 biochars, 34 carbon nanotubes, 35 granular activated carbons (GACs), and 30 polymeric resins. The final NN-LFER models were successfully applied across various equilibrium concentrations regardless of the adsorption isotherm model and showed less prediction deviation than published models, with root-mean-square errors of 0.23-0.31 versus 0.23-0.97 log units; the predictions were further improved by adding two key adsorbent descriptors (BET surface area and pore volume). Finally, interpreting the NN-LFER models with Shapley values suggested that neglecting the equilibrium concentration and the adsorbent properties is a possible reason for the higher prediction deviations of the existing MLR models.
Zhang Kai, Zhong Shifa, Zhang Huichun Judy
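The two data-handling ideas named in the abstract, cosine-similarity data mining and group-wise data splitting, can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the abstract does not give their descriptor set, similarity cutoff, or split ratio, so the function names, the `threshold` of 0.9, the `test_fraction` of 0.2, and the toy arrays below are all hypothetical choices made for illustration.

```python
import numpy as np


def cosine_similarity(X, target):
    """Cosine similarity between each row of X and the 1-D vector target."""
    X = np.asarray(X, dtype=float)
    target = np.asarray(target, dtype=float)
    norms = np.linalg.norm(X, axis=1) * np.linalg.norm(target)
    # Zero-length vectors have no direction; give them similarity 0.
    norms = np.where(norms == 0.0, np.inf, norms)
    return (X @ target) / norms


def select_most_similar(X, target, threshold=0.9):
    """Indices of rows whose similarity to target is >= threshold.

    The threshold is an assumed value, not one taken from the paper.
    """
    return np.flatnonzero(cosine_similarity(X, target) >= threshold)


def group_split(groups, test_fraction=0.2, seed=0):
    """Split row indices so each group lands entirely in train or in test.

    Here a group stands for one adsorbent-adsorbate pair measured at
    several equilibrium concentrations (an assumed encoding).
    """
    groups = np.asarray(groups)
    unique = np.random.default_rng(seed).permutation(np.unique(groups))
    n_test = max(1, int(round(test_fraction * len(unique))))
    test_mask = np.isin(groups, unique[:n_test])
    return np.flatnonzero(~test_mask), np.flatnonzero(test_mask)


# Toy descriptor rows (hypothetical) and a prediction target.
X_demo = np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [1.0, 1.0]])
target_demo = np.array([1.0, 0.0])
relevant_rows = select_most_similar(X_demo, target_demo, threshold=0.9)

# Ten rows belonging to five hypothetical adsorbent-adsorbate pairs.
pair_ids = np.array(["a", "a", "b", "b", "c", "c", "d", "d", "e", "e"])
train_idx, test_idx = group_split(pair_ids, test_fraction=0.2, seed=0)
```

Splitting by group rather than by row keeps measurements of the same adsorbent-adsorbate pair from appearing in both the training and test sets, which would otherwise make the test error look better than it is for unseen pairs.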