In Frontiers in plant science
Crop yield prediction is important for decision making, yet it remains an ongoing scientific challenge. Interactions among genetic, environmental, and management factors, together with uncertainty in input values, make crop yield prediction complex. Building on previous work in which we coupled crop modeling with machine learning (ML) models to predict maize yields for three US Corn Belt states, we here extend the concept to the entire US Corn Belt (12 states). Specifically, we built five new ML models and their ensemble models, considering scenarios with and without crop modeling variables. The other inputs to our models are soil, weather, management, and historical yield data. A unique aspect of our work is a spatial analysis that investigates the causes of low or high model prediction errors. Our results indicate that coupling crop modeling with machine learning increases prediction accuracy. The ensemble model outperformed the individual ML models, with a relative root mean square error (RRMSE) of about 9% for the test years (2018, 2019, and 2020), which is comparable to previous studies. In addition, analysis of the sources of error revealed that counties and crop reporting districts with low cropland ratios have high RRMSE. Furthermore, we found that soil input data and extreme weather events were responsible for high errors in some regions. The proposed models can be deployed for large-scale prediction at the county level and, contingent upon data availability, can be used for field-level prediction.
Sajid Saiara Samira, Shahhosseini Mohsen, Huber Isaiah, Hu Guiping, Archontoulis Sotirios V
2022
APSIM, data integration, ensemble model, model transparency, spatial analysis
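Note on the RRMSE metric reported in the abstract: the abstract does not spell out the formula, so the following is a minimal sketch assuming the common definition, in which the root mean square error is divided by the mean of the observed values and expressed as a percentage. The function name `rrmse` and its argument names are illustrative and not taken from the paper.

```python
import math


def rrmse(observed, predicted):
    """Relative RMSE in percent: 100 * RMSE / mean(observed).

    Assumed definition (not stated in the abstract): the RMSE is
    normalized by the mean of the observed values.
    """
    if len(observed) != len(predicted) or len(observed) == 0:
        raise ValueError("observed and predicted must be non-empty and equal in length")
    # Mean squared error over all paired observations.
    mse = sum((o - p) ** 2 for o, p in zip(observed, predicted)) / len(observed)
    mean_observed = sum(observed) / len(observed)
    if mean_observed == 0:
        raise ValueError("mean of observed values is zero; RRMSE is undefined")
    return 100.0 * math.sqrt(mse) / mean_observed
```

Under this definition, an RRMSE of about 9% means the typical prediction error is roughly 9% of the average observed yield across the evaluated counties and years.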