In Journal of Animal Science
With agriculture rapidly becoming a data-driven field, it is imperative to extract useful information from large data collections to optimize production systems. We compared the efficacy of regression (linear regression for continuous outcomes and generalized linear regression for categorical outcomes), random forests (RF), and multilayer neural networks (NN) in predicting beef carcass weight (CW), age when finished (AS), fat deposition (FD), and carcass quality (CQ). The data set contained information on over 4 million beef cattle from 5,204 farms, corresponding to 4.3% of Brazil's national production between 2014 and 2016. Explanatory variables were integrated from different data sources and encompassed animal traits, participation in a technical advising program, nutritional products sold to farms, economic variables related to beef production, month when finished, soil fertility, and climate at the location where the animals were raised. The training set was composed of information collected in 2014 and 2015, while the testing set contained information recorded in 2016. After parameter tuning for each algorithm, the models were used to predict the testing set. The best model for predicting CW and AS was RF (CW: root mean square error of prediction (RMSEp) = 0.65, R² = 0.61, and mean absolute error (MAE) = 0.49; AS: accuracy = 28.7%, Cohen's kappa coefficient (Kappa) = 0.08), whereas the best approach for FD and CQ was generalized linear regression (accuracy = 45.7%, Kappa = 0.05, and accuracy = 58.7%, Kappa = 0.09, respectively). Across all models, performance tended to be better with RF and regression and worse with NN. Animal category, nutritional plan, cattle sales price, participation in a technical advising program, and the climate and soil where the animals were raised were deemed important for predicting meat production and quality with regression and RF.
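The evaluation scheme described above (training on 2014 and 2015, testing on 2016, then scoring with RMSE, R², MAE, accuracy, and Cohen's kappa) can be illustrated with a minimal sketch. This is not the authors' code: the record layout (a `"year"` key), the function names, and the choice of R² as the coefficient of determination rather than a squared correlation are all assumptions made for illustration.

```python
import math
from collections import Counter

def split_by_year(records, train_years=(2014, 2015), test_years=(2016,)):
    """Temporal split: earlier years for training, the last year for testing.
    Assumes each record is a dict with a hypothetical "year" key."""
    train = [r for r in records if r["year"] in train_years]
    test = [r for r in records if r["year"] in test_years]
    return train, test

def rmse(y_true, y_pred):
    """Root mean square error of prediction."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))

def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def r2(y_true, y_pred):
    """Coefficient of determination (assumed definition of R2)."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot

def accuracy(y_true, y_pred):
    """Fraction of exactly matching class labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def cohen_kappa(y_true, y_pred):
    """Agreement corrected for chance: (p_o - p_e) / (1 - p_e)."""
    n = len(y_true)
    p_o = accuracy(y_true, y_pred)
    true_counts, pred_counts = Counter(y_true), Counter(y_pred)
    # Expected agreement if predictions were independent of the true labels.
    p_e = sum(true_counts[c] * pred_counts[c] for c in true_counts) / n ** 2
    return (p_o - p_e) / (1 - p_e)
```

Because kappa subtracts the agreement expected by chance, a classifier can reach a seemingly reasonable accuracy while its kappa stays close to zero, which is why both metrics are reported for the categorical outcomes.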
The development of strategies for predicting livestock production from real-world, large-scale data will be core to projecting future trends and optimizing the allocation of resources at all levels of the production chain, rendering animal production more sustainable. Although beef cattle production is a complex system, this analysis shows that by integrating different sources of data it is possible to forecast meat production and quality at the national level with moderate to high levels of accuracy.
Aiken Vera Cardoso Ferreira, Fernandes Arthur Francisco Araújo, Passafaro Tiago Luciano, Acedo Juliano Sabella, Dias Fábio Guerra, Dórea João Ricardo Rebouças, Rosa Guilherme Jordão de Magalhães
Brazil, beef, forecasting, large scale data, machine learning