In Animal genetics
Breed identification using multiple information sources and methods is widely applied in animal genetics and breeding. Meanwhile, with the development of artificial intelligence, high-throughput genomic data are increasingly combined with machine learning techniques for breed identification. In this context, we used 654 individuals from 15 pig breeds to evaluate the performance of machine learning and stacking ensemble learning classifiers, as well as the role of feature selection and anomaly detection in different scenarios. Our results showed that, with a training set of 16 individuals per breed and 32 features (SNPs), the accuracy of breed identification with feature selection (eXtreme Gradient Boosting, XGBoost) could exceed 95.00% (nine breeds), an improvement of 7.04% over random selection. For stacking ensemble learning, feature selection methods (including random selection) were applied before the different base learners. When the training set of these base learners had 16 individuals per breed and 32 features, the accuracy of stacking ensemble learning improved by 9.24% over the best base learner (nine breeds), but showed no significant advantage over the models with XGBoost feature selection. With a training set of 16 individuals per breed and 512 features, breed identification with anomaly detection (local outlier factor, LOF) and random selection could achieve an accuracy of 89.06% (15 breeds). These results show that machine learning can be an effective tool for breed identification, and this study also provides useful information for the application of machine learning in animal genetics and breeding.
Liu Ruiqi, Xu Zhiting, Teng Jinyan, Pan Xiangchun, Lin Qing, Cai Xiaodian, Diao Shuqi, Feng Xueyan, Yuan Xiaolong, Li Jiaqi, Zhang Zhe
2022-Dec-02
anomaly detection, breed identification, feature selection, machine learning, stacking ensemble
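
The anomaly detection step named in the abstract (local outlier factor, LOF) can be illustrated with a minimal sketch. This is not the authors' pipeline: the simulated genotypes, the allele frequencies (0.1, 0.9, 0.5), the 128 SNPs, the neighbourhood size `k=5`, and all function names are assumptions chosen for illustration. Only the 16 individuals per breed follow the abstract. The sketch scores a new individual against a reference panel of known breeds, using the standard LOF definition in plain Python.

```python
import math
import random

def euclidean(a, b):
    """Euclidean distance between two genotype vectors (0/1/2 per SNP)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def nearest(point, data, k, skip=None):
    """Return the k nearest rows of `data` as (distance, index) pairs."""
    pairs = [(euclidean(point, row), i) for i, row in enumerate(data) if i != skip]
    pairs.sort()
    return pairs[:k]

def local_reach_density(neighbours, k_dist):
    """Inverse of the mean reachability distance to the given neighbours."""
    reach = [max(k_dist[j], d) for d, j in neighbours]
    return len(reach) / max(sum(reach), 1e-12)  # guard against duplicate rows

def fit_lof(train, k):
    """Precompute k-distances and local densities for the reference panel."""
    neigh = [nearest(row, train, k, skip=i) for i, row in enumerate(train)]
    k_dist = [n[-1][0] for n in neigh]
    lrd = [local_reach_density(n, k_dist) for n in neigh]
    return {"train": train, "k": k, "k_dist": k_dist, "lrd": lrd}

def lof_score(model, query):
    """LOF of a new individual relative to the reference panel."""
    neigh = nearest(query, model["train"], model["k"])
    lrd_q = local_reach_density(neigh, model["k_dist"])
    return sum(model["lrd"][j] for _, j in neigh) / (len(neigh) * lrd_q)

def simulate(rng, allele_freq, n_snps):
    """Toy genotype: two independent allele draws per SNP (hypothetical data)."""
    return [(rng.random() < allele_freq) + (rng.random() < allele_freq)
            for _ in range(n_snps)]

rng = random.Random(0)
N_SNPS = 128  # assumed panel size, for illustration only

# Reference panel: two known breeds, 16 individuals each.
train = ([simulate(rng, 0.1, N_SNPS) for _ in range(16)]
         + [simulate(rng, 0.9, N_SNPS) for _ in range(16)])
model = fit_lof(train, k=5)

score_known = lof_score(model, simulate(rng, 0.1, N_SNPS))    # from a known breed
score_unknown = lof_score(model, simulate(rng, 0.5, N_SNPS))  # from an unseen breed
print(f"known breed LOF:   {score_known:.2f}")
print(f"unknown breed LOF: {score_unknown:.2f}")
```

A score near 1 means the individual sits in a region about as dense as its neighbours, while a clearly larger score flags it as a possible member of a breed absent from the reference panel. The cut-off between the two is a tuning choice that the sketch does not attempt to set.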