In Journal of Biomedical Informatics; h5-index 55.0
The recent outbreak of COVID-19 has infected millions of people around the world, leading to a global emergency. In a virus outbreak, it is crucial to identify the carriers of the virus promptly and accurately so that the animal origins can be isolated to prevent further infection. Traditional identification relies on field and laboratory research, which delays the response to an emerging epidemic. Recent studies have shown that machine learning can predict viral hosts efficiently. However, limited annotated virus data and imbalanced host information keep these approaches from achieving better results. To predict the animal origins of COVID-19 reliably, we extend transfer learning and ensemble learning into a hybrid transfer learning model. When predicting the hosts of a newly discovered virus, our model uses a related virus domain as auxiliary data to help build a robust model for the target virus domain. Simulation results on several UCI benchmarks and viral genome datasets show that our model outperforms classical methods when target training sets are limited and classes are imbalanced. We evaluate the feasibility of our approach by setting coronaviruses as the target domain and other related viruses as the source domain. Finally, we present the predicted animal reservoirs of COVID-19 for further analysis.
Yang Yun, Guo Jing, Wang Pei, Wang Yaowei, Yu Minghao, Wang Xiang, Yang Po, Sun Liang
COVID-19, ensemble learning, hosts prediction, machine learning, transfer learning, virus origins