Origin of the COVID-19 virus has been intensely debated in the scientific community since the first infected cases were detected in December 2019. The disease has caused a global pandemic, leading to deaths of thousands of people across the world and thus finding origin of this novel coronavirus is important in responding and controlling the pandemic. Recent research results suggest that bats or pangolins might be the original hosts for the virus based on comparative studies using its genomic sequences. This paper investigates the COVID-19 virus origin by using artificial intelligence (AI) and raw genomic sequences of the virus. More than 300 genome sequences of COVID-19 infected cases collected from different countries are explored and analysed using unsupervised clustering methods. The results obtained from various AI-enabled experiments using clustering algorithms demonstrate that all examined COVID-19 virus genomes belong to a cluster that also contains bat and pangolin coronavirus genomes. This provides evidences strongly supporting scientific hypotheses that bats and pangolins are probable hosts for the COVID-19 virus. At the whole genome analysis level, our findings also indicate that bats are more likely the hosts for the COVID-19 virus than pangolins.
Nguyen, T. T.; Abdelrazek, M.; Nguyen, D. T.; Aryal, S.; Nguyen, D. T.; Khatami, A.