
In Molecular Biology and Evolution

Each influenza pandemic was caused at least partly by avian- and/or swine-origin influenza A viruses (IAVs). Neither the timing of the next pandemic nor the IAVs that may be involved in it can currently be predicted. We aim to build machine learning (ML) models to predict human-adaptive IAV nucleotide composition. A total of 217,549 IAV full-length coding sequences of the PB2 (Polymerase basic protein-2), PB1, PA (Polymerase acidic protein), HA (Hemagglutinin), NP (Nucleoprotein), and NA (Neuraminidase) segments were decomposed into their codon position-based mononucleotides (12 nts) and dinucleotides (48 dnts). A total of 68,742 human sequences and 68,739 avian sequences (approximately 1:1) were resampled to characterize the human adaptation-associated (d)nts with principal component analysis (PCA) and other ML models. The human adaptation of IAV sequences was then predicted based on the characterized (d)nts. Respectively, 9, 12, 11, 13 and 10 human-adaptive (d)nts were optimized for the six segments. PCA and hierarchical clustering analysis revealed the linear separability of the optimized (d)nts between the human-adaptive and avian-adaptive sets. The confusion matrices and the area under the receiver operating characteristic (ROC) curve (AUC) indicated high performance of the ML models in predicting human adaptation of IAVs. Our model performed well in predicting the human adaptation of swine/avian IAVs before and after the 2009 H1N1 pandemic. In conclusion, we identified the human adaptation-associated genomic composition of IAV segments. ML models for IAV human adaptation prediction that use large IAV genomic datasets can facilitate the identification of key viral factors that affect virus transmission/pathogenicity. Most importantly, such models allow the prediction of pandemic influenza.
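
The decomposition into 12 codon position-based mononucleotides and 48 dinucleotides can be illustrated with a minimal Python sketch. It is not the authors' code: the function name `codon_position_features`, the feature labels (for example `A1` or `GG_31`), the handling of ambiguous bases, and the choice to normalize counts within each codon position are all assumptions made for illustration. The counts follow from 4 bases at 3 codon positions (12 nts) and 16 base pairs at 3 starting positions, namely 1-2, 2-3, and 3-1 bridging adjacent codons (48 dnts).

```python
from collections import Counter
from itertools import product

BASES = "ACGT"
# Hypothetical labels for the three dinucleotide frames:
# codon positions 1-2, 2-3, and 3-1 (bridging into the next codon).
FRAME_LABELS = {1: "12", 2: "23", 3: "31"}


def codon_position_features(cds: str) -> dict:
    """Return 12 mononucleotide and 48 dinucleotide frequencies.

    Assumption: each frequency is normalized within its own codon
    position (or dinucleotide frame), and any mono- or dinucleotide
    containing an ambiguous base (e.g. N) is skipped.
    """
    seq = cds.upper().replace("U", "T")
    seq = seq[: len(seq) - len(seq) % 3]  # drop a trailing partial codon

    nt_counts, nt_totals = Counter(), Counter()
    dnt_counts, dnt_totals = Counter(), Counter()

    for i, base in enumerate(seq):
        pos = i % 3 + 1  # codon position of this base: 1, 2, or 3
        if base in BASES:
            nt_counts[(base, pos)] += 1
            nt_totals[pos] += 1
        pair = seq[i : i + 2]
        if len(pair) == 2 and pair[0] in BASES and pair[1] in BASES:
            dnt_counts[(pair, pos)] += 1
            dnt_totals[pos] += 1

    features = {}
    for pos in (1, 2, 3):
        for base in BASES:
            total = nt_totals[pos]
            features[f"{base}{pos}"] = nt_counts[(base, pos)] / total if total else 0.0
    for pos in (1, 2, 3):
        for a, b in product(BASES, repeat=2):
            total = dnt_totals[pos]
            key = f"{a}{b}_{FRAME_LABELS[pos]}"
            features[key] = dnt_counts[(a + b, pos)] / total if total else 0.0
    return features


# Toy two-codon sequence, used only to show the shape of the output.
feats = codon_position_features("ATGGCC")
print(len(feats))  # 60 features: 12 nts + 48 dnts
```

Applied to every coding sequence of a segment, a function like this would yield one 60-dimensional feature vector per sequence, which is the kind of input a PCA or classifier could then operate on.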

Li Jing, Zhang Sen, Li Bo, Hu Yi, Kang Xiao-Ping, Wu Xiao-Yan, Huang Meng-Ting, Li Yu-Chang, Zhao Zhong-Peng, Qin Cheng-Feng, Jiang Tao


Dinucleotide, Genomic nucleotide composition, Human adaptation, Influenza A viruses (IAVs), Machine learning (ML)