In Briefings in Bioinformatics
As the amount of available protein data keeps growing, relevant and interpretable visualization becomes crucial, especially for tasks that require human expertise. Poincaré disk projection has previously proven effective for visualizing biological data such as single-cell RNA-seq data. Here, we develop PoincaréMSA, a new method for the visual representation of complex relationships between protein sequences, based on Poincaré maps embedding. We demonstrate its efficiency and its potential for visualizing protein family topology as well as for the evolutionary and functional annotation of uncharacterized sequences. PoincaréMSA is implemented as open-source Python code, with interactive Google Colab notebooks available as described at https://www.dsimb.inserm.fr/POINCARE_MSA.
Susmelj Anna Klimovskaia, Ren Yani, Vander Meersche Yann, Gelly Jean-Christophe, Galochkina Tatiana
data visualization, dimensionality reduction, machine learning, multiple sequence alignment, protein evolution, protein function, protein sequence, sequence similarity
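
For readers unfamiliar with the Poincaré disk mentioned in the abstract, the short sketch below computes the standard hyperbolic distance between two points inside the unit disk, d(u, v) = arcosh(1 + 2|u - v|^2 / ((1 - |u|^2)(1 - |v|^2))). It is only an illustration of the underlying geometry and is not taken from the PoincaréMSA code; the function name poincare_distance is a hypothetical choice made for this example.

```python
import math


def poincare_distance(u, v):
    """Hyperbolic distance between two points strictly inside the unit disk.

    Illustrative helper (hypothetical name), using the textbook formula for
    the Poincare disk model; not part of the PoincareMSA package.
    """
    norm_u_sq = sum(x * x for x in u)
    norm_v_sq = sum(x * x for x in v)
    if norm_u_sq >= 1.0 or norm_v_sq >= 1.0:
        raise ValueError("both points must lie strictly inside the unit disk")
    diff_sq = sum((a - b) ** 2 for a, b in zip(u, v))
    # arcosh argument is always >= 1 because diff_sq >= 0 and both factors > 0
    return math.acosh(1.0 + 2.0 * diff_sq / ((1.0 - norm_u_sq) * (1.0 - norm_v_sq)))


if __name__ == "__main__":
    # The same Euclidean gap (0.1) costs more hyperbolic distance near the rim.
    print(poincare_distance((0.0, 0.0), (0.1, 0.0)))
    print(poincare_distance((0.85, 0.0), (0.95, 0.0)))
```

Because distances grow rapidly toward the boundary of the disk, hierarchical data can place root-like points near the center and leaf-like points near the rim, which is what makes this geometry attractive for displaying tree-like relationships.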