In Bioinformatics (Oxford, England)
MOTIVATION : The rapid accumulation of high-throughput sequence data demands effective and efficient data-driven computational methods for the functional annotation of proteins. However, most current annotation approaches rely on protein-level information alone and ignore the inter-relationships among annotations.
RESULTS : Here, we established PFresGO, an attention-based deep-learning approach that incorporates hierarchical structures in Gene Ontology (GO) graphs and advances in natural language processing algorithms for the functional annotation of proteins. PFresGO employs a self-attention operation to capture the inter-relationships of GO terms and update their embeddings accordingly, and uses a cross-attention operation to project protein representations and GO embeddings into a common latent space to identify global protein sequence patterns and local functional residues. We demonstrate that PFresGO consistently achieves superior performance across GO categories when compared with state-of-the-art methods. Importantly, we show that PFresGO can identify functionally important residues in protein sequences by assessing the distribution of attention weights. PFresGO should serve as an effective tool for the accurate functional annotation of proteins and of functional domains within proteins.
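The two attention operations described above can be illustrated with a minimal NumPy sketch. This is not the PFresGO implementation: it assumes single-head scaled dot-product attention with no learned projection matrices, a residual update of the GO embeddings, and random stand-in inputs. The names `attention`, `annotate`, `residue_repr`, `go_embed` and `w_out` are hypothetical and chosen only for illustration.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def attention(queries, keys, values):
    """Scaled dot-product attention; returns (output, attention weights)."""
    d = queries.shape[-1]
    weights = softmax(queries @ keys.T / np.sqrt(d), axis=-1)
    return weights @ values, weights

def annotate(residue_repr, go_embed, w_out):
    """Illustrative forward pass (assumed design, not the authors' code)."""
    # Self-attention over GO terms: each term embedding is updated using
    # the embeddings of all other terms (residual update is an assumption).
    go_ctx, _ = attention(go_embed, go_embed, go_embed)
    go_ctx = go_embed + go_ctx
    # Cross-attention: each GO term queries the per-residue representations,
    # giving one attention distribution over residues per GO term.
    go_prot, residue_weights = attention(go_ctx, residue_repr, residue_repr)
    # One sigmoid score per GO term (multi-label prediction).
    logits = go_prot @ w_out
    probs = 1.0 / (1.0 + np.exp(-logits))
    return probs, residue_weights

# Random stand-ins for a 50-residue protein and 8 GO terms (hypothetical sizes).
rng = np.random.default_rng(0)
n_res, n_terms, d = 50, 8, 16
residue_repr = rng.normal(size=(n_res, d))
go_embed = rng.normal(size=(n_terms, d))
w_out = rng.normal(size=d)

probs, residue_weights = annotate(residue_repr, go_embed, w_out)

# Residues receiving the highest attention for the first GO term.
top_residues = np.argsort(residue_weights[0])[::-1][:5]
```

In this sketch, each row of `residue_weights` is a distribution over sequence positions for one GO term, so ranking positions by weight is one simple way to inspect which residues drive a given prediction.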
AVAILABILITY : PFresGO is available for academic purposes at https://github.com/BioColLab/PFresGO.
SUPPLEMENTARY INFORMATION : Supplementary data are available at Bioinformatics online.
Pan Tong, Li Chen, Bi Yue, Wang Zhikang, Gasser Robin B, Purcell Anthony W, Akutsu Tatsuya, Webb Geoffrey I, Imoto Seiya, Song Jiangning
2023-Feb-16