PFresGO: an attention mechanism-based deep-learning approach for protein annotation by integrating gene ontology inter-relationships.
Journal
Bioinformatics (Oxford, England)
ISSN: 1367-4811
Abbreviated title: Bioinformatics
Country: England
NLM ID: 9808944
Publication information
Publication date: 2023-03-01
History:
received: 2022-10-21
revised: 2023-02-10
accepted: 2023-02-15
pubmed: 2023-02-17
medline: 2023-03-04
entrez: 2023-02-16
Status:
ppublish
Abstract
The rapid accumulation of high-throughput sequence data demands the development of effective and efficient data-driven computational methods to functionally annotate proteins. However, most current approaches used for functional annotation simply focus on the use of protein-level information but ignore inter-relationships among annotations. Here, we established PFresGO, an attention-based deep-learning approach that incorporates hierarchical structures in Gene Ontology (GO) graphs and advances in natural language processing algorithms for the functional annotation of proteins. PFresGO employs a self-attention operation to capture the inter-relationships of GO terms, updates its embedding accordingly and uses a cross-attention operation to project protein representations and GO embedding into a common latent space to identify global protein sequence patterns and local functional residues. We demonstrate that PFresGO consistently achieves superior performance across GO categories when compared with 'state-of-the-art' methods. Importantly, we show that PFresGO can identify functionally important residues in protein sequences by assessing the distribution of attention weightings. PFresGO should serve as an effective tool for the accurate functional annotation of proteins and functional domains within proteins. PFresGO is available for academic purposes at https://github.com/BioColLab/PFresGO. Supplementary data are available at Bioinformatics online.
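The two attention steps described in the abstract can be illustrated with a minimal sketch. This is not the PFresGO implementation: it uses random vectors in place of learned GO-term embeddings and residue representations, a single head with no learned projection matrices, and a residual update that is assumed here for illustration. All names (`attention`, `go_emb`, `residue_repr`, and so on) are hypothetical.

```python
import math
import random


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def attention(queries, keys, values):
    """Scaled dot-product attention; returns (outputs, weights)."""
    d = len(keys[0])
    weights = [
        softmax([sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in keys])
        for q in queries
    ]
    outputs = [
        [sum(w * v[j] for w, v in zip(row, values)) for j in range(len(values[0]))]
        for row in weights
    ]
    return outputs, weights


random.seed(0)
dim, n_terms, n_residues = 8, 4, 12  # toy sizes, chosen arbitrarily

# Random stand-ins for GO-term embeddings and per-residue protein features.
go_emb = [[random.gauss(0, 1) for _ in range(dim)] for _ in range(n_terms)]
residue_repr = [[random.gauss(0, 1) for _ in range(dim)] for _ in range(n_residues)]

# Self-attention: every GO term attends to all GO terms, and its embedding
# is updated with the result (residual form is an assumption of this sketch).
go_ctx, _ = attention(go_emb, go_emb, go_emb)
go_updated = [[e + c for e, c in zip(emb, ctx)] for emb, ctx in zip(go_emb, go_ctx)]

# Cross-attention: GO terms are the queries, residues are keys and values.
term_repr, residue_weights = attention(go_updated, residue_repr, residue_repr)

# For each GO term, the residue that receives the largest attention weight.
top_residue = [max(range(n_residues), key=row.__getitem__) for row in residue_weights]
print(top_residue)
```

Each row of `residue_weights` is a probability distribution over residues for one GO term, which is the kind of quantity the abstract refers to when it mentions assessing the distribution of attention weightings to find functionally important residues.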
Identifiers
pubmed: 36794913
pii: 7043095
doi: 10.1093/bioinformatics/btad094
pmc: PMC9978587
Chemical substances
Proteins
0
Publication types
Journal Article
Research Support, Non-U.S. Gov't
Languages
eng
Citation subsets
IM
Grants
Agency: Major Inter-Disciplinary Research
Copyright information
© The Author(s) 2023. Published by Oxford University Press.