PFresGO: an attention mechanism-based deep-learning approach for protein annotation by integrating gene ontology inter-relationships.


Journal

Bioinformatics (Oxford, England)
ISSN: 1367-4811
Abbreviated title: Bioinformatics
Country: England
NLM ID: 9808944

Publication information

Publication date:
01 03 2023
History:
received: 21 10 2022
revised: 10 02 2023
accepted: 15 02 2023
pubmed: 17 2 2023
medline: 4 3 2023
entrez: 16 2 2023
Status: ppublish

Abstract

The rapid accumulation of high-throughput sequence data demands the development of effective and efficient data-driven computational methods to functionally annotate proteins. However, most current approaches used for functional annotation simply focus on the use of protein-level information but ignore inter-relationships among annotations. Here, we established PFresGO, an attention-based deep-learning approach that incorporates hierarchical structures in Gene Ontology (GO) graphs and advances in natural language processing algorithms for the functional annotation of proteins. PFresGO employs a self-attention operation to capture the inter-relationships of GO terms, updates its embedding accordingly and uses a cross-attention operation to project protein representations and GO embedding into a common latent space to identify global protein sequence patterns and local functional residues. We demonstrate that PFresGO consistently achieves superior performance across GO categories when compared with 'state-of-the-art' methods. Importantly, we show that PFresGO can identify functionally important residues in protein sequences by assessing the distribution of attention weightings. PFresGO should serve as an effective tool for the accurate functional annotation of proteins and functional domains within proteins. PFresGO is available for academic purposes at https://github.com/BioColLab/PFresGO. Supplementary data are available at Bioinformatics online.
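The two attention steps described in the abstract can be illustrated with a minimal sketch. It is not the PFresGO implementation: it uses plain scaled dot-product attention without the learned projection matrices a real model would have, and the names `attention`, `go_terms`, and `residues`, along with the toy 2-dimensional embeddings, are assumptions made for illustration only. GO term embeddings first attend to each other (self-attention), then act as queries over per-residue embeddings (cross-attention), so that each GO term receives a weight distribution over residues.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def attention(queries, keys, values):
    """Scaled dot-product attention without learned projections.

    Returns (outputs, weights): one output vector and one weight
    distribution over the keys for each query.
    """
    d = len(keys[0])
    outputs, weights = [], []
    for q in queries:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in keys]
        w = softmax(scores)
        out = [sum(wj * v[c] for wj, v in zip(w, values)) for c in range(len(values[0]))]
        outputs.append(out)
        weights.append(w)
    return outputs, weights


# Hypothetical toy embeddings (assumed values, not taken from the paper).
go_terms = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]               # 3 GO terms
residues = [[0.1, 0.0], [3.0, 0.0], [0.0, 0.2], [0.0, 2.5]]   # 4 residues

# Self-attention: each GO term embedding is updated from all GO terms.
go_updated, _ = attention(go_terms, go_terms, go_terms)

# Cross-attention: updated GO terms query the residue embeddings.
protein_view, residue_weights = attention(go_updated, residues, residues)

# Residue index with the largest attention weight for each GO term.
top_residue = [max(range(len(w)), key=w.__getitem__) for w in residue_weights]
```

Each row of `residue_weights` sums to 1, so inspecting where a row concentrates its mass is one simple way to read off the residues that a given GO term attends to most, in the spirit of the attention-weight analysis the abstract mentions.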

Identifiers

pubmed: 36794913
pii: 7043095
doi: 10.1093/bioinformatics/btad094
pmc: PMC9978587

Chemical substances

Proteins 0

Publication types

Journal Article
Research Support, Non-U.S. Gov't

Languages

eng

Citation subsets

IM

Grants

Agency: Major Inter-Disciplinary Research

Copyright information

© The Author(s) 2023. Published by Oxford University Press.

Authors

Tong Pan (T)

Department of Biochemistry and Molecular Biology, Biomedicine Discovery Institute, Monash University, Melbourne, VIC 3800, Australia.

Chen Li (C)

Department of Biochemistry and Molecular Biology, Biomedicine Discovery Institute, Monash University, Melbourne, VIC 3800, Australia.

Yue Bi (Y)

Department of Biochemistry and Molecular Biology, Biomedicine Discovery Institute, Monash University, Melbourne, VIC 3800, Australia.

Zhikang Wang (Z)

Department of Biochemistry and Molecular Biology, Biomedicine Discovery Institute, Monash University, Melbourne, VIC 3800, Australia.

Robin B Gasser (RB)

Department of Veterinary Biosciences, Melbourne Veterinary School, The University of Melbourne, Parkville, VIC 3010, Australia.

Anthony W Purcell (AW)

Department of Biochemistry and Molecular Biology, Biomedicine Discovery Institute, Monash University, Melbourne, VIC 3800, Australia.

Tatsuya Akutsu (T)

Bioinformatics Center, Institute for Chemical Research, Kyoto University, Uji 611-0011, Japan.

Geoffrey I Webb (GI)

Monash Data Futures Institute, Monash University, Melbourne, VIC 3800, Australia.

Seiya Imoto (S)

Division of Health Medical Intelligence, Human Genome Center, Institute of Medical Science, The University of Tokyo, Minato-ku, Tokyo 108-8639, Japan.
Collaborative Research Institute for Innovative Microbiology, The University of Tokyo, Bunkyo-ku, Tokyo 113-8657, Japan.

Jiangning Song (J)

Department of Biochemistry and Molecular Biology, Biomedicine Discovery Institute, Monash University, Melbourne, VIC 3800, Australia.
Bioinformatics Center, Institute for Chemical Research, Kyoto University, Uji 611-0011, Japan.
Monash Data Futures Institute, Monash University, Melbourne, VIC 3800, Australia.
