Struct2Graph: a graph attention network for structure based predictions of protein-protein interactions.
Deep learning
Graph attention network
Protein–protein interaction
Structure-based prediction
Journal
BMC bioinformatics
ISSN: 1471-2105
Abbreviated title: BMC Bioinformatics
Country: England
NLM ID: 100965194
Publication information
Publication date:
10 Sep 2022
History:
received: 9 Jun 2022
accepted: 26 Aug 2022
entrez: 10 Sep 2022
pubmed: 11 Sep 2022
medline: 14 Sep 2022
Status:
epublish
Abstract
BACKGROUND
The development of new methods for analyzing protein-protein interactions (PPIs) at molecular and nanometer scales gives insight into intracellular signaling pathways and will improve understanding of protein functions, as well as of other nanoscale structures of biological and abiological origin. Recent advances in computational tools, particularly those involving modern deep learning algorithms, have been shown to complement experimental approaches for describing and rationalizing PPIs. However, most existing work on PPI prediction uses protein-sequence information and thus has difficulty accounting for the three-dimensional organization of the protein chains.
RESULTS
In this study, we address this problem and describe a PPI analysis based on a graph attention network, named Struct2Graph, that identifies PPIs directly from the structural data of folded protein globules. Our method predicts PPIs with an accuracy of 98.89% on a balanced set consisting of an equal number of positive and negative pairs. On an unbalanced set with a 1:10 ratio of positive to negative pairs, Struct2Graph achieves a fivefold cross-validation average accuracy of 99.42%. Moreover, Struct2Graph can potentially identify residues that likely contribute to the formation of the protein-protein complex. The identification of important residues is tested for two different interaction types: (a) proteins with multiple ligands competing for the same binding area, and (b) dynamic protein-protein adhesion interactions. Struct2Graph identifies interacting residues with 30% sensitivity, 89% specificity, and 87% accuracy.
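As context for the figures reported in these results, the following is a minimal sketch of how sensitivity, specificity, and accuracy follow from confusion-matrix counts, and of the accuracy a trivial predictor that always outputs the majority class would reach on a 1:10 set (10/11, about 90.9%). The function names and the counts are hypothetical illustrations; they are not taken from the paper or its code.

```python
def confusion_metrics(tp, fp, tn, fn):
    """Return (sensitivity, specificity, accuracy) from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)          # fraction of true positives recovered
    specificity = tn / (tn + fp)          # fraction of true negatives recovered
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    return sensitivity, specificity, accuracy


def majority_baseline_accuracy(n_pos, n_neg):
    """Accuracy of a predictor that always outputs the majority class."""
    return max(n_pos, n_neg) / (n_pos + n_neg)


# Hypothetical counts, chosen only to illustrate the arithmetic.
sens, spec, acc = confusion_metrics(tp=3, fp=11, tn=89, fn=7)
print(f"sensitivity={sens:.2f} specificity={spec:.2f} accuracy={acc:.3f}")

# Baseline for a set with one positive pair per ten negative pairs.
print(f"majority baseline at 1:10 = {majority_baseline_accuracy(1, 10):.3f}")
```

Because the majority baseline is already high on an unbalanced set, accuracy there is usually read alongside sensitivity and specificity rather than on its own.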
CONCLUSIONS
In this manuscript, we address the problem of predicting PPIs using a first-of-its-kind 3D-structure-based graph attention network (code available at https://github.com/baranwa2/Struct2Graph). Furthermore, the novel mutual attention mechanism provides insights into likely interaction sites through its unsupervised knowledge selection process. This study demonstrates that a relatively low-dimensional feature embedding learned from the graph structures of individual proteins outperforms other modern machine learning classifiers based on global protein features. In addition, in an analysis of single amino acid variations, the attention mechanism shows a preference for disease-causing residue variations over benign polymorphisms, demonstrating that it is not limited to interface residues.
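To make the idea of a graph built from an individual protein's 3D structure concrete, here is a minimal sketch that links residues whose representative coordinates lie within a distance cutoff. This is an assumption-laden illustration, not the Struct2Graph implementation: the function name, the use of one coordinate per residue, and the cutoff value are all hypothetical choices.

```python
import math


def contact_graph(coords, cutoff):
    """Build an undirected residue contact graph as an adjacency list.

    coords: list of (x, y, z) tuples, one per residue (hypothetical input
            format, e.g. one representative atom per residue).
    cutoff: distance threshold in the same units as coords (assumed value,
            not taken from the paper).
    """
    n = len(coords)
    neighbors = {i: [] for i in range(n)}
    for i in range(n):
        for j in range(i + 1, n):
            # Connect two residues when they are spatially close,
            # regardless of how far apart they are in the sequence.
            if math.dist(coords[i], coords[j]) <= cutoff:
                neighbors[i].append(j)
                neighbors[j].append(i)
    return neighbors


# Toy coordinates: residues 0 and 1 are close, residue 2 is far away.
toy_coords = [(0.0, 0.0, 0.0), (5.0, 0.0, 0.0), (20.0, 0.0, 0.0)]
print(contact_graph(toy_coords, cutoff=8.0))
```

A graph of this kind captures spatial proximity between residues that are distant in the sequence, which is the information a sequence-only model has difficulty representing.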
Identifiers
pubmed: 36088285
doi: 10.1186/s12859-022-04910-9
pii: 10.1186/s12859-022-04910-9
pmc: PMC9464414
Chemical substances
Amino Acids
0
Proteins
0
Publication types
Journal Article
Languages
eng
Citation subsets
IM
Pagination
370
Grants
Agency: Defense Advanced Research Projects Agency
ID: HR00111720067
Agency: Office of Naval Research
ID: N000141812876
Agency: Army Research Office
ID: W911NF-19-1-0269
Copyright information
© 2022. The Author(s).