Phylogenetic correlations can suffice to infer protein partners from sequences.
Journal
PLoS computational biology
ISSN: 1553-7358
Abbreviated title: PLoS Comput Biol
Country: United States
NLM ID: 101238922
Publication information
Publication date: October 2019
History:
received: 2019-06-10
accepted: 2019-09-25
revised: 2019-10-24
pubmed: 2019-10-15
medline: 2020-02-06
entrez: 2019-10-15
Status:
epublish
Abstract
Determining which proteins interact is crucial to a systems-level understanding of the cell. Recently, algorithms based on Direct Coupling Analysis (DCA) pairwise maximum-entropy models have made it possible to identify interaction partners among paralogous proteins from sequence data. This success of DCA at predicting protein-protein interactions could be mainly based on its known ability to identify pairs of residues that are in contact in the three-dimensional structure of protein complexes and that coevolve to remain physicochemically complementary. However, interacting proteins possess similar evolutionary histories. What is the role of purely phylogenetic correlations in the performance of DCA-based methods to infer interaction partners? To address this question, we employ controlled synthetic data that only involve phylogeny and no interactions or contacts. We find that DCA accurately identifies the pairs of synthetic sequences that share evolutionary history. While phylogenetic correlations confound the identification of contacting residues by DCA, they are thus useful to predict interacting partners among paralogs. We find that DCA performs as well as phylogenetic methods to this end, and slightly better than them with large and accurate training sets. Employing DCA or phylogenetic methods within an Iterative Pairing Algorithm (IPA) makes it possible to predict pairs of evolutionary partners without a training set. We further demonstrate the ability of these various methods to correctly predict pairings among real paralogous proteins with genome proximity but no known direct physical interaction, illustrating the importance of phylogenetic correlations in natural data. However, for physically interacting and strongly coevolving proteins, DCA and mutual information outperform phylogenetic methods. We finally discuss how to distinguish physically interacting proteins from proteins that only share a common evolutionary history.
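The idea of pairing sequences from shared evolutionary history alone can be illustrated with a small sketch. The code below is not the method of the article: it assumes binary sequences evolved by duplication and random site flips on a binary tree (no interactions or contacts), splits each leaf into two halves that play the role of partners A and B, and pairs test halves with a simple distance-profile correlation in the spirit of phylogenetic approaches. All function names and parameter values (`evolve_tree`, `pair_by_distance_profile`, sequence length, mutation count) are hypothetical choices made for illustration.

```python
import random


def evolve_tree(length, generations, mutations_per_branch, rng):
    """Duplicate-and-mutate on a binary tree; returns 2**generations leaves."""
    population = [[rng.choice((0, 1)) for _ in range(length)]]
    for _ in range(generations):
        next_population = []
        for seq in population:
            for _ in range(2):  # each sequence gives two children
                child = list(seq)
                # flip a fixed number of distinct random sites on this branch
                for site in rng.sample(range(length), mutations_per_branch):
                    child[site] ^= 1
                next_population.append(child)
        population = next_population
    return population


def hamming(a, b):
    """Number of sites at which two equal-length sequences differ."""
    return sum(x != y for x, y in zip(a, b))


def pearson(u, v):
    """Pearson correlation of two equal-length lists (0.0 if one is constant)."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    cov = sum((x - mu) * (y - mv) for x, y in zip(u, v))
    su = sum((x - mu) ** 2 for x in u) ** 0.5
    sv = sum((y - mv) ** 2 for y in v) ** 0.5
    return cov / (su * sv) if su and sv else 0.0


def pair_by_distance_profile(train_a, train_b, test_a, test_b):
    """For each test A half, return the index of the test B half whose
    distances to the training B halves best correlate with the A half's
    distances to the training A halves (training pairs share an index)."""
    prof_a = [[hamming(a, t) for t in train_a] for a in test_a]
    prof_b = [[hamming(b, t) for t in train_b] for b in test_b]
    return [
        max(range(len(test_b)), key=lambda j: pearson(pa, prof_b[j]))
        for pa in prof_a
    ]


if __name__ == "__main__":
    rng = random.Random(0)
    leaves = evolve_tree(length=200, generations=7, mutations_per_branch=5, rng=rng)
    rng.shuffle(leaves)
    halves_a = [seq[:100] for seq in leaves]  # "protein A" of each pair
    halves_b = [seq[100:] for seq in leaves]  # "protein B" of each pair
    n_test = 32
    predicted = pair_by_distance_profile(
        halves_a[n_test:], halves_b[n_test:], halves_a[:n_test], halves_b[:n_test]
    )
    accuracy = sum(p == i for i, p in enumerate(predicted)) / n_test
    print(f"fraction of correct pairings: {accuracy:.2f} (chance: {1 / n_test:.2f})")
```

Because the two halves of a leaf never interact in this toy model, any pairing accuracy above chance can only come from their common position in the tree, which is the effect the abstract describes for phylogeny-only synthetic data.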
Identifiers
pubmed: 31609984
doi: 10.1371/journal.pcbi.1007179
pii: PCOMPBIOL-D-19-00955
pmc: PMC6812855
Chemical substances
Proteins
0
Publication types
Journal Article
Research Support, Non-U.S. Gov't
Languages
eng
Citation subsets
IM
Pagination
e1007179
Conflict of interest statement
The authors have declared that no competing interests exist.