A thermodynamic model of protein structure evolution explains empirical amino acid substitution matrices.

Keywords: amino acid substitution; exchangeabilities; protein evolution; protein stability; replacement matrices

Journal

Protein science : a publication of the Protein Society
ISSN: 1469-896X
Abbreviated title: Protein Sci
Country: United States
NLM ID: 9211750

Publication information

Publication date:
October 2021
History:
received: 17 Mar 2021
revised: 25 Jun 2021
accepted: 29 Jun 2021
entrez: 4 Jul 2021
pubmed: 5 Jul 2021
medline: 25 Dec 2021
Status: ppublish

Abstract

Proteins evolve under a myriad of biophysical selection pressures that collectively control the patterns of amino acid substitutions. These evolutionary pressures are sufficiently consistent over time and across protein families to produce substitution patterns, summarized in global amino acid substitution matrices such as BLOSUM, JTT, WAG, and LG, which can be used to successfully detect homologs, infer phylogenies, and reconstruct ancestral sequences. Although the factors that govern the variation of amino acid substitution rates have received much attention, the influence of thermodynamic stability constraints remains unresolved. Here we develop a simple model to calculate amino acid substitution matrices from evolutionary dynamics controlled by a fitness function that reports on the thermodynamic effects of amino acid mutations in protein structures. This hybrid biophysical and evolutionary model accounts for nucleotide transition/transversion rate bias, multi-nucleotide codon changes, the number of codons per amino acid, and thermodynamic protein stability. We find that our theoretical model accurately recapitulates the complex yet universal pattern observed in common global amino acid substitution matrices used in phylogenetics. These results suggest that selection for thermodynamically stable proteins, coupled with nucleotide mutation bias filtered by the structure of the genetic code, is the primary driver behind the global amino acid substitution patterns observed in proteins throughout the tree of life.
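As a rough illustration of how a stability-based fitness function can be turned into a substitution rate, the sketch below uses a generic textbook construction: fitness is taken to be the two-state folded fraction, and the chance that a new mutant takes over is the fixation probability of a Moran process. This is an assumption made for illustration, not the authors' published model; the function names, the value RT ≈ 0.593 kcal/mol, the population size, and the sign convention (negative ΔΔG = destabilizing) are all hypothetical choices.

```python
import math

# Assumed thermal energy RT in kcal/mol at roughly 298 K (illustrative choice).
RT = 0.593


def fraction_folded(stability: float) -> float:
    """Two-state folded fraction; stability is the unfolding free energy in
    kcal/mol (positive values mean the folded state is favoured)."""
    return 1.0 / (1.0 + math.exp(-stability / RT))


def moran_fixation(r: float, n: int) -> float:
    """Fixation probability of one mutant with relative fitness r in a Moran
    population of size n: (1 - 1/r) / (1 - r**-n), written to avoid overflow."""
    if math.isclose(r, 1.0):
        return 1.0 / n  # neutral limit
    if r > 1.0:
        return (1.0 - 1.0 / r) / (1.0 - r ** (-n))
    # For r < 1, multiply numerator and denominator by r**n so no term overflows.
    return r ** (n - 1) * (1.0 - r) / (1.0 - r ** n)


def substitution_rate(mu: float, stability_wt: float, ddg: float, n: int) -> float:
    """Hypothetical helper: rate at which a mutation with stability change ddg
    (kcal/mol, negative = destabilizing) is fixed, given mutation rate mu."""
    r = fraction_folded(stability_wt + ddg) / fraction_folded(stability_wt)
    return mu * n * moran_fixation(r, n)


if __name__ == "__main__":
    for ddg in (-2.0, 0.0, 1.0):
        rate = substitution_rate(1.0, 3.0, ddg, 1000)
        print(f"ddG = {ddg:+.1f} kcal/mol -> rate = {rate:.3g}")
```

In this sketch a neutral mutation is substituted at exactly the mutation rate, destabilizing mutations are strongly suppressed, and stabilizing ones are modestly enhanced. A full amino acid matrix would additionally sum such rates over codon pairs and weight them by nucleotide transition/transversion bias, which this sketch omits.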

Identifiers

pubmed: 34218472
doi: 10.1002/pro.4155
pmc: PMC8442976

Chemical substances

Proteins 0

Publication types

Journal Article; Research Support, N.I.H., Extramural; Research Support, Non-U.S. Gov't

Languages

eng

Citation subsets

IM

Pagination

2057-2068

Grants

Agency: NIGMS NIH HHS
ID: R01 GM096053
Country: United States
Agency: NIGMS NIH HHS
ID: R01 GM132499
Country: United States

Copyright information

© 2021 The Authors. Protein Science published by Wiley Periodicals LLC on behalf of The Protein Society.

Authors

Christoffer Norn (C)

Biochemistry and Structural Biology, Lund University, Lund, Sweden.

Ingemar André (I)

Biochemistry and Structural Biology, Lund University, Lund, Sweden.

Douglas L Theobald (DL)

Biochemistry Department, Brandeis University, Waltham, Massachusetts, USA.
