Improved prediction of site-rates from structure with averaging across homologs.
Keywords: evolutionary dynamics; evolutionary rates in proteins; protein stability; structure prediction
Journal: Protein science : a publication of the Protein Society
ISSN: 1469-896X
Abbreviated title: Protein Sci
Country: United States
NLM ID: 9211750
Publication information
Publication date: Jul 2024
History:
Revised: 12 May 2024
Received: 27 February 2024
Accepted: 4 June 2024
MEDLINE: 26 June 2024
PubMed: 26 June 2024
Entrez: 26 June 2024
Status: ppublish
Abstract
Variation in substitution rates among sites in proteins can largely be understood through the constraint that proteins must fold into stable structures. Models that calculate site-specific rates from protein structure using a thermodynamic stability model have shown a significant but modest ability to predict empirical site-specific rates inferred from sequences. Approaches based on detailed atomistic models of protein energetics do not outperform simpler approaches based on packing density. We demonstrate that a fundamental reason for this is that empirical site-specific rates reflect the average effect of many different microenvironments across a phylogeny. By analyzing the results of evolutionary dynamics simulations, we show that averaging predicted site-specific rates across many extant protein structures can recover accurate site-rate predictions. We also demonstrate this result in natural protein sequences and experimental structures. Using predicted structures, we further show that atomistic models can improve on contact-density metrics in predicting site-specific rates from structure. These results give fundamental insight into the factors governing the distribution of site-specific rates in protein families.
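The averaging idea described in the abstract can be sketched in a few lines of code. This is a minimal illustration under stated assumptions, not the authors' pipeline: it assumes weighted contact number (WCN, the sum of 1/r² over all other residues) as the contact-density metric, one C-alpha coordinate per residue as input, and a hypothetical rate proxy (negative z-scored WCN, so more tightly packed sites are predicted to evolve more slowly). All function names are hypothetical. Each homolog's per-residue prediction is placed into the columns of a shared multiple sequence alignment, averaged column by column across homologs, and compared with empirical rates by Spearman rank correlation.

```python
import numpy as np


def weighted_contact_number(coords):
    """Weighted contact number (WCN) per residue: sum over j != i of 1 / r_ij**2.

    coords: (L, 3) array with one representative atom per residue
    (C-alpha is assumed here; other atom choices are possible).
    """
    coords = np.asarray(coords, dtype=float)
    diff = coords[:, None, :] - coords[None, :, :]
    d2 = np.sum(diff ** 2, axis=-1)
    np.fill_diagonal(d2, np.inf)  # 1/inf = 0, so self-contacts are excluded
    return np.sum(1.0 / d2, axis=1)


def map_to_columns(aligned_seq, per_residue):
    """Place per-residue values into alignment columns; gap columns ('-') get NaN."""
    n_residues = sum(1 for char in aligned_seq if char != "-")
    if n_residues != len(per_residue):
        raise ValueError("non-gap characters do not match the number of residues")
    out = np.full(len(aligned_seq), np.nan)
    idx = 0
    for col, char in enumerate(aligned_seq):
        if char != "-":
            out[col] = per_residue[idx]
            idx += 1
    return out


def predicted_rates(coords):
    """Hypothetical rate proxy: negative z-scored WCN (more packed -> slower)."""
    wcn = weighted_contact_number(coords)
    sd = wcn.std()
    z = (wcn - wcn.mean()) / sd if sd > 0 else np.zeros_like(wcn)
    return -z


def averaged_prediction(aligned_seqs, coords_list):
    """Average per-column predicted rates over all homolog structures."""
    per_structure = [
        map_to_columns(seq, predicted_rates(coords))
        for seq, coords in zip(aligned_seqs, coords_list)
    ]
    # nanmean skips structures that have a gap in a given column
    return np.nanmean(np.vstack(per_structure), axis=0)


def spearman(x, y):
    """Spearman rank correlation over columns where both values are finite (no tie correction)."""
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    ok = np.isfinite(x) & np.isfinite(y)
    rx = np.argsort(np.argsort(x[ok])).astype(float)
    ry = np.argsort(np.argsort(y[ok])).astype(float)
    return float(np.corrcoef(rx, ry)[0, 1])
```

In practice the coordinates would be parsed from experimental or predicted homolog structures (parsing is omitted here), and the empirical rates would come from a site-rate inference tool such as Rate4Site or LEISR. Per-structure z-scoring is one simple way to put predictions from proteins of different sizes on a common scale before averaging; other normalizations are equally plausible.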
Chemical substances: Proteins (registry number: 0)
Publication type: Journal Article
Language: eng
Citation subset: IM
Pagination: e5086
Copyright information:
© 2024 The Author(s). Protein Science published by Wiley Periodicals LLC on behalf of The Protein Society.