Mega-scale experimental analysis of protein folding stability in biology and design.
Journal
Nature
ISSN: 1476-4687
Abbreviated title: Nature
Country: England
NLM ID: 0410462
Publication information
Publication date: Aug 2023
History:
received: 5 Jan 2023
accepted: 14 Jun 2023
medline: 11 Aug 2023
pubmed: 20 Jul 2023
entrez: 19 Jul 2023
Status: ppublish
Abstract
Advances in DNA sequencing and machine learning are providing insights into protein sequences and structures on an enormous scale
Identifiers
pubmed: 37468638
doi: 10.1038/s41586-023-06328-6
pii: 10.1038/s41586-023-06328-6
pmc: PMC10412457
Chemical substances
Amino Acids
0
DNA, Complementary
0
Proteins
0
Publication types
Journal Article
Languages
eng
Citation subsets
IM
Pagination
434-444
Copyright information
© 2023. The Author(s).
References
Jumper, J. et al. Highly accurate protein structure prediction with AlphaFold. Nature 596, 583–589 (2021).
pubmed: 34265844
pmcid: 8371605
doi: 10.1038/s41586-021-03819-2
Dill, K. A. Dominant forces in protein folding. Biochemistry 29, 7133–7155 (1990).
pubmed: 2207096
doi: 10.1021/bi00483a001
Stein, A., Fowler, D. M., Hartmann-Petersen, R. & Lindorff-Larsen, K. Biophysical and mechanistic models for disease-causing protein variants. Trends Biochem. Sci. 44, 575–588 (2019).
pubmed: 30712981
pmcid: 6579676
doi: 10.1016/j.tibs.2019.01.003
Yue, P., Li, Z. & Moult, J. Loss of protein structure stability as a major causative factor in monogenic disease. J. Mol. Biol. 353, 459–473 (2005).
pubmed: 16169011
doi: 10.1016/j.jmb.2005.08.020
Agozzino, L. & Dill, K. A. Protein evolution speed depends on its stability and abundance and on chaperone concentrations. Proc. Natl. Acad. Sci. USA 115, 9092–9097 (2018).
pubmed: 30150386
pmcid: 6140491
doi: 10.1073/pnas.1810194115
Bloom, J. D., Labthavikul, S. T., Otey, C. R. & Arnold, F. H. Protein stability promotes evolvability. Proc. Natl. Acad. Sci. USA 103, 5869–5874 (2006).
pubmed: 16581913
pmcid: 1458665
doi: 10.1073/pnas.0510098103
Gong, L. I., Suchard, M. A. & Bloom, J. D. Stability-mediated epistasis constrains the evolution of an influenza protein. eLife 2, e00631 (2013).
pubmed: 23682315
pmcid: 3654441
doi: 10.7554/eLife.00631
Wang, B., Gallolu Kankanamalage, S., Dong, J. & Liu, Y. Optimization of therapeutic antibodies. Antib. Ther. 4, 45–54 (2021).
pubmed: 33928235
pmcid: 7944496
Stutz, C. & Blein, S. A single mutation increases heavy-chain heterodimer assembly of bispecific antibodies by inducing structural disorder in one homodimer species. J. Biol. Chem. 295, 9392–9408 (2020).
pubmed: 32404368
pmcid: 7363136
doi: 10.1074/jbc.RA119.012335
Rodríguez-Rodríguez, E. R. et al. A single mutation in framework 2 of the heavy variable domain improves the properties of a diabody and a related single-chain antibody. J. Mol. Biol. 423, 337–350 (2012).
pubmed: 22835504
doi: 10.1016/j.jmb.2012.07.007
Nikam, R., Kulandaisamy, A., Harini, K., Sharma, D. & Gromiha, M. M. ProThermDB: thermodynamic database for proteins and mutants revisited after 15 years. Nucleic Acids Res. 49, D420–D424 (2021).
pubmed: 33196841
doi: 10.1093/nar/gkaa1035
Laimer, J., Hofer, H., Fritz, M., Wegenkittl, S. & Lackner, P. MAESTRO-multi agent stability prediction upon point mutations. BMC Bioinformatics 16, 116 (2015).
pubmed: 25885774
pmcid: 4403899
doi: 10.1186/s12859-015-0548-6
Schymkowitz, J. et al. The FoldX web server: an online force field. Nucleic Acids Res. 33, W382–W388 (2005).
pubmed: 15980494
pmcid: 1160148
doi: 10.1093/nar/gki387
Broom, A., Trainor, K., Jacobi, Z. & Meiering, E. M. Computational modeling of protein stability: quantitative analysis reveals solutions to pervasive problems. Structure 28, 717–726.e3 (2020).
pubmed: 32375024
doi: 10.1016/j.str.2020.04.003
Pucci, F., Schwersensky, M. & Rooman, M. Artificial intelligence challenges for predicting the impact of mutations on protein stability. Curr. Opin. Struct. Biol. 72, 161–168 (2022).
pubmed: 34922207
doi: 10.1016/j.sbi.2021.11.001
Savitski, M. M. et al. Tracking cancer drugs in living cells by thermal profiling of the proteome. Science 346, 1255784 (2014).
pubmed: 25278616
doi: 10.1126/science.1255784
Walker, E. J., Bettinger, J. Q., Welle, K. A., Hryhorenko, J. R. & Ghaemmaghami, S. Global analysis of methionine oxidation provides a census of folding stabilities for the human proteome. Proc. Natl. Acad. Sci. USA 116, 6081–6090 (2019).
pubmed: 30846556
pmcid: 6442572
doi: 10.1073/pnas.1819851116
Rocklin, G. J. et al. Global analysis of protein folding using massively parallel design, synthesis, and testing. Science 357, 168–175 (2017).
pubmed: 28706065
pmcid: 5568797
doi: 10.1126/science.aan0693
Alley, E. C., Khimulya, G., Biswas, S., AlQuraishi, M. & Church, G. M. Unified rational protein engineering with sequence-based deep representation learning. Nat. Methods 16, 1315–1322 (2019).
pubmed: 31636460
pmcid: 7067682
doi: 10.1038/s41592-019-0598-1
Rao, R. et al. in Advances in Neural Information Processing Systems 32 (eds. Wallach, H. et al.) 9689–9701 (Curran Associates, 2019).
Park, C. & Marqusee, S. Pulse proteolysis: a simple method for quantitative determination of protein stability and ligand binding. Nat. Methods 2, 207–212 (2005).
pubmed: 15782190
doi: 10.1038/nmeth740
Sieber, V., Plückthun, A. & Schmid, F. X. Selecting proteins with improved stability by a phage-based method. Nat. Biotechnol. 16, 955–960 (1998).
pubmed: 9788353
doi: 10.1038/nbt1098-955
Park, C., Zhou, S., Gilmore, J. & Marqusee, S. Energetics-based protein profiling on a proteomic scale: identification of proteins resistant to proteolysis. J. Mol. Biol. 368, 1426–1437 (2007).
pubmed: 17400245
pmcid: 2857998
doi: 10.1016/j.jmb.2007.02.091
Yamaguchi, J. et al. cDNA display: a novel screening method for functional disulfide-rich peptides by solid-phase synthesis and stabilization of mRNA-protein fusions. Nucleic Acids Res. 37, e108 (2009).
pubmed: 19528071
pmcid: 2760808
doi: 10.1093/nar/gkp514
Nemoto, N., Miyamoto-Sato, E., Husimi, Y. & Yanagawa, H. In vitro virus: bonding of mRNA bearing puromycin at the 3’-terminal end to the C-terminal end of its encoded protein on the ribosome in vitro. FEBS Lett. 414, 405–408 (1997).
pubmed: 9315729
doi: 10.1016/S0014-5793(97)01026-0
Roberts, R. W. & Szostak, J. W. RNA–peptide fusions for the in vitro selection of peptides and proteins. Proc. Natl. Acad. Sci. USA 94, 12297–12302 (1997).
pubmed: 9356443
pmcid: 24913
doi: 10.1073/pnas.94.23.12297
Yourik, P., Fuchs, R. T., Mabuchi, M., Curcuru, J. L. & Robb, G. B. Staphylococcus aureus Cas9 is a multiple-turnover enzyme. RNA 25, 35–44 (2019).
pubmed: 30348755
pmcid: 6298560
doi: 10.1261/rna.067355.118
Coey, C. T. & Drohat, A. C. Kinetic methods for studying DNA glycosylases functioning in base excision repair. Methods Enzymol. 592, 357–376 (2017).
pubmed: 28668127
pmcid: 5761649
doi: 10.1016/bs.mie.2017.03.016
Nisthal, A., Wang, C. Y., Ary, M. L. & Mayo, S. L. Protein stability engineering insights revealed by domain-wide comprehensive mutagenesis. Proc. Natl. Acad. Sci. USA 116, 16367–16377 (2019).
pubmed: 31371509
pmcid: 6697890
doi: 10.1073/pnas.1903888116
Kim, T.-E. et al. Dissecting the stability determinants of a challenging de novo protein fold using massively parallel design and experimentation. Proc. Natl. Acad. Sci. USA 119, e2122676119 (2022).
pubmed: 36191185
pmcid: 9564214
doi: 10.1073/pnas.2122676119
Norn, C. et al. Protein sequence design by conformational landscape optimization. Proc. Natl. Acad. Sci. USA 118, e2017228118 (2021).
pubmed: 33712545
pmcid: 7980421
doi: 10.1073/pnas.2017228118
Anishchenko, I. et al. De novo protein design by deep network hallucination. Nature 600, 547–552 (2021).
pubmed: 34853475
pmcid: 9293396
doi: 10.1038/s41586-021-04184-w
Horovitz, A. Double-mutant cycles: a powerful tool for analyzing protein structure and function. Fold Des. 1, R121–R126 (1996).
pubmed: 9080186
doi: 10.1016/S1359-0278(96)00056-9
Shoichet, B. K., Baase, W. A., Kuroki, R. & Matthews, B. W. A relationship between protein stability and protein function. Proc. Natl. Acad. Sci. USA 92, 452–456 (1995).
pubmed: 7831309
pmcid: 42758
doi: 10.1073/pnas.92.2.452
Meiering, E. M., Serrano, L. & Fersht, A. R. Effect of active site residues in barnase on activity and stability. J. Mol. Biol. 225, 585–589 (1992).
pubmed: 1602471
doi: 10.1016/0022-2836(92)90387-Y
Høie, M. H., Cagiada, M., Beck Frederiksen, A. H., Stein, A. & Lindorff-Larsen, K. Predicting and interpreting large-scale mutagenesis data using analyses of protein stability and conservation. Cell Rep. 38, 110207 (2022).
pubmed: 35021073
doi: 10.1016/j.celrep.2021.110207
Cagiada, M. et al. Discovering functionally important sites in proteins. Preprint at bioRxiv https://doi.org/10.1101/2022.07.14.500015 (2022).
Tokuriki, N. & Tawfik, D. S. Stability effects of mutations and protein evolvability. Curr. Opin. Struct. Biol. 19, 596–604 (2009).
pubmed: 19765975
doi: 10.1016/j.sbi.2009.08.003
Akashi, H. & Gojobori, T. Metabolic efficiency and amino acid composition in the proteomes of Escherichia coli and Bacillus subtilis. Proc. Natl. Acad. Sci. USA 99, 3695–3700 (2002).
pubmed: 11904428
pmcid: 122586
doi: 10.1073/pnas.062526999
Shah, P. & Gilchrist, M. A. Explaining complex codon usage patterns with selection for translational efficiency, mutation bias, and genetic drift. Proc. Natl. Acad. Sci. USA 108, 10231–10236 (2011).
pubmed: 21646514
pmcid: 3121864
doi: 10.1073/pnas.1016719108
Riesselman, A. J., Ingraham, J. B. & Marks, D. S. Deep generative models of genetic variation capture the effects of mutations. Nat. Methods 15, 816–822 (2018).
pubmed: 30250057
pmcid: 6693876
doi: 10.1038/s41592-018-0138-4
Laine, E., Karami, Y. & Carbone, A. GEMME: a simple and fast global epistatic model predicting mutational effects. Mol. Biol. Evol. 36, 2604–2619 (2019).
pubmed: 31406981
pmcid: 6805226
doi: 10.1093/molbev/msz179
Prakash, A., Shin, J., Rajan, S. & Yoon, H. S. Structural basis of nucleic acid recognition by FK506-binding protein 25 (FKBP25), a nuclear immunophilin. Nucleic Acids Res. 44, 2909–2925 (2016).
pubmed: 26762975
pmcid: 4824100
doi: 10.1093/nar/gkw001
Goldenzweig, A. et al. Automated structure- and sequence-based design of proteins for high bacterial expression and stability. Mol. Cell 63, 337–346 (2016).
pubmed: 27425410
pmcid: 4961223
doi: 10.1016/j.molcel.2016.06.012
Peleg, Y. et al. Community-wide experimental evaluation of the PROSS stability-design method. J. Mol. Biol. 433, 166964 (2021).
pubmed: 33781758
pmcid: 7610701
doi: 10.1016/j.jmb.2021.166964
Park, C. & Marqusee, S. Probing the high energy states in proteins by proteolysis. J. Mol. Biol. 343, 1467–1476 (2004).
pubmed: 15491624
doi: 10.1016/j.jmb.2004.08.085
Plesa, C., Sidore, A. M., Lubock, N. B., Zhang, D. & Kosuri, S. Multiplexed gene synthesis in emulsions for exploring protein functional landscapes. Science 359, 343–347 (2018).
pubmed: 29301959
pmcid: 6261299
doi: 10.1126/science.aao5167
Sidore, A. M., Plesa, C., Samson, J. A., Lubock, N. B. & Kosuri, S. DropSynth 2.0: high-fidelity multiplexed gene synthesis in emulsions. Nucleic Acids Res. 48, e95 (2020).
pubmed: 32692349
pmcid: 7498354
doi: 10.1093/nar/gkaa600
Basanta, B. et al. An enumerative algorithm for de novo design of proteins with diverse pocket structures. Proc. Natl. Acad. Sci. USA 117, 22135–22145 (2020).
pubmed: 32839327
pmcid: 7486743
doi: 10.1073/pnas.2005412117
Dou, J. et al. De novo design of a fluorescence-activating β-barrel. Nature 561, 485–491 (2018).
pubmed: 30209393
pmcid: 6275156
doi: 10.1038/s41586-018-0509-0
Koga, N. et al. Principles for designing ideal protein structures. Nature 491, 222–227 (2012).
pubmed: 23135467
pmcid: 3705962
doi: 10.1038/nature11600
Huang, P.-S. et al. RosettaRemodel: a generalized framework for flexible backbone protein design. PLoS ONE 6, e24109 (2011).
pubmed: 21909381
pmcid: 3166072
doi: 10.1371/journal.pone.0024109
Yang, J. et al. Improved protein structure prediction using predicted interresidue orientations. Proc. Natl. Acad. Sci. USA 117, 1496–1503 (2020).
pubmed: 31896580
pmcid: 6983395
doi: 10.1073/pnas.1914677117
Hoover, D. M. & Lubkowski, J. DNAWorks: an automated method for designing oligonucleotides for PCR-based gene synthesis. Nucleic Acids Res. 30, e43 (2002).
pubmed: 12000848
pmcid: 115297
doi: 10.1093/nar/30.10.e43
Arai, H., Kumachi, S. & Nemoto, N. cDNA display: a stable and simple genotype-phenotype coupling using a cell-free translation system. Methods Mol. Biol. 2070, 43–56 (2020).
pubmed: 31625089
doi: 10.1007/978-1-4939-9853-1_3
Zhang, J., Kobert, K., Flouri, T. & Stamatakis, A. PEAR: a fast and accurate Illumina paired-end read merger. Bioinformatics 30, 614–620 (2014).
pubmed: 24142950
doi: 10.1093/bioinformatics/btt593
Martin, M. Cutadapt removes adapter sequences from high-throughput sequencing reads. EMBnet.journal 17, 10–12 (2011).
doi: 10.14806/ej.17.1.200
Phan, D., Pradhan, N. & Jankowiak, M. Composable effects for flexible and accelerated probabilistic programming in NumPyro. Preprint at https://arxiv.org/abs/1912.11554 (2019).
Sato, S., Cho, J.-H., Peran, I., Soydaner-Azeloglu, R. G. & Raleigh, D. P. The N-terminal domain of ribosomal protein L9 folds via a diffuse and delocalized transition state. Biophys. J. 112, 1797–1806 (2017).
pubmed: 28494951
pmcid: 5425357
doi: 10.1016/j.bpj.2017.01.034
Dodson, C. A. & Arbely, E. Protein folding of the SAP domain, a naturally occurring two-helix bundle. FEBS Lett. 589, 1740–1747 (2015).
pubmed: 26073259
pmcid: 4509717
doi: 10.1016/j.febslet.2015.06.002
Jäger, M., Dendle, M. & Kelly, J. W. Sequence determinants of thermodynamic stability in a WW domain-an all-beta-sheet protein. Protein Sci. 18, 1806–1813 (2009).
pubmed: 19565466
pmcid: 2776968
doi: 10.1002/pro.172
Jiang, X., Kowalski, J. & Kelly, J. W. Increasing protein stability using a rational approach combining sequence homology and structural alignment: stabilizing the WW domain. Protein Sci. 10, 1454–1465 (2001).
pubmed: 11420447
pmcid: 2374112
doi: 10.1110/ps.640101
Araya, C. L. et al. A fundamental protein property, thermodynamic stability, revealed solely from large-scale measurements of protein function. Proc. Natl. Acad. Sci. USA 109, 16858–16863 (2012).
pubmed: 23035249
pmcid: 3479514
doi: 10.1073/pnas.1209751109
Xiao, S. et al. Rational modification of protein stability by targeting surface sites leads to complicated results. Proc. Natl. Acad. Sci. USA 110, 11337–11342 (2013).
pubmed: 23798426
pmcid: 3710877
doi: 10.1073/pnas.1222245110
Xiao, S., Bi, Y., Shan, B. & Raleigh, D. P. Analysis of core packing in a cooperatively folded miniature protein: the ultrafast folding villin headpiece helical subdomain. Biochemistry 48, 4607–4616 (2009).
pubmed: 19354264
doi: 10.1021/bi8021763
Neuweiler, H. et al. The folding mechanism of BBL: plasticity of transition-state structure observed within an ultrafast folding protein family. J. Mol. Biol. 390, 1060–1073 (2009).
pubmed: 19445954
doi: 10.1016/j.jmb.2009.05.011
Jemth, P. et al. The structure of the major transition state for folding of an FF domain from experiment and simulation. J. Mol. Biol. 350, 363–378 (2005).
pubmed: 15935381
doi: 10.1016/j.jmb.2005.04.067
Villegas, V., Martínez, J. C., Avilés, F. X. & Serrano, L. Structure of the transition state in the folding process of human procarboxypeptidase A2 activation domain. J. Mol. Biol. 283, 1027–1036 (1998).
pubmed: 9799641
doi: 10.1006/jmbi.1998.2158
Maxwell, K. L. & Davidson, A. R. Mutagenesis of a buried polar interaction in an SH3 domain: sequence conservation provides the best prediction of stability effects. Biochemistry 37, 16172–16182 (1998).
pubmed: 9819209
doi: 10.1021/bi981788p
Northey, J. G. B., Maxwell, K. L. & Davidson, A. R. Protein folding kinetics beyond the phi value: using multiple amino acid substitutions to investigate the structure of the SH3 domain folding transition state. J. Mol. Biol. 320, 389–402 (2002).
pubmed: 12079394
doi: 10.1016/S0022-2836(02)00445-X
de los Rios, M. A., Daneshi, M. & Plaxco, K. W. Experimental investigation of the frequency and substitution dependence of negative phi-values in two-state proteins. Biochemistry 44, 12160–12167 (2005).
doi: 10.1021/bi0505621
Hamelryck, T. & Manderick, B. PDB file parser and structure class implemented in Python. Bioinformatics 19, 2308–2310 (2003).
pubmed: 14630660
doi: 10.1093/bioinformatics/btg299
Cock, P. J. A. et al. Biopython: freely available Python tools for computational molecular biology and bioinformatics. Bioinformatics 25, 1422–1423 (2009).
pubmed: 19304878
pmcid: 2682512
doi: 10.1093/bioinformatics/btp163
Joosten, R. P. et al. A series of PDB related databases for everyday needs. Nucleic Acids Res. 39, D411–D419 (2011).
pubmed: 21071423
doi: 10.1093/nar/gkq1105
Kabsch, W. & Sander, C. Dictionary of protein secondary structure: pattern recognition of hydrogen-bonded and geometrical features. Biopolymers 22, 2577–2637 (1983).
pubmed: 6667333
doi: 10.1002/bip.360221211
Zheng, F., Zhang, J. & Grigoryan, G. Tertiary structural propensities reveal fundamental sequence/structure relationships. Structure 23, 961–971 (2015).
pubmed: 25914055
doi: 10.1016/j.str.2015.03.015
Zheng, F. & Grigoryan, G. Sequence statistics of tertiary structural motifs reflect protein stability. PLoS ONE 12, e0178272 (2017).
pubmed: 28552940
pmcid: 5446159
doi: 10.1371/journal.pone.0178272
Johnson, L. S., Eddy, S. R. & Portugaly, E. Hidden Markov model speed heuristic and iterative HMM search procedure. BMC Bioinformatics 11, 431 (2010).
doi: 10.1186/1471-2105-11-431
Eddy, S. R. Accelerated profile HMM searches. PLoS Comput. Biol. 7, e1002195 (2011).
pubmed: 22039361
pmcid: 3197634
doi: 10.1371/journal.pcbi.1002195
Suzek, B. E. et al. UniRef clusters: a comprehensive and scalable alternative for improving sequence similarity searches. Bioinformatics 31, 926–932 (2015).
pubmed: 25398609
doi: 10.1093/bioinformatics/btu739
Hopf, T. A. et al. The EVcouplings Python framework for coevolutionary sequence analysis. Bioinformatics 35, 1582–1584 (2019).
pubmed: 30304492
doi: 10.1093/bioinformatics/bty862
Pan, Y. et al. Quantitative proteomics reveals the kinetics of trypsin-catalyzed protein digestion. Anal. Bioanal. Chem. 406, 6247–6256 (2014).
Schellenberger, V., Braune, K., Hofmann, H. J. & Jakubke, H. D. The specificity of chymotrypsin. A statistical analysis of hydrolysis data. Eur. J. Biochem. 199, 623–636 (1991).
Schellenberger, V., Turck, C. W., Hedstrom, L. & Rutter, W. J. Mapping the S’ subsites of serine proteases using acyl transfer to mixtures of peptide nucleophiles. Biochemistry 32, 4349–4353 (1993).
Schellenberger, V., Turck, C. W. & Rutter, W. J. Role of the S’ subsites in serine protease catalysis. Active-site mapping of rat chymotrypsin, rat trypsin, alpha-lytic protease, and cercarial protease from Schistosoma mansoni. Biochemistry 33, 4251–4257 (1994).
Monera, O. D., Sereda, T. J., Zhou, N. E., Kay, C. M. & Hodges, R. S. Relationship of sidechain hydrophobicity and alpha-helical propensity on the stability of the single-stranded amphipathic alpha-helix. J. Pept. Sci. 1, 319–329 (1995).