Learning Strategies in Protein Directed Evolution.
Artificial intelligence
Deep learning
Directed evolution
Epistasis
Hotspots
Machine learning
Protein engineering
Rational design
Saturation mutagenesis
Synthetic biology
Journal
Methods in molecular biology (Clifton, N.J.)
ISSN: 1940-6029
Abbreviated title: Methods Mol Biol
Country: United States
NLM ID: 9214969
Publication information
Publication date: 2022
History:
entrez: 21 June 2022
pubmed: 22 June 2022
medline: 24 June 2022
Status: ppublish
Abstract
Synthetic biology is a fast-evolving research field that combines biology and engineering principles to develop new biological systems for medical, pharmacological, and industrial applications. Synthetic biologists use iterative "design, build, test, and learn" cycles to efficiently engineer genetic systems that are reliable, reproducible, and predictable. Protein engineering by directed evolution can benefit from such a systematic engineering approach for various reasons. Learning can be carried out before starting, throughout, or after finalizing a directed evolution project. Computational tools, bioinformatics, and scanning mutagenesis methods can be excellent starting points, while molecular dynamics simulations and other strategies can guide engineering efforts. Similarly, studying protein intermediates along evolutionary pathways offers fascinating insights into the molecular mechanisms shaped by evolution. The learning step of the cycle is not only crucial for proteins or enzymes that are not suitable for high-throughput screening or selection systems, but also valuable for any platform that generates large amounts of data whose analysis can be aided by machine learning algorithms. The main challenge in protein engineering is to predict the effect of a single mutation on one functional parameter, to say nothing of several mutations on multiple parameters. This is largely due to nonadditive mutational interactions, known as epistatic effects: a mutation that is beneficial in one genetic background may not be beneficial in another. In this work, we provide an overview of experimental and computational strategies that can guide the user to learn protein function at different stages in a directed evolution project. We also discuss how epistatic effects can influence the success of directed evolution projects.
Since machine learning is gaining momentum in protein engineering and the field is becoming more interdisciplinary thanks to collaboration between mathematicians, computational scientists, engineers, molecular biologists, and chemists, we provide a general workflow that familiarizes nonexperts with the basic concepts, dataset requirements, learning approaches, model capabilities and performance metrics of this intriguing area. Finally, we also provide some practical recommendations on how machine learning can harness epistatic effects for engineering proteins in an "outside-the-box" way.
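The nonadditive interactions described in the abstract can be illustrated with a minimal sketch. It assumes an additive null model on the raw fitness scale (a log scale is often used instead), and it borrows the conventional labels "magnitude", "sign", and "reciprocal sign" epistasis, which do not appear in the abstract. The function names and all fitness values are hypothetical.

```python
def pairwise_epistasis(f_wt, f_a, f_b, f_ab):
    """Deviation of the double mutant from the additive expectation.

    Assumed null model: f_ab_expected = f_a + f_b - f_wt.
    """
    return f_ab - (f_a + f_b - f_wt)


def classify_epistasis(f_wt, f_a, f_b, f_ab, tol=1e-9):
    """Label the interaction between mutations A and B (illustrative labels)."""
    if abs(pairwise_epistasis(f_wt, f_a, f_b, f_ab)) <= tol:
        return "additive"
    # Effect of each mutation in the wild-type background and in the
    # background that already carries the other mutation.
    a_in_wt, a_in_b = f_a - f_wt, f_ab - f_b
    b_in_wt, b_in_a = f_b - f_wt, f_ab - f_a
    flip_a = a_in_wt * a_in_b < 0  # A changes sign depending on background
    flip_b = b_in_wt * b_in_a < 0  # B changes sign depending on background
    if flip_a and flip_b:
        return "reciprocal sign"
    if flip_a or flip_b:
        return "sign"
    return "magnitude"


# Hypothetical fitness values: mutation A is beneficial in the wild type
# but detrimental once mutation B is present.
print(classify_epistasis(f_wt=1.0, f_a=2.0, f_b=3.0, f_ab=2.5))
```

In the printed case, mutation A raises fitness in the wild-type background but lowers it in the background carrying B, which is the situation the abstract describes.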
Identifiers
pubmed: 35727454
doi: 10.1007/978-1-0716-2152-3_15
Chemical substances
Proteins
0
Publication types
Journal Article
Review
Research Support, Non-U.S. Gov't
Languages
eng
Citation subsets
IM
Pagination
225-275
Comments and corrections
Type: ErratumIn
Copyright information
© 2022. Springer Science+Business Media, LLC, part of Springer Nature.
doi: 10.1016/bs.mie.2020.04.023
pubmed: 32896282
Song H, Bremer BJ, Hinds EC et al (2020) Inferring protein sequence-function relationships with large-scale positive-unlabeled learning. Cell Syst 12(1):92–101. https://doi.org/10.1016/j.cels.2020.10.007
doi: 10.1016/j.cels.2020.10.007
pubmed: 33212013
pmcid: 7856229
Tang Q, Grathwol CW, Aslan-Üzel AS et al (2021) Directed evolution of a halide methyltransferase enables biocatalytic synthesis of diverse SAM analogs. Angew Chem Int Ed 60(3):1524–1527. https://doi.org/10.1002/anie.202013871
doi: 10.1002/anie.202013871
Orozco M (2014) A theoretical view of protein dynamics. Chem Soc Rev 43(14):5051–5066. https://doi.org/10.1039/C3CS60474H
doi: 10.1039/C3CS60474H
pubmed: 24709805
Dodani SC, Kiss G, Cahn JKB et al (2016) Discovery of a regioselectivity switch in nitrating P450s guided by molecular dynamics simulations and Markov models. Nat Chem 8(5):419–425. https://doi.org/10.1038/nchem.2474
doi: 10.1038/nchem.2474
pubmed: 27102675
pmcid: 4843824
Osuna S, Jiménez-Osés G, Noey EL, Houk KN (2015) Molecular dynamics explorations of active site structure in designed and evolved enzymes. Acc Chem Res 48(4):1080–1089. https://doi.org/10.1021/ar500452q
doi: 10.1021/ar500452q
pubmed: 25738880
Childers MC, Daggett V (2017) Insights from molecular dynamics simulations for computational protein design. Mol Syst Des Eng 2(1):9–33. https://doi.org/10.1039/c6me00083e
doi: 10.1039/c6me00083e
pubmed: 28239489
pmcid: 5321087
Bunzel HA, Anderson JLR, Mulholland AJ (2021) Designing better enzymes: insights from directed evolution. Curr Opin Struct Biol 67:212–218. https://doi.org/10.1016/j.sbi.2020.12.015
doi: 10.1016/j.sbi.2020.12.015
pubmed: 33517098
Sandström AG, Wikmark Y, Engström K et al (2012) Combinatorial reshaping of the Candida antarctica lipase A substrate pocket for enantioselectivity using an extremely condensed library. Proc Natl Acad Sci U S A 109(1):78–83. https://doi.org/10.1073/pnas.1111537108
doi: 10.1073/pnas.1111537108
pubmed: 22178758
Tokuriki N, Jackson CJ, Afriat-Jurnou L et al (2012) Diminishing returns and tradeoffs constrain the laboratory optimization of an enzyme. Nat Commun 3:1257. https://doi.org/10.1038/ncomms2246
doi: 10.1038/ncomms2246
pubmed: 23212386
Kaltenbach M, Tokuriki N (2014) Dynamics and constraints of enzyme evolution. J Exp Zool Part B Mol Dev Evol 322(7):468–487. https://doi.org/10.1002/jez.b.22562
doi: 10.1002/jez.b.22562
Goldsmith M, Aggarwal N, Ashani Y et al (2017) Overcoming an optimization plateau in the directed evolution of highly efficient nerve agent bioscavengers. Protein Eng Des Sel 30(4):333–345. https://doi.org/10.1093/protein/gzx003
doi: 10.1093/protein/gzx003
pubmed: 28159998
Götz AW, Williamson MJ, Xu D et al (2012) Routine microsecond molecular dynamics simulations with AMBER on GPUs. 1. Generalized Born. J Chem Theory Comput 8(5):1542–1555. https://doi.org/10.1021/ct200909j
doi: 10.1021/ct200909j
pubmed: 22582031
pmcid: 3348677
Romero-Rivera A, Garcia-Borràs M, Osuna S (2017) Computational tools for the evaluation of laboratory-engineered biocatalysts. Chem Commun 53(2):284–297. https://doi.org/10.1039/C6CC06055B
doi: 10.1039/C6CC06055B
Yu H, Dalby PA (2020) A beginner’s guide to molecular dynamics simulations and the identification of cross-correlation networks for enzyme engineering. Methods Enzymol 643:15–49. https://doi.org/10.1016/bs.mie.2020.04.020
doi: 10.1016/bs.mie.2020.04.020
pubmed: 32896280
Marques SM, Planas-Iglesias J, Damborsky J (2020) Web-based tools for computational enzyme design. Preprints. https://doi.org/10.20944/preprints202012.0089.v1
Cilia E, Pancsa R, Tompa P et al (2014) The DynaMine webserver: predicting protein dynamics from sequence. Nucleic Acids Res 42(W1):W264. https://doi.org/10.1093/nar/gku270
doi: 10.1093/nar/gku270
pubmed: 24728994
pmcid: 4086073
Obexer R, Godina A, Garrabou X et al (2017) Emergence of a catalytic tetrad during evolution of a highly active artificial aldolase. Nat Chem 9(1):50–56. https://doi.org/10.1038/nchem.2596
doi: 10.1038/nchem.2596
pubmed: 27995916
Broom A, Rakotoharisoa RV, Thompson MC et al (2020) Ensemble-based enzyme design can recapitulate the effects of laboratory directed evolution in silico. Nat Commun 11(1):4808. https://doi.org/10.1038/s41467-020-18619-x
doi: 10.1038/s41467-020-18619-x
pubmed: 32968058
pmcid: 7511930
Li A, Wang B, Ilie A et al (2017) A redox-mediated Kemp eliminase. Nat Commun 8(1):1–8. https://doi.org/10.1038/ncomms14876
doi: 10.1038/ncomms14876
Hong NS, Petrović D, Lee R et al (2018) The evolution of multiple active site configurations in a designed enzyme. Nat Commun 9(1):3900. https://doi.org/10.1038/s41467-018-06305-y
doi: 10.1038/s41467-018-06305-y
pubmed: 30254369
pmcid: 6156567
Boehr DD, Nussinov R, Wright PE (2009) The role of dynamic conformational ensembles in biomolecular recognition. Nat Chem Biol 5(11):789–796. https://doi.org/10.1038/nchembio.232
doi: 10.1038/nchembio.232
pubmed: 19841628
pmcid: 2916928
Otten R, Pádua RAP, Bunzel HA et al (2020) How directed evolution reshapes the energy landscape in an enzyme to boost catalysis. Science 370(6523):1442–1446. https://doi.org/10.1126/science.abd3623
doi: 10.1126/science.abd3623
pubmed: 33214289
pmcid: 9616100
Fasan R, Meharenna YT, Snow CD et al (2008) Evolutionary history of a specialized P450 propane monooxygenase. J Mol Biol 383(5):1069–1080. https://doi.org/10.1016/j.jmb.2008.06.060
doi: 10.1016/j.jmb.2008.06.060
pubmed: 18619466
pmcid: 2637765
Li G, Zhang H, Sun Z et al (2016) Multiparameter optimization in directed evolution: engineering thermostability, enantioselectivity, and activity of an epoxide hydrolase. ACS Catal 6(6):3679–3687. https://doi.org/10.1021/acscatal.6b01113
doi: 10.1021/acscatal.6b01113
Ostafe R, Fontaine N, Frank D et al (2020) One-shot optimization of multiple enzyme parameters: tailoring glucose oxidase for pH and electron mediators. Biotechnol Bioeng 117(1):17–29. https://doi.org/10.1002/bit.27169
doi: 10.1002/bit.27169
pubmed: 31520472
Schmidt-Dannert C, Arnold FH (1999) Directed evolution of industrial enzymes. Trends Biotechnol 17(4):135–136. https://doi.org/10.1016/S0167-7799(98)01283-9
doi: 10.1016/S0167-7799(98)01283-9
pubmed: 10203769
Starr TN, Thornton JW (2016) Epistasis in protein evolution. Protein Sci 25(7):1204–1218. https://doi.org/10.1002/pro.2897
doi: 10.1002/pro.2897
pubmed: 26833806
pmcid: 4918427
Reetz MT (2013) The importance of additive and non-additive mutational effects in protein engineering. Angew Chem Int Ed 52:2658–2666. https://doi.org/10.1002/anie.201207842
doi: 10.1002/anie.201207842
Acevedo-Rocha CG, Li A, D’Amore L et al (2021) Pervasive cooperative mutational effects on multiple catalytic enzyme traits emerge via long-range conformational dynamics. Nat Commun 12(1):1–13. https://doi.org/10.1038/s41467-021-21833-w
doi: 10.1038/s41467-021-21833-w
Miton CM, Tokuriki N (2016) How mutational epistasis impairs predictability in protein evolution and design. Protein Sci 25(7):1260–1272. https://doi.org/10.1002/pro.2876
doi: 10.1002/pro.2876
pubmed: 26757214
pmcid: 4918425
Bershtein S, Segal M, Bekerman R et al (2006) Robustness–epistasis link shapes the fitness landscape of a randomly drifting protein. Nature 444(7121):929–932. https://doi.org/10.1038/nature05385
doi: 10.1038/nature05385
pubmed: 17122770
Weinreich DM, Delaney NF, DePristo MA, Hartl DL (2006) Darwinian evolution can follow only very few mutational paths to fitter proteins. Science 312(5770):111–114. https://doi.org/10.1126/science.1123539
doi: 10.1126/science.1123539
pubmed: 16601193
Poelwijk FJ, Kiviet DJ, Weinreich DM, Tans SJ (2007) Empirical fitness landscapes reveal accessible evolutionary paths. Nature 445(7126):383–386. https://doi.org/10.1038/nature05451
doi: 10.1038/nature05451
pubmed: 17251971
Zhang Z-G, Lonsdale R, Sanchis J, Reetz MT (2014) Extreme synergistic mutational effects in the directed evolution of a Baeyer–Villiger monooxygenase as catalyst for asymmetric sulfoxidation. J Am Chem Soc 136(49):17262–17272. https://doi.org/10.1021/ja5098034
doi: 10.1021/ja5098034
pubmed: 25394568
Reetz MT, Sanchis J (2008) Constructing and analyzing the fitness landscape of an experimental evolutionary process. ChemBioChem 9(14):2260–2267. https://doi.org/10.1002/cbic.200800371
doi: 10.1002/cbic.200800371
pubmed: 18712749
Calzadiaz-Ramirez L, Calvó-Tusell C, Stoffel GMM et al (2020) In vivo selection for formate dehydrogenases with high efficiency and specificity toward NADP+. ACS Catal 10(14):7512–7525. https://doi.org/10.1021/acscatal.0c01487
doi: 10.1021/acscatal.0c01487
pubmed: 32733773
pmcid: 7384739
Maynard Smith J (1970) Natural selection and the concept of a protein space. Nature 225(5232):563–564. https://doi.org/10.1038/225563a0
doi: 10.1038/225563a0
Tracewell CA, Arnold FH (2009) Directed enzyme evolution: climbing fitness peaks one amino acid at a time. Curr Opin Chem Biol 13(1):3–9. https://doi.org/10.1016/j.cbpa.2009.01.017
doi: 10.1016/j.cbpa.2009.01.017
pubmed: 19249235
pmcid: 2703427
Vornholt T, Christoffel F, Pellizzoni MM et al (2021) Systematic engineering of artificial metalloenzymes for new-to-nature reactions. Sci Adv 7(4):eabe4208. https://doi.org/10.1126/sciadv.abe4208
doi: 10.1126/sciadv.abe4208
pubmed: 33523952
Khersonsky O, Lipsh R, Avizemer Z et al (2018) Automated design of efficient and functionally diverse enzyme repertoires. Mol Cell 72(1):178–186.e5. https://doi.org/10.1016/j.molcel.2018.08.033
doi: 10.1016/j.molcel.2018.08.033
pubmed: 30270109
pmcid: 6193528
Miton CM, Chen JZ, Ost K et al (2020) Statistical analysis of mutational epistasis to reveal intramolecular interaction networks in proteins. Methods Enzymol 643:243–280. https://doi.org/10.1016/bs.mie.2020.07.012
doi: 10.1016/bs.mie.2020.07.012
pubmed: 32896284
Reetz MT, Soni P, Acevedo JP, Sanchis J (2009) Creation of an amino acid network of structurally coupled residues in the directed evolution of a thermostable enzyme. Angew Chem Int Ed 48(44):8268–8272. https://doi.org/10.1002/anie.200904209
doi: 10.1002/anie.200904209
Yu H, Dalby PA (2018) Coupled molecular dynamics mediate long- and short-range epistasis between mutations that affect stability and aggregation kinetics. Proc Natl Acad Sci U S A 115(47):E11043–E11052. https://doi.org/10.1073/pnas.1810324115
doi: 10.1073/pnas.1810324115
pubmed: 30404916
pmcid: 6255212
Dean J (2020) The deep learning revolution and its implications for computer architecture and chip design. In: Fujino L (ed) IEEE International Solid-State Circuits Conference. Institute of Electrical and Electronics Engineers Inc., San Francisco, CA
Muggleton S, King RD, Sternberg MJE (1992) Protein secondary structure prediction using logic-based machine learning. Protein Eng Des Sel 5(7):647–657. https://doi.org/10.1093/protein/5.7.647
doi: 10.1093/protein/5.7.647
Li Y, Huang C, Ding L et al (2019) Deep learning in bioinformatics: introduction, application, and perspective in the big data era. Methods 166:4–21. https://doi.org/10.1016/j.ymeth.2019.04.008
doi: 10.1016/j.ymeth.2019.04.008
pubmed: 31022451
Li H, Tian S, Li Y et al (2020) Modern deep learning in bioinformatics. J Mol Cell Biol 12(11):823–827. https://doi.org/10.1093/jmcb/mjaa030
doi: 10.1093/jmcb/mjaa030
pubmed: 32573721
pmcid: 7883817
Li G, Dong Y, Reetz MT (2019) Can machine learning revolutionize directed evolution of selective enzymes? Adv Synth Catal 361(11):2377–2386. https://doi.org/10.1002/adsc.201900149
doi: 10.1002/adsc.201900149
Wittmann BJ, Johnston KE, Wu Z, Arnold FH (2021) Advances in machine learning for directed evolution. Curr Opin Struct Biol 69:11–18. https://doi.org/10.1016/j.sbi.2021.01.008
doi: 10.1016/j.sbi.2021.01.008
pubmed: 33647531
Chowdhury R, Maranas CD (2020) From directed evolution to computational enzyme engineering—a review. AIChE J 66(3):e16847. https://doi.org/10.1002/aic.16847
doi: 10.1002/aic.16847
Siedhoff NE, Schwaneberg U, Davari MD (2020) Machine learning-assisted enzyme engineering. Methods Enzymol 643:281–315. https://doi.org/10.1016/bs.mie.2020.05.005
doi: 10.1016/bs.mie.2020.05.005
pubmed: 32896285
Service R (2020) ‘The game has changed.’ AI triumphs at solving protein structures. Science 370:1144. https://doi.org/10.1126/science.abf9367
doi: 10.1126/science.abf9367
pubmed: 33273077
Callaway E (2020) “It will change everything”: DeepMind’s AI makes gigantic leap in solving protein structures. Nature 588:203–204. https://doi.org/10.1038/d41586-020-03348-4
doi: 10.1038/d41586-020-03348-4
pubmed: 33257889
Jones MT (2018) Data, structure, and the data science pipeline. https://developer.ibm.com/articles/ba-intro-data-science-1/. Accessed 24 Apr 2021
Lawrence N (2017) Data readiness levels. arXiv:1705.02245
Pestov V (2013) Is the k-NN classifier in high dimensions affected by the curse of dimensionality? Comput Math Appl 65(10):1427–1437. https://doi.org/10.1016/j.camwa.2012.09.011
doi: 10.1016/j.camwa.2012.09.011
Ma F, Chung MT, Yao Y et al (2018) Efficient molecular evolution to generate enantioselective enzymes using a dual-channel microfluidic droplet screening platform. Nat Commun 9(1):1–8. https://doi.org/10.1038/s41467-018-03492-6
doi: 10.1038/s41467-018-03492-6
Wittmann BJ, Yue Y, Arnold FH (2020) Machine learning-assisted directed evolution navigates a combinatorial epistatic fitness landscape with minimal screening burden. bioRxiv. https://doi.org/10.1101/2020.12.04.408955
Jun Z, Bin L (2019) A review on the recent developments of sequence-based protein feature extraction methods. Curr Bioinforma 14(3):190–199. https://doi.org/10.2174/1574893614666181212102749
doi: 10.2174/1574893614666181212102749
Rawi R, Mall R, Kunji K et al (2018) PaRSnIP: sequence-based protein solubility prediction using gradient boosting machine. Bioinformatics 34(7):1092–1098. https://doi.org/10.1093/bioinformatics/btx662
doi: 10.1093/bioinformatics/btx662
pubmed: 29069295
Ding X, Zou Z, Brooks CL (2019) Deciphering protein evolution and fitness landscapes with latent space models. Nat Commun 10(1):1–13. https://doi.org/10.1038/s41467-019-13633-0
doi: 10.1038/s41467-019-13633-0
Linder J, Bogard N, Rosenberg AB, Seelig G (2020) A generative neural network for maximizing fitness and diversity of synthetic DNA and protein sequences. Cell Syst 11(1):49–62.e16. https://doi.org/10.1016/j.cels.2020.05.007
doi: 10.1016/j.cels.2020.05.007
pubmed: 32711843
pmcid: 8694568
Lu AX, Zhang H, Ghassemi M, Moses A (2020) Self-supervised contrastive learning of protein representations by mutual information maximization. bioRxiv. https://doi.org/10.1101/2020.09.04.283929
Rives A, Goyal S, Meier J et al (2019) Biological structure and function emerge from scaling unsupervised learning to 250 million protein sequences. bioRxiv:622803. https://doi.org/10.1101/622803
Madani A, McCann B, Naik N et al (2020) ProGen: language modeling for protein generation. arXiv:2004.03497
Sutton RS, Barto AG (2018) Reinforcement learning: an introduction, 2nd edn. MIT Press, Cambridge, MA
Angermueller C, Dohan D et al (n.d.) Model-based reinforcement learning for biological sequence design. Under review
Markova K, Chmelova K, Marques SM et al (2020) Decoding the intricate network of molecular interactions of a hyperstable engineered biocatalyst. Chem Sci 11(41):11162–11178. https://doi.org/10.1039/d0sc03367g
doi: 10.1039/d0sc03367g
pubmed: 34094357
pmcid: 8162949
Hie B, Bryson BD, Berger B (2020) Leveraging uncertainty in machine learning accelerates biological discovery and design. Cell Syst 11(5):461–477.e9. https://doi.org/10.1016/j.cels.2020.09.007
doi: 10.1016/j.cels.2020.09.007
pubmed: 33065027
Von Luxburg U, Schölkopf B (2011) Statistical learning theory: models, concepts, and results. In: Gabbay DM, Hartmann S, Woods J (eds) Handbook of the history of logic. North-Holland, Amsterdam
Hawkins DM (2004) The problem of overfitting. J Chem Inf Comput Sci 44:1–12. https://doi.org/10.1021/ci0342472
doi: 10.1021/ci0342472
pubmed: 14741005
Goodfellow I, Bengio Y, Courville A (2016) Deep learning. MIT Press, Cambridge, MA
Meinshausen N (2006) Quantile regression forests. J Mach Learn Res 7:983–999
Shin J-E, Riesselman AJ, Kollasch AW et al (2021) Protein design and variant prediction using autoregressive generative models. Nat Commun 12(1):2403. https://doi.org/10.1038/s41467-021-22732-w
Luo Y, Jiang G, Yu T et al (2021) ECNet is an evolutionary context-integrated deep learning framework for protein engineering. Nat Commun 12(1):5743. https://doi.org/10.1038/s41467-021-25976-8