Machine learning-aided design and screening of an emergent protein function in synthetic cells.
Journal
Nature communications
ISSN: 2041-1723
Abbreviated title: Nat Commun
Country: England
NLM ID: 101528555
Publication information
Publication date:
05 Mar 2024
History:
received: 27 Jun 2023
accepted: 16 Feb 2024
medline: 6 Mar 2024
pubmed: 6 Mar 2024
entrez: 5 Mar 2024
Status:
epublish
Abstract
Recently, utilization of Machine Learning (ML) has led to astonishing progress in computational protein design, bringing into reach the targeted engineering of proteins for industrial and biomedical applications. However, the design of proteins for emergent functions of core relevance to cells, such as the ability to spatiotemporally self-organize and thereby structure the cellular space, is still extremely challenging. While on the generative side conditional generative models and multi-state design are on the rise, for emergent functions there is a lack of tailored screening methods as typically needed in a protein design project, both computational and experimental. Here we describe a proof-of-principle of how such screening, in silico and in vitro, can be achieved for ML-generated variants of a protein that forms intracellular spatiotemporal patterns. For computational screening we use a structure-based divide-and-conquer approach to find the most promising candidates, while for the subsequent in vitro screening we use synthetic cell-mimics as established by Bottom-Up Synthetic Biology. We then show that the best screened candidate can indeed completely substitute the wildtype gene in Escherichia coli. These results raise great hopes for the next level of synthetic biology, where ML-designed synthetic proteins will be used to engineer cellular functions.
Identifiers
pubmed: 38443351
doi: 10.1038/s41467-024-46203-0
pii: 10.1038/s41467-024-46203-0
Publication types
Journal Article
Languages
eng
Citation subsets
IM
Pagination
2010
Copyright information
© 2024. The Author(s).