Protein sequence design by conformational landscape optimization.
energy landscape
machine learning
protein design
sequence optimization
stability prediction
Journal
Proceedings of the National Academy of Sciences of the United States of America
ISSN: 1091-6490
Abbreviated title: Proc Natl Acad Sci U S A
Country: United States
NLM ID: 7505876
Publication information
Publication date: 16 03 2021
History:
entrez: 13 3 2021
pubmed: 14 3 2021
medline: 24 9 2021
Status: ppublish
Abstract
The protein design problem is to identify an amino acid sequence that folds to a desired structure. Given Anfinsen's thermodynamic hypothesis of folding, this can be recast as finding an amino acid sequence for which the desired structure is the lowest-energy state. Because this calculation involves not only all possible amino acid sequences but also all possible structures, most current approaches focus instead on the more tractable problem of finding the lowest-energy amino acid sequence for the desired structure. A second step often checks by protein structure prediction that the desired structure is indeed the lowest-energy conformation for the designed sequence, and a large fraction of designed sequences for which this is not the case is typically discarded. Here, we show that by backpropagating gradients through the transform-restrained Rosetta (trRosetta) structure prediction network from the desired structure to the input amino acid sequence, we can directly optimize over all possible amino acid sequences and all possible structures in a single calculation. We find that trRosetta calculations, which consider the full conformational landscape, can be more effective than Rosetta single-point energy estimations in predicting folding and stability of de novo designed proteins. We compare sequence design by conformational landscape optimization with the standard energy-based sequence design methodology in Rosetta and show that the former can result in energy landscapes with fewer alternative energy minima. We show further that more funneled energy landscapes can be designed by combining the strengths of the two approaches: the low-resolution trRosetta model serves to disfavor alternative states, and the high-resolution Rosetta model serves to create a deep energy minimum at the design target structure.
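The idea of backpropagating from a desired structure to the input sequence can be illustrated with a deliberately tiny stand-in. The sketch below is not trRosetta and not the authors' code: it assumes a hypothetical two-letter alphabet (H, P), a made-up coupling matrix `W` that acts as a differentiable "contact predictor", and a hand-written gradient in place of an autodiff framework. It only shows the general pattern: relax the sequence to per-position logits, predict structure features from the softmax probabilities, score them against the target, and follow the gradient back to the logits.

```python
import math

ALPHABET = "HP"  # toy alphabet: hydrophobic / polar (assumption, not from the paper)
# Hypothetical coupling matrix: only an H-H pair favors a contact.
W = [[3.0, -3.0],
     [-3.0, -3.0]]

def softmax(z):
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def loss_and_grad(logits, target):
    """Cross-entropy between predicted and target contacts, and d(loss)/d(logits)."""
    n, k = len(logits), len(ALPHABET)
    probs = [softmax(z) for z in logits]
    grad_p = [[0.0] * k for _ in range(n)]
    loss = 0.0
    for i in range(n):
        for j in range(i + 1, n):
            # Toy "structure predictor": contact probability for the pair (i, j).
            wj = [sum(W[a][b] * probs[j][b] for b in range(k)) for a in range(k)]
            wi = [sum(probs[i][a] * W[a][b] for a in range(k)) for b in range(k)]
            c = sigmoid(sum(probs[i][a] * wj[a] for a in range(k)))
            t = target[i][j]
            loss -= t * math.log(c) + (1 - t) * math.log(1 - c)
            d = c - t  # derivative of the loss w.r.t. the pre-sigmoid score
            for a in range(k):
                grad_p[i][a] += d * wj[a]
                grad_p[j][a] += d * wi[a]
    # Backpropagate through the softmax to the sequence logits.
    grad_z = []
    for p, g in zip(probs, grad_p):
        mean = sum(pa * ga for pa, ga in zip(p, g))
        grad_z.append([pa * (ga - mean) for pa, ga in zip(p, g)])
    return loss, grad_z

def design(target, steps=300, lr=0.2):
    """Plain gradient descent on the logits; returns the argmax sequence and loss trace."""
    logits = [[0.0] * len(ALPHABET) for _ in range(len(target))]
    history = []
    for _ in range(steps):
        loss, grad = loss_and_grad(logits, target)
        history.append(loss)
        logits = [[z - lr * g for z, g in zip(zr, gr)]
                  for zr, gr in zip(logits, grad)]
    seq = "".join(ALPHABET[max(range(len(z)), key=z.__getitem__)] for z in logits)
    return seq, history

# Hypothetical target: positions 0, 3, 4 and 7 form a mutually contacting core.
core = {0, 3, 4, 7}
n = 8
target = [[1 if (i in core and j in core and i != j) else 0 for j in range(n)]
          for i in range(n)]
sequence, history = design(target)
print(sequence, round(history[0], 3), round(history[-1], 3))
```

In a realistic setting the toy predictor would be replaced by a trained network and the hand-derived gradient by automatic differentiation; the loop structure (continuous sequence representation, differentiable prediction, loss against the target, gradient step) is the part this sketch is meant to convey.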
Identifiers
pubmed: 33712545
pii: 2017228118
doi: 10.1073/pnas.2017228118
pmc: PMC7980421
Chemical substances
Proteins
0
Publication types
Journal Article
Research Support, N.I.H., Extramural
Research Support, Non-U.S. Gov't
Languages
eng
Citation subsets
IM
Grants
Agency: NIH HHS
ID: DP5 OD026389
Country: United States
Agency: Howard Hughes Medical Institute
Country: United States
Agency: NIAID NIH HHS
ID: HHSN272201700059C
Country: United States
Investigators
Alan Coral
(A)
Alex J Bubar
(AJ)
Alexander Boykov
(A)
Alexander Uriel Valle Pérez
(AU)
Alison MacMillan
(A)
Allen Lubow
(A)
Andrea Mussini
(A)
Andrew Cai
(A)
Andrew John Ardill
(AJ)
Aniruddha Seal
(A)
Artak Kalantarian
(A)
Barbara Failer
(B)
Belinda Lackersteen
(B)
Benjamin Chagot
(B)
Beverly R Haight
(BR)
Bora Taştan
(B)
Boris Uitham
(B)
Brandon G Roy
(BG)
Breno Renan de Melo Cruz
(BR)
Brian Echols
(B)
Brian Edward Lorenz
(BE)
Bruce Blair
(B)
Bruno Kestemont
(B)
C D Eastlake
(CD)
Callen Joseph Bragdon
(CJ)
Carl Vardeman
(C)
Carlo Salerno
(C)
Casey Comisky
(C)
Catherine Louise Hayman
(CL)
Catherine R Landers
(CR)
Cathy Zimov
(C)
Charles David Coleman
(CD)
Charles Robert Painter
(CR)
Christopher Ince
(C)
Conor Lynagh
(C)
Dmitrii Malaniia
(D)
Douglas Craig Wheeler
(DC)
Douglas Robertson
(D)
Vera Simon
(V)
Emanuele Chisari
(E)
Eric Lim Jit Kai
(ELJ)
Farah Rezae
(F)
Ferenc Lengyel
(F)
Flavian Tabotta
(F)
Franco Padelletti
(F)
Frisno Boström
(F)
Gary O Gross
(GO)
George McIlvaine
(G)
Gil Beecher
(G)
Gregory T Hansen
(GT)
Guido de Jong
(G)
Harald Feldmann
(H)
Jami Lynne Borman
(JL)
Jamie Quinn
(J)
Jane Norrgard
(J)
Jason Truong
(J)
Jasper A Diderich
(JA)
Jeffrey Michael Canfield
(JM)
Jeffrey Photakis
(J)
Jesse David Slone
(JD)
Joanna Madzio
(J)
Joanne Mitchell
(J)
John Charles Stomieroski
(JC)
John H Mitch
(JH)
Johnathan Robert Altenbeck
(JR)
Jonas Schinkler
(J)
Jonathan Barak Weinberg
(JB)
Joshua David Burbach
(JD)
João Carlos Sequeira da Costa
(JC)
Juan Francisco Bada Juarez
(JF)
Jón Pétur Gunnarsson
(JP)
Kathleen Diane Harper
(KD)
Keehyoung Joo
(K)
Keith T Clayton
(KT)
Kenneth E DeFord
(KE)
Kevin F Scully
(KF)
Kevin M Gildea
(KM)
Kirk J Abbey
(KJ)
Kristen Lee Kohli
(KL)
Kyle Stenner
(K)
Kálmán Takács
(K)
LaVerne L Poussaint
(LL)
Larry C Manalo
(LC)
Larry C Withers
(LC)
Lilium Carlson
(L)
Linda Wei
(L)
Luke Ryan Fisher
(LR)
Lynn Carpenter
(L)
Ma Ji-Hwan
(M)
Manuel Ricci
(M)
Marcus Anthony Belcastro
(MA)
Marek Leniec
(M)
Marie Hohmann
(M)
Mark Thompson
(M)
Matthew A Thayer
(MA)
Matthias Gaebel
(M)
Michael D Cassidy
(MD)
Michael Fagiola
(M)
Michael Lewis
(M)
Michael Pfützenreuter
(M)
Michael Simon
(M)
Moamen M Elmassry
(MM)
Noah Benevides
(N)
Norah Kathleen Kerr
(NK)
Nupur Verma
(N)
Oak Shannon
(O)
Owen Yin
(O)
Pascal Wolfteich
(P)
Paul Gummersall
(P)
Paweł Tłuścik
(P)
Peter Gajar
(P)
Peter John Triggiani
(PJ)
Rajarshi Guha
(R)
Renton Braden Mathew Innes
(RB)
Ricky Buchanan
(R)
Robert Gamble
(R)
Robert Leduc
(R)
Robert Spearing
(R)
Rodrigo Luccas Corrêa Dos Santos Gomes
(RLC)
Roger D Estep
(RD)
Ryan DeWitt
(R)
Ryan Moore
(R)
Scott G Shnider
(SG)
Scott J Zaccanelli
(SJ)
Sergey Kuznetsov
(S)
Sergio Burillo-Sanz
(S)
Seán Mooney
(S)
Sidoruk Vasiliy
(S)
Slava S Butkovich
(SS)
Spencer Bruce Hudson
(SB)
Spencer Len Pote
(SL)
Stephen Phillip Denne
(SP)
Steven A Schwegmann
(SA)
Sumanth Ratna
(S)
Susan C Kleinfelter
(SC)
Thomas Bausewein
(T)
Thomas J George
(TJ)
Tobias Scherf de Almeida
(TS)
Ulas Yeginer
(U)
Walter Barmettler
(W)
Warwick Robert Pulley
(WR)
William Scott Wright
(WS)
Willyanto
Wyatt Lansford
(W)
Xavier Hochart
(X)
Yoan Anthony Skander Gaiji
(YAS)
Yuriy Lagodich
(Y)
Vivier Christian
(V)
Copyright information
Copyright © 2021 the Author(s). Published by PNAS.
Conflict of interest statement
The authors declare no competing interest.