Rosetta:MSF:NN: Boosting performance of multi-state computational protein design with a neural network.
Journal: PloS one
ISSN: 1932-6203
Abbreviated title: PLoS One
Country: United States
NLM ID: 101285081
Publication information
Publication date: 2021
History:
received: 23 December 2020
accepted: 12 August 2021
entrez: 26 August 2021
pubmed: 27 August 2021
medline: 15 December 2021
Status: epublish
Abstract
Rational protein design aims at the targeted modification of existing proteins. To reach this goal, software suites such as Rosetta propose sequences that introduce the desired properties. Challenging design problems require representing a protein by a structural ensemble. Thus, Rosetta multi-state design (MSD) protocols have been developed in which each state represents one protein conformation. The computational demands of MSD protocols are high because, for each candidate sequence, a costly three-dimensional (3D) model has to be created and assessed for every state. Each of these scores contributes one data point to a complex, design-specific energy landscape. As neural networks (NNs) have proved well suited to learning such solution spaces, we integrated one into the Rosetta:MSF framework in place of the previously used genetic algorithm, with the aim of reducing computational costs. Like its predecessor, Rosetta:MSF:NN maintains a set of candidate sequences and their scores and scans sequence space iteratively. During each iteration, the union of all candidate sequences and their Rosetta scores is used to re-train NNs that possess a design-specific architecture. The high speed of the NNs allows an extensive assessment of alternative sequences, which are ranked by their NN-predicted scores. Costly 3D models are computed only for a small fraction of the best-scoring sequences; these sequences and their 3D-based scores replace half of the candidate sequences in each iteration. The analysis of two sets of candidate sequences, generated for a specific design problem by a genetic algorithm, confirmed that the NN predicted the 3D-based scores well: the Pearson correlation coefficient was at least 0.95. Applying Rosetta:MSF:NN:enzdes to a benchmark of 16 ligand-binding problems showed that this protocol converges ten times faster than the genetic algorithm and finds sequences with comparable scores.
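The iterative scheme described in the abstract (re-train a fast surrogate on every scored sequence, screen many alternatives cheaply, run the costly scorer only on the best-predicted few, and let those replace half of the candidates) can be illustrated with a minimal, self-contained sketch. It is not the Rosetta:MSF:NN implementation. Several assumptions are made here: a single-layer (linear) network trained by normalized least-mean-squares updates stands in for the design-specific NN; alternatives are generated by random point mutations of current candidates; the new sequences replace the worst-scoring half; lower scores are better, as with Rosetta energies; and the hypothetical `toy_score` function replaces building and scoring 3D models. All names (`AdditiveSurrogate`, `nn_guided_design`, `mutate`, `toy_score`) are illustrative.

```python
import random

ALPHABET = "ACDEFGHIKLMNPQRSTVWY"  # the 20 canonical amino acids


def features(seq):
    """Indices of the active one-hot features (position x amino acid)."""
    return [pos * len(ALPHABET) + ALPHABET.index(aa) for pos, aa in enumerate(seq)]


class AdditiveSurrogate:
    """Hypothetical single-layer (linear) network standing in for the design-specific NN."""

    def __init__(self, length, lr=0.5, epochs=30, seed=0):
        self.n = length * len(ALPHABET)
        self.lr, self.epochs = lr, epochs
        self.rng = random.Random(seed)
        self.w, self.b = [0.0] * self.n, 0.0

    def predict(self, seq):
        return self.b + sum(self.w[i] for i in features(seq))

    def fit(self, seqs, scores):
        # Re-train from scratch on all data, using normalized LMS updates.
        self.w, self.b = [0.0] * self.n, 0.0
        data = [(features(s), y) for s, y in zip(seqs, scores)]
        for _ in range(self.epochs):
            self.rng.shuffle(data)
            for idx, y in data:
                err = self.b + sum(self.w[i] for i in idx) - y
                step = self.lr * err / (len(idx) + 1)  # normalized step size
                self.b -= step
                for i in idx:
                    self.w[i] -= step


def mutate(seq, rng, n_mut):
    """Return a copy of seq with n_mut random point substitutions (assumed move set)."""
    s = list(seq)
    for _ in range(n_mut):
        s[rng.randrange(len(s))] = rng.choice(ALPHABET)
    return "".join(s)


def nn_guided_design(score_fn, length, pop_size=20, n_proposals=2000,
                     iterations=15, seed=0):
    """Surrogate-guided search; score_fn is the costly scorer (lower = better)."""
    rng = random.Random(seed)
    population = ["".join(rng.choice(ALPHABET) for _ in range(length))
                  for _ in range(pop_size)]
    archive = {s: score_fn(s) for s in population}  # every costly evaluation so far
    model = AdditiveSurrogate(length, seed=seed)
    for _ in range(iterations):
        # 1. Re-train the surrogate on the union of all scored sequences.
        model.fit(list(archive), list(archive.values()))
        # 2. Cheaply screen many unseen proposals derived from current candidates.
        raw = (mutate(rng.choice(population), rng, rng.randint(1, 3))
               for _ in range(n_proposals))
        proposals = [p for p in dict.fromkeys(raw) if p not in archive]
        proposals.sort(key=model.predict)
        # 3. Run the costly scorer only on the best-predicted half-population.
        new = proposals[: pop_size // 2]
        for s in new:
            archive[s] = score_fn(s)
        # 4. The newly scored sequences replace the worst-scoring candidates.
        survivors = sorted(population, key=archive.get)[: pop_size - len(new)]
        population = survivors + new
    best = min(archive, key=archive.get)
    return best, archive[best], len(archive)


def toy_score(seq, target="WKLHDE"):
    """Hypothetical cheap stand-in for a Rosetta 3D score: mismatches to a target."""
    return float(sum(a != b for a, b in zip(seq, target)))


if __name__ == "__main__":
    best, best_score, n_evals = nn_guided_design(toy_score, length=6)
    print(best, best_score, n_evals)
```

The returned `n_evals` counts how many times the costly scorer was called; keeping that number small while the surrogate screens thousands of proposals per iteration is the source of the speed-up the abstract reports. A real protocol would plug a Rosetta-based multi-state scorer into `score_fn` and a trained NN into the surrogate slot.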
Identifiers
pubmed: 34437621
doi: 10.1371/journal.pone.0256691
pii: PONE-D-20-40388
pmc: PMC8389498
Chemical substances
Proteins (registry number: 0)
Publication types
Journal Article
Research Support, Non-U.S. Gov't
Language: eng
Citation subsets: IM
Pagination: e0256691
Conflict of interest statement: The authors have declared that no competing interests exist.