Rosetta:MSF:NN: Boosting performance of multi-state computational protein design with a neural network.

Algorithms; Amino Acid Sequence; Computational Biology / methods; Databases, Protein; Epistasis, Genetic; Mutation / genetics; Neural Networks, Computer; Proteins / chemistry; Thermodynamics

Journal

PloS one

ISSN: 1932-6203

Abbreviated title: PLoS One

Country: United States

NLM ID: 101285081

Publication information

Publication date:
2021

History:

received: 2020-12-23

accepted: 2021-08-12

entrez: 2021-08-26

pubmed: 2021-08-27

medline: 2021-12-15

Status: epublish

Abstract

Rational protein design aims at the targeted modification of existing proteins. To reach this goal, software suites like Rosetta propose sequences that introduce the desired properties. Challenging design problems require representing a protein by a structural ensemble. Thus, Rosetta multi-state design (MSD) protocols have been developed in which each state represents one protein conformation. The computational demands of MSD protocols are high because, for each candidate sequence, a costly three-dimensional (3D) model has to be created and assessed for all states. Each of these scores contributes one data point to a complex, design-specific energy landscape. Because neural networks (NNs) have proved well suited to learning such solution spaces, we integrated one into the framework Rosetta:MSF in place of the genetic algorithm used so far, with the aim of reducing computational costs. Like its predecessor, Rosetta:MSF:NN administers a set of candidate sequences and their scores and scans sequence space iteratively. During each iteration, the union of all candidate sequences and their Rosetta scores is used to re-train NNs that possess a design-specific architecture. The enormous speed of the NNs allows an extensive assessment of alternative sequences, which are ranked by the scores predicted by the NN. Costly 3D models are computed only for a small fraction of the best-scoring sequences; these and the corresponding 3D-based scores replace half of the candidate sequences during each iteration. The analysis of two sets of candidate sequences generated for a specific design problem by means of a genetic algorithm confirmed that the NN predicted 3D-based scores well; the Pearson correlation coefficient was at least 0.95. Applying Rosetta:MSF:NN:enzdes to a benchmark consisting of 16 ligand-binding problems showed that this protocol converges ten times faster than the genetic algorithm and finds sequences with comparable scores.
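The iterative scheme described in the abstract can be illustrated with a minimal Python sketch. It is not the Rosetta:MSF:NN implementation: the costly 3D-based score is replaced by a toy function (mismatches to a hidden target sequence), the neural network by a simple per-position additive model trained with stochastic gradient descent, and the choice to replace the worse-scoring half of the candidates is an assumption, since the abstract only states that half are replaced. All names (`design`, `AdditiveSurrogate`, `toy_score`, `pearson`) are hypothetical.

```python
import random

ALPHABET = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard amino acids


def toy_score(seq, target):
    """Toy stand-in for a costly 3D-based score (lower is better)."""
    return float(sum(a != b for a, b in zip(seq, target)))


class AdditiveSurrogate:
    """Per-position additive model; a simple placeholder for the NN."""

    def __init__(self, length):
        self.w = [{aa: 0.0 for aa in ALPHABET} for _ in range(length)]
        self.bias = 0.0

    def predict(self, seq):
        return self.bias + sum(self.w[i][aa] for i, aa in enumerate(seq))

    def fit(self, seqs, scores, epochs=50, lr=0.01):
        # Plain SGD on the squared error between predicted and true score.
        for _ in range(epochs):
            for seq, y in zip(seqs, scores):
                err = self.predict(seq) - y
                self.bias -= lr * err
                for i, aa in enumerate(seq):
                    self.w[i][aa] -= lr * err


def mutate(seq, rng):
    """Return a copy of seq with one randomly chosen position resampled."""
    i = rng.randrange(len(seq))
    return seq[:i] + rng.choice(ALPHABET) + seq[i + 1:]


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long, non-constant lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)


def design(score_fn, length, pop_size=20, n_iter=15, n_proposals=400, seed=0):
    """Surrogate-guided search; returns (best_seq, best_score, n_costly_evals)."""
    rng = random.Random(seed)
    archive = {}  # every sequence scored so far with the costly function
    while len(archive) < pop_size:
        seq = "".join(rng.choice(ALPHABET) for _ in range(length))
        archive[seq] = score_fn(seq)
    pop = dict(archive)  # current set of candidate sequences
    half = pop_size // 2
    for _ in range(n_iter):
        # Re-train the surrogate on all sequences scored so far.
        model = AdditiveSurrogate(length)
        seqs = list(archive)
        model.fit(seqs, [archive[s] for s in seqs])
        # Cheap step: propose many mutants and rank them by predicted score.
        parents = list(pop)
        proposals = {mutate(rng.choice(parents), rng) for _ in range(n_proposals)}
        ranked = sorted(sorted(proposals - set(archive)), key=model.predict)
        # Costly step: score only the top-ranked proposals.
        keep = sorted(pop, key=pop.get)[: pop_size - half]  # assumption: keep better half
        pop = {s: pop[s] for s in keep}
        for seq in ranked[:half]:
            archive[seq] = pop[seq] = score_fn(seq)
    best = min(pop, key=pop.get)
    return best, pop[best], len(archive)


if __name__ == "__main__":
    target = "ACDEFGHIKLMN"
    best, score, n_evals = design(lambda s: toy_score(s, target), len(target))
    print(best, score, n_evals)
```

In this sketch the costly function is called only `pop_size + n_iter * (pop_size // 2)` times, while the surrogate ranks several hundred proposals per iteration. The `pearson` helper can be used to compare surrogate predictions with true scores on held-out sequences, in the spirit of the correlation check reported in the abstract.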

Identifiers

DOI: 10.1371/journal.pone.0256691

PMID: 34437621

PMC: PMC8389498

PII: PONE-D-20-40388

Chemical substances

Proteins 0

Publication types

Journal Article; Research Support, Non-U.S. Gov't

Languages

eng

Citation subsets

Pagination

e0256691

Conflict of interest statement

The authors have declared that no competing interests exist.



Authors

Julian Nazet

Elmar Lang

Rainer Merkl
