DeCoDe: degenerate codon design for complete protein-coding DNA libraries.
Journal
Bioinformatics (Oxford, England)
ISSN: 1367-4811
Abbreviated title: Bioinformatics
Country: England
NLM ID: 9808944
Publication information
Publication date: 2020-06-01
History:
received: 2019-08-13
revised: 2020-02-13
accepted: 2020-03-13
pubmed: 2020-03-17
medline: 2020-10-30
entrez: 2020-03-17
Status:
ppublish
Abstract
High-throughput protein screening is a critical technique for dissecting and designing protein function. Libraries for these assays can be created through a number of means, including targeted or random mutagenesis of a template protein sequence or direct DNA synthesis. However, mutagenic library construction methods often yield vastly more nonfunctional than functional variants and, despite advances in large-scale DNA synthesis, individual synthesis of each desired DNA template is often prohibitively expensive. Consequently, many protein-screening libraries rely on the use of degenerate codons (DCs), mixtures of DNA bases incorporated at specific positions during DNA synthesis, to generate highly diverse protein-variant pools from only a few low-cost synthesis reactions. However, selecting DCs for sets of sequences that covary at multiple positions dramatically increases the difficulty of designing a DC library and leads to the creation of many undesired variants that can quickly outstrip screening capacity. We introduce a novel algorithm for total DC library optimization, degenerate codon design (DeCoDe), based on integer linear programming. DeCoDe significantly outperforms state-of-the-art DC optimization algorithms and scales well to more than a hundred proteins sharing complex patterns of covariation (e.g. the lab-derived avGFP lineage). Moreover, DeCoDe is, to our knowledge, the first DC design algorithm with the capability to encode mixed-length protein libraries. We anticipate DeCoDe to be broadly useful for a variety of library generation problems, ranging from protein engineering attempts that leverage mutual information to the reconstruction of ancestral protein states.
Availability: github.com/OrensteinLab/DeCoDe
Contact: yaronore@bgu.ac.il
Supplementary information: Supplementary data are available at Bioinformatics online.
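To make the degenerate-codon idea in the abstract concrete, the following is a minimal Python sketch, not DeCoDe's own code or API. It expands a DC written in standard IUPAC nucleotide codes into the codons it represents and translates them with the standard genetic code. The function names (`expand`, `encoded_amino_acids`, `library_proteins`) and the two-position example library are illustrative assumptions.

```python
from collections import Counter
from itertools import product

# Standard IUPAC nucleotide codes: each letter stands for a mixture of bases.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}

# Standard genetic code in the conventional T, C, A, G ordering ('*' = stop).
BASES = "TCAG"
AMINO = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
         "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO)
}

def expand(dc):
    """Hypothetical helper: list every concrete codon a degenerate codon covers."""
    return ["".join(p) for p in product(*(IUPAC[b] for b in dc.upper()))]

def encoded_amino_acids(dc):
    """Hypothetical helper: count how many codons of the DC encode each residue."""
    return Counter(CODON_TABLE[c] for c in expand(dc))

def library_proteins(dcs):
    """Hypothetical helper: all protein sequences encoded by one DC per position."""
    per_position = [sorted(set(encoded_amino_acids(dc))) for dc in dcs]
    return {"".join(p) for p in product(*per_position)}

if __name__ == "__main__":
    print(len(expand("NNK")), dict(encoded_amino_acids("NNK")))
    # Targets "AK" and "GE" covary at both positions; one DC per position
    # also yields the off-target combinations "AE" and "GK".
    print(sorted(library_proteins(["GSC", "RAA"])))
```

In the second example the desired set is only two sequences, yet a single DC per position encodes four. That is the covariation problem the abstract describes; an optimizer such as DeCoDe has to trade this kind of off-target expansion against the number of synthesis reactions.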
Identifiers
pubmed: 32176271
pii: 5807608
doi: 10.1093/bioinformatics/btaa162
pmc: PMC7267834
Chemical substances
Codon
0
Proteins
0
Publication types
Journal Article
Research Support, N.I.H., Extramural
Research Support, Non-U.S. Gov't
Research Support, U.S. Gov't, Non-P.H.S.
Languages
eng
Citation subsets
IM
Pagination
3357-3364
Grants
Agency: NIGMS NIH HHS
ID: DP2 GM123641
Country: United States
Copyright information
© The Author(s) 2020. Published by Oxford University Press. All rights reserved. For permissions, please e-mail: journals.permissions@oup.com.