Deep evolutionary analysis reveals the design principles of fold A glycosyltransferases.
A. thaliana
C. elegans
D. melanogaster
GT evolution
GT phylogeny
S. cerevisiae
common core
computational biology
donor prediction
evolutionary biology
glycosyltransferase
human
machine learning
systems biology
Journal
eLife
ISSN: 2050-084X
Abbreviated title: Elife
Country: England
NLM ID: 101579614
Publication information
Publication date:
01 04 2020
History:
received: 17 12 2019
accepted: 31 03 2020
pubmed: 3 4 2020
medline: 30 3 2021
entrez: 3 4 2020
Status:
epublish
Abstract
Glycosyltransferases (GTs) are prevalent across the tree of life and regulate nearly all aspects of cellular functions. The evolutionary basis for their complex and diverse modes of catalytic functions remains enigmatic. Here, based on deep mining of over half a million GT-A fold sequences, we define a minimal core component shared among functionally diverse enzymes. We find that variations in the common core and emergence of hypervariable loops extending from the core contributed to GT-A diversity. We provide a phylogenetic framework relating diverse GT-A fold families for the first time and show that inverting and retaining mechanisms emerged multiple times independently during evolution. Using evolutionary information encoded in primary sequences, we trained a machine learning classifier to predict donor specificity with nearly 90% accuracy and deployed it for the annotation of understudied GTs. Our studies provide an evolutionary framework for investigating complex relationships connecting GT-A fold sequence, structure, function and regulation.
Other abstracts
Type: plain-language-summary (eng)
Carbohydrates are one of the major groups of large biological molecules that regulate nearly all aspects of life. Yet, unlike DNA or proteins, carbohydrates are made without a template to follow. Instead, these molecules are built from a set of sugar-based building blocks by the intricate activities of a large and diverse family of enzymes known as glycosyltransferases. An incomplete understanding of how glycosyltransferases recognize and build diverse carbohydrates presents a major bottleneck in developing therapeutic strategies for diseases associated with abnormalities in these enzymes. It also limits efforts to engineer these enzymes for biotechnology applications and biofuel production. Taujale et al. have now used evolutionary approaches to map the evolution of a major subset of glycosyltransferases from species across the tree of life to understand how these enzymes evolved such precise mechanisms to build diverse carbohydrates. First, a minimal structural unit was defined based on being shared among a group of over half a million unique glycosyltransferase enzymes with different activities. Further analysis then showed that the diverse activities of these enzymes evolved through the accumulation of mutations within this structural unit, as well as in much more variable regions in the enzyme that extend from the minimal unit. Taujale et al. then built an extended family tree for this collection of glycosyltransferases and details of the evolutionary relationships between the enzymes helped them to create a machine learning framework that could predict which sugar-containing molecules were the raw materials for a given glycosyltransferase. This framework could make predictions with nearly 90% accuracy based only on information that can be deciphered from the gene for that enzyme. 
These findings will provide scientists with new hypotheses for investigating the complex relationships connecting the genetic information about glycosyltransferases with their structures and activities. Further refinement of the machine learning framework may eventually enable the design of enzymes with properties that are desirable for applications in biotechnology.
Identifiers
pubmed: 32234211
doi: 10.7554/eLife.54532
pii: 54532
pmc: PMC7185993
Chemical substances
Glycosyltransferases
EC 2.4.-
Databases
Dryad: 10.5061/dryad.v15dv41sh
Publication types
Journal Article
Research Support, N.I.H., Extramural
Languages
eng
Citation subsets
IM
Grants
Agency: NIGMS NIH HHS
ID: R01 GM130915
Country: United States
Agency: NIGMS NIH HHS
ID: T32 GM107004
Country: United States
Copyright information
© 2020, Taujale et al.
Conflict of interest statement
RT, AV, LH, ZZ, WY, KR, SL, AE, KM, NK: No competing interests declared