Deep evolutionary analysis reveals the design principles of fold A glycosyltransferases.

MeSH terms: Evolution, Molecular; Glycosyltransferases / chemistry; Humans; Phylogeny; Protein Folding; Substrate Specificity

Keywords: A. thaliana; C. elegans; D. melanogaster; GT evolution; GT phylogeny; S. cerevisiae; common core; computational biology; donor prediction; evolutionary biology; glycosyltransferase; human; machine learning; systems biology

Journal

eLife

ISSN: 2050-084X

Abbreviated title: Elife

Country: England

NLM ID: 101579614

Publication information

Publication date:
1 April 2020

History:

received: 17 December 2019

accepted: 31 March 2020

pubmed: 3 April 2020

medline: 30 March 2021

entrez: 3 April 2020

Status: epublish

Abstract

Glycosyltransferases (GTs) are prevalent across the tree of life and regulate nearly all aspects of cellular functions. The evolutionary basis for their complex and diverse modes of catalytic functions remains enigmatic. Here, based on deep mining of over half a million GT-A fold sequences, we define a minimal core component shared among functionally diverse enzymes. We find that variations in the common core and emergence of hypervariable loops extending from the core contributed to GT-A diversity. We provide a phylogenetic framework relating diverse GT-A fold families for the first time and show that inverting and retaining mechanisms emerged multiple times independently during evolution. Using evolutionary information encoded in primary sequences, we trained a machine learning classifier to predict donor specificity with nearly 90% accuracy and deployed it for the annotation of understudied GTs. Our studies provide an evolutionary framework for investigating complex relationships connecting GT-A fold sequence, structure, function and regulation.

Other abstracts

Type: plain-language-summary (eng)

Carbohydrates are one of the major groups of large biological molecules that regulate nearly all aspects of life. Yet, unlike DNA or proteins, carbohydrates are made without a template to follow. Instead, these molecules are built from a set of sugar-based building blocks by the intricate activities of a large and diverse family of enzymes known as glycosyltransferases. An incomplete understanding of how glycosyltransferases recognize and build diverse carbohydrates presents a major bottleneck in developing therapeutic strategies for diseases associated with abnormalities in these enzymes. It also limits efforts to engineer these enzymes for biotechnology applications and biofuel production. Taujale et al. have now used evolutionary approaches to map the evolution of a major subset of glycosyltransferases from species across the tree of life to understand how these enzymes evolved such precise mechanisms to build diverse carbohydrates. First, a minimal structural unit was defined based on being shared among a group of over half a million unique glycosyltransferase enzymes with different activities. Further analysis then showed that the diverse activities of these enzymes evolved through the accumulation of mutations within this structural unit, as well as in much more variable regions in the enzyme that extend from the minimal unit. Taujale et al. then built an extended family tree for this collection of glycosyltransferases and details of the evolutionary relationships between the enzymes helped them to create a machine learning framework that could predict which sugar-containing molecules were the raw materials for a given glycosyltransferase. This framework could make predictions with nearly 90% accuracy based only on information that can be deciphered from the gene for that enzyme. 
These findings will provide scientists with new hypotheses for investigating the complex relationships connecting the genetic information about glycosyltransferases with their structures and activities. Further refinement of the machine learning framework may eventually enable the design of enzymes with properties that are desirable for applications in biotechnology.

Identifiers

DOI: 10.7554/eLife.54532 PMID: 32234211 PMC: PMC7185993

pubmed: 32234211

doi: 10.7554/eLife.54532

pii: 54532

pmc: PMC7185993


Chemical substances

Glycosyltransferases EC 2.4.-

Databases

Dryad

10.5061/dryad.v15dv41sh

Publication types

Journal Article; Research Support, N.I.H., Extramural

Languages

eng

Grants

Agency: NIGMS NIH HHS

ID: R01 GM130915

Country: United States

Agency: NIGMS NIH HHS

ID: T32 GM107004

Country: United States

Conflict of interest statement

RT, AV, LH, ZZ, WY, KR, SL, AE, KM, NK: No competing interests declared.

Authors

Rahil Taujale (R)

Aarya Venkat (A)

Liang-Chin Huang (LC)

Zhongliang Zhou (Z)

Wayland Yeung (W)

Khaled M Rasheed (KM)

Sheng Li (S)

Arthur S Edison (AS)

Kelley W Moremen (KW)

Natarajan Kannan (N)
