Identification and characterization of proteins of unknown function (PUFs) in Clostridium thermocellum DSM 1313 strains as potential genetic engineering targets.


Journal

Biotechnology for Biofuels
ISSN: 1754-6834
Abbreviated title: Biotechnol Biofuels
Country: England
NLM ID: 101316935

Publication information

Publication date:
10 May 2021
History:
received: 29 Dec 2020
accepted: 26 Apr 2021
entrez: 11 May 2021
pubmed: 12 May 2021
medline: 12 May 2021
Status: epublish

Abstract

Mass spectrometry-based proteomics can identify and quantify thousands of proteins from individual microbial species, but a significant percentage of these proteins are unannotated and hence classified as proteins of unknown function (PUFs). Due to the difficulty in extracting meaningful metabolic information, PUFs are often overlooked or discarded during data analysis, even though they might be critically important in functional activities, in particular for metabolic engineering research. We optimized and employed a pipeline integrating various "guilt-by-association" (GBA) metrics, including differential expression and co-expression analyses of high-throughput mass spectrometry proteome data and phylogenetic coevolution analysis, and sequence homology-based approaches to determine putative functions for PUFs in Clostridium thermocellum. Our various analyses provided putative functional information for over 95% of the PUFs detected by mass spectrometry in a wild-type and/or an engineered strain of C. thermocellum. In particular, we validated a predicted acyltransferase PUF (WP_003519433.1) with functional activity towards 2-phenylethyl alcohol, consistent with our GBA and sequence homology-based predictions. This work demonstrates the value of leveraging sequence homology-based annotations with empirical evidence based on the concept of GBA to broadly predict putative functions for PUFs, opening avenues to further interrogation via targeted experiments.
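
The co-expression form of guilt-by-association mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' pipeline: the function names (`pearson`, `gba_predict`), the correlation threshold, the protein labels, and the toy abundance values are all hypothetical and chosen only for illustration. The idea is that a PUF whose abundance profile tracks closely with annotated proteins across conditions may share their function.

```python
from collections import Counter
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length abundance profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    # A flat profile has no defined correlation; treat it as uncorrelated.
    return cov / (sx * sy) if sx and sy else 0.0


def gba_predict(puf_profile, annotated, min_r=0.9):
    """Suggest a function for a PUF from strongly co-expressed annotated proteins.

    `annotated` maps protein id -> (function label, abundance profile).
    Returns the most common label among partners with r >= min_r, or None.
    The 0.9 threshold is an arbitrary assumption for this sketch.
    """
    votes = Counter(
        label
        for label, profile in annotated.values()
        if pearson(puf_profile, profile) >= min_r
    )
    return votes.most_common(1)[0][0] if votes else None


# Hypothetical toy data: abundances across four conditions (not from the paper).
annotated = {
    "protA": ("acyltransferase", [1.0, 2.0, 3.0, 4.0]),
    "protB": ("acyltransferase", [2.1, 3.9, 6.2, 8.0]),
    "protC": ("transporter", [4.0, 3.0, 2.0, 1.0]),
}
puf = [0.5, 1.1, 1.4, 2.0]
prediction = gba_predict(puf, annotated)
```

A prediction of this kind is only a hypothesis; the abstract describes combining such evidence with differential expression, phylogenetic coevolution, and sequence homology before experimental validation.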

Identifiers

pubmed: 33971924
doi: 10.1186/s13068-021-01964-4
pii: 10.1186/s13068-021-01964-4
pmc: PMC8112048

Publication types

Journal Article

Languages

eng

Pagination

116

Grants

Agency: Office of Science
ID: Center for Bioenergy Innovation (CBI)

Authors

Suresh Poudel (S)

Biosciences Division, Oak Ridge National Laboratory, Oak Ridge, TN, 37831, USA.
The Center for Bioenergy Innovation at Oak Ridge National Laboratory, Oak Ridge, TN, USA.
The Graduate School of Genome Science and Technology, University of Tennessee, Knoxville, TN, USA.

Alexander L Cope (AL)

Biosciences Division, Oak Ridge National Laboratory, Oak Ridge, TN, 37831, USA.
The Graduate School of Genome Science and Technology, University of Tennessee, Knoxville, TN, USA.

Kaela B O'Dell (KB)

Biosciences Division, Oak Ridge National Laboratory, Oak Ridge, TN, 37831, USA.
The Center for Bioenergy Innovation at Oak Ridge National Laboratory, Oak Ridge, TN, USA.
The Bredesen Center, University of Tennessee, Knoxville, TN, USA.

Adam M Guss (AM)

Biosciences Division, Oak Ridge National Laboratory, Oak Ridge, TN, 37831, USA.
The Bredesen Center, University of Tennessee, Knoxville, TN, USA.

Hyeongmin Seo (H)

The Center for Bioenergy Innovation at Oak Ridge National Laboratory, Oak Ridge, TN, USA.
Department of Chemical and Biomolecular Engineering, University of Tennessee, Knoxville, TN, USA.

Cong T Trinh (CT)

The Center for Bioenergy Innovation at Oak Ridge National Laboratory, Oak Ridge, TN, USA.
The Graduate School of Genome Science and Technology, University of Tennessee, Knoxville, TN, USA.
The Bredesen Center, University of Tennessee, Knoxville, TN, USA.
Department of Chemical and Biomolecular Engineering, University of Tennessee, Knoxville, TN, USA.

Robert L Hettich (RL)

Biosciences Division, Oak Ridge National Laboratory, Oak Ridge, TN, 37831, USA. hettichrl@ornl.gov.

MeSH classifications