The GEN-ERA toolbox: unified and reproducible workflows for research in microbial genomics.
Gloeobacterales
Cyanobacteria
Singularity containers
culture collections
genomics
metagenomics
nextflow
phylogenomics
phylogeny
workflow
Journal
GigaScience
ISSN: 2047-217X
Abbreviated title: Gigascience
Country: United States
NLM ID: 101596872
Publication information
Publication date:
28 12 2022
History:
received: 24 10 2022
revised: 29 01 2023
accepted: 14 03 2023
medline: 11 4 2023
entrez: 10 4 2023
pubmed: 11 4 2023
Status:
ppublish
Abstract
Abstract sections
BACKGROUND
Microbial culture collections play a key role in taxonomy by studying the diversity of their strains and providing well-characterized biological material to the scientific community for fundamental and applied research. These microbial resource centers thus need to implement new standards in species delineation, including whole-genome sequencing and phylogenomics. In this context, the genomic needs of the Belgian Coordinated Collections of Microorganisms were studied, resulting in the GEN-ERA toolbox. The latter is a unified cluster of bioinformatic workflows dedicated to both bacteria and small eukaryotes (e.g., yeasts).
FINDINGS
This public toolbox allows researchers without specific training in bioinformatics to perform robust phylogenomic analyses. It covers every step from genome downloading and quality assessment, including genomic contamination estimation, to tree reconstruction. It also offers workflows for average nucleotide identity comparisons and metabolic modeling.
TECHNICAL DETAILS
Nextflow workflows are launched by a single command and are available on the GEN-ERA GitHub repository (https://github.com/Lcornet/GENERA). All the workflows are based on Singularity containers to increase reproducibility.
TESTING
The toolbox was developed for a diversity of microorganisms, including bacteria and fungi. It was further tested on an empirical dataset of 18 (meta)genomes of early branching Cyanobacteria, providing the most up-to-date phylogenomic analysis of the Gloeobacterales order, the first group to diverge in the evolutionary tree of Cyanobacteria.
CONCLUSION
The GEN-ERA toolbox can be used to infer completely reproducible comparative genomic and metabolic analyses on prokaryotes and small eukaryotes. Although designed for routine bioinformatics of culture collections, it can also be used by all researchers interested in microbial taxonomy, as exemplified by our case study on Gloeobacterales.
Identifiers
pubmed: 37036103
pii: 7111624
doi: 10.1093/gigascience/giad022
pmc: PMC10084500
Publication types
Journal Article
Research Support, Non-U.S. Gov't
Languages
eng
Citation subsets
IM
Copyright information
© The Author(s) 2023. Published by Oxford University Press GigaScience.