metaGOflow: a workflow for the analysis of marine Genomic Observatories shotgun metagenomics data.
Common Workflow Language (CWL)
MGnify
RO-Crate
containers
provenance
shotgun metagenomics
Journal
GigaScience
ISSN: 2047-217X
Abbreviated title: Gigascience
Country: United States
NLM ID: 101596872
Publication information
Publication date:
28 12 2022
History:
received: 10 05 2023
revised: 30 06 2023
accepted: 11 09 2023
medline: 23 10 2023
pubmed: 18 10 2023
entrez: 18 10 2023
Status:
ppublish
Abstract
BACKGROUND
Genomic Observatories (GOs) are sites of long-term scientific study that undertake regular assessments of genomic biodiversity. The European Marine Omics Biodiversity Observation Network (EMO BON) is a network of GOs that conduct regular biological community sampling to generate environmental and metagenomic data of microbial communities from designated marine stations around Europe. An effective workflow is essential for analyzing the EMO BON metagenomic data in a timely and reproducible manner.
FINDINGS
Based on the established MGnify resource, we developed metaGOflow. metaGOflow supports fast inference of taxonomic profiles from GO-derived data based on ribosomal RNA genes, and functional annotation of those data using the raw reads. Thanks to Research Object Crate (RO-Crate) packaging, relevant metadata about the sample under study and the details of the bioinformatics analysis applied to it are carried over to the data product, while the modular implementation allows the workflow to be run partially. The analysis of 2 EMO BON samples and 1 Tara Oceans sample was performed as a use case.
CONCLUSIONS
metaGOflow is an efficient and robust workflow that scales to the needs of projects producing big metagenomic data such as EMO BON. It highlights how containerization technologies along with modern workflow languages and metadata package approaches can support the needs of researchers when dealing with ever-increasing volumes of biological data. Despite being initially oriented to address the needs of EMO BON, metaGOflow is a flexible and easy-to-use workflow that can be broadly used for one-sample-at-a-time analysis of shotgun metagenomics data.
Identifiers
pubmed: 37850871
pii: 7321054
doi: 10.1093/gigascience/giad078
pmc: PMC10583283
Publication types
Journal Article
Languages
eng
Citation subsets
IM
Copyright information
© The Author(s) 2023. Published by Oxford University Press GigaScience.