CAMISIM: simulating metagenomes and microbial communities.


Journal

Microbiome
ISSN: 2049-2618
Abbreviated title: Microbiome
Country: England
NLM ID: 101615147

Publication information

Publication date:
08 02 2019
History:
received: 18 04 2018
accepted: 21 01 2019
entrez: 10 2 2019
pubmed: 10 2 2019
medline: 6 5 2019
Status: epublish

Abstract

Shotgun metagenome data sets of microbial communities are highly diverse, not only due to the natural variation of the underlying biological systems, but also due to differences in laboratory protocols, replicate numbers, and sequencing technologies. Accordingly, to effectively assess the performance of metagenomic analysis software, a wide range of benchmark data sets is required. We describe the CAMISIM microbial community and metagenome simulator. The software can model different microbial abundance profiles, multi-sample time series, and differential abundance studies, includes real and simulated strain-level diversity, and generates second- and third-generation sequencing data from taxonomic profiles or de novo. Gold standards are created for sequence assembly, genome binning, taxonomic binning, and taxonomic profiling. CAMISIM generated the benchmark data sets of the first CAMI challenge. For two simulated multi-sample data sets of the human and mouse gut microbiomes, we observed high functional congruence to the real data. As further applications, we investigated the effect of varying evolutionary genome divergence, sequencing depth, and read error profiles on two popular metagenome assemblers, MEGAHIT and metaSPAdes, on several thousand small data sets generated with CAMISIM. CAMISIM can simulate a wide variety of microbial communities and metagenome data sets together with standards of truth for method evaluation. All data sets and the software are freely available at https://github.com/CAMI-challenge/CAMISIM.

Identifiers

pubmed: 30736849
doi: 10.1186/s40168-019-0633-6
pii: 10.1186/s40168-019-0633-6
pmc: PMC6368784

Publication types

Journal Article; Research Support, Non-U.S. Gov't

Languages

eng

Citation subsets

IM

Pagination

17

Authors

Adrian Fritz (A)

Computational Biology of Infection Research, Helmholtz Centre for Infection Research, Braunschweig, 38124, Germany.

Peter Hofmann (P)

Computational Biology of Infection Research, Helmholtz Centre for Infection Research, Braunschweig, 38124, Germany.
Formerly Department of Algorithmic Bioinformatics, Heinrich-Heine University Düsseldorf, Düsseldorf, 40225, Germany.

Stephan Majda (S)

Computational Biology of Infection Research, Helmholtz Centre for Infection Research, Braunschweig, 38124, Germany.
Formerly Department of Algorithmic Bioinformatics, Heinrich-Heine University Düsseldorf, Düsseldorf, 40225, Germany.

Eik Dahms (E)

Computational Biology of Infection Research, Helmholtz Centre for Infection Research, Braunschweig, 38124, Germany.
Formerly Department of Algorithmic Bioinformatics, Heinrich-Heine University Düsseldorf, Düsseldorf, 40225, Germany.

Johannes Dröge (J)

Computational Biology of Infection Research, Helmholtz Centre for Infection Research, Braunschweig, 38124, Germany.
Formerly Department of Algorithmic Bioinformatics, Heinrich-Heine University Düsseldorf, Düsseldorf, 40225, Germany.

Jessika Fiedler (J)

Computational Biology of Infection Research, Helmholtz Centre for Infection Research, Braunschweig, 38124, Germany.
Formerly Department of Algorithmic Bioinformatics, Heinrich-Heine University Düsseldorf, Düsseldorf, 40225, Germany.

Till R Lesker (TR)

Computational Biology of Infection Research, Helmholtz Centre for Infection Research, Braunschweig, 38124, Germany.
German Center for Infection Research (DZIF), partner site Hannover-Braunschweig, Braunschweig, 38124, Germany.

Peter Belmann (P)

Computational Biology of Infection Research, Helmholtz Centre for Infection Research, Braunschweig, 38124, Germany.
Center for Biotechnology and Faculty of Technology, Bielefeld University, Bielefeld, 33615, Germany.

Matthew Z DeMaere (MZ)

The ithree institute, University of Technology Sydney, Sydney NSW, 2007, Australia.

Aaron E Darling (AE)

The ithree institute, University of Technology Sydney, Sydney NSW, 2007, Australia.

Alexander Sczyrba (A)

Center for Biotechnology and Faculty of Technology, Bielefeld University, Bielefeld, 33615, Germany.

Andreas Bremges (A)

Computational Biology of Infection Research, Helmholtz Centre for Infection Research, Braunschweig, 38124, Germany.
German Center for Infection Research (DZIF), partner site Hannover-Braunschweig, Braunschweig, 38124, Germany.

Alice C McHardy (AC)

Computational Biology of Infection Research, Helmholtz Centre for Infection Research, Braunschweig, 38124, Germany. alice.mchardy@helmholtz-hzi.de.
Formerly Department of Algorithmic Bioinformatics, Heinrich-Heine University Düsseldorf, Düsseldorf, 40225, Germany. alice.mchardy@helmholtz-hzi.de.
