BIOCOM-PIPE: a new user-friendly metabarcoding pipeline for the characterization of microbial diversity from 16S, 18S and 23S rRNA gene amplicons.
Archaea / genetics
Bacteria / genetics
Biodiversity
Cluster Analysis
Computational Biology / methods
Computer Simulation
DNA Barcoding, Taxonomic
Databases, Genetic
Fungi / genetics
Genes, rRNA
Microbiota / genetics
RNA, Ribosomal, 16S / genetics
RNA, Ribosomal, 23S / genetics
Software
Soil Microbiology
Archaeal
Bacterial
Ecology
France
Fungal
Land-use
Metabarcoding
Photosynthetic microeukaryotes
ReClustOR
Journal
BMC bioinformatics
ISSN: 1471-2105
Abbreviated title: BMC Bioinformatics
Country: England
NLM ID: 100965194
Publication information
Publication date:
31 Oct 2020
History:
received: 21 Feb 2020
accepted: 21 Oct 2020
entrez: 1 Nov 2020
pubmed: 2 Nov 2020
medline: 20 Nov 2020
Status:
epublish
Abstract
BACKGROUND: The ability to easily compare samples or studies using metabarcoding, so as to better interpret microbial ecology results, is an upcoming challenge. A growing number of metabarcoding pipelines are available, each with its own benefits and limitations. However, very few have been developed to characterize various microbial communities (e.g., archaea, bacteria, fungi, photosynthetic microeukaryotes) with the same tool.
RESULTS: BIOCOM-PIPE is a flexible and independent suite of tools for processing data from high-throughput sequencing technologies (Roche 454 and Illumina platforms), focused on the diversity of archaeal, bacterial, fungal, and photosynthetic microeukaryote amplicons. Various original methods were implemented in BIOCOM-PIPE to (1) remove chimeras based on read abundance, (2) align sequences with structure-based alignments of RNA homologs using covariance models, and (3) improve OTU consistency with a post-clustering tool (ReClustOR) that relies on a reference OTU database. A comparison with two other pipelines (FROGS and mothur) and with the Amplicon Sequence Variant definition showed that BIOCOM-PIPE was better at discriminating land-use groups.
CONCLUSIONS: The BIOCOM-PIPE pipeline makes it possible to analyze 16S, 18S and 23S rRNA genes in the same packaged tool. The new post-clustering approach defines a biological database from previously analyzed samples and performs post-clustering of reads against this reference database using open-reference clustering. This makes it easier to compare projects from various sequencing runs and increases the congruence among results. The pipeline was developed so that all users can easily add or modify the components, the databases and the bioinformatics tools, giving high modularity for each analysis.
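The open-reference clustering idea mentioned in the abstract can be illustrated with a minimal sketch. This is not ReClustOR's implementation: the function names, the position-wise identity measure, and the 0.9 threshold are all assumptions chosen for illustration, and a real pipeline would compute identity from sequence alignments. Reads that match a reference OTU centroid are assigned to it; reads that match nothing seed new (de novo) OTUs.

```python
from typing import Dict, List


def identity(a: str, b: str) -> float:
    """Toy similarity (assumption): matching positions over the longer length."""
    if not a or not b:
        return 0.0
    matches = sum(x == y for x, y in zip(a, b))
    return matches / max(len(a), len(b))


def open_reference_cluster(
    reads: List[str],
    reference: Dict[str, str],
    threshold: float = 0.9,
) -> Dict[str, List[str]]:
    """Assign reads to reference OTUs, creating new OTUs for unmatched reads.

    `reference` maps hypothetical OTU ids to centroid sequences.
    """
    centroids = dict(reference)  # copy so the caller's database is untouched
    assignments: Dict[str, List[str]] = {}
    new_count = 0
    for read in reads:
        best_id, best_score = None, 0.0
        # Search reference centroids and any de novo centroids created so far.
        for otu_id, seq in centroids.items():
            score = identity(read, seq)
            if score > best_score:
                best_id, best_score = otu_id, score
        if best_id is None or best_score < threshold:
            # No centroid is close enough: the read seeds a new OTU.
            new_count += 1
            best_id = f"denovo_{new_count}"
            centroids[best_id] = read
        assignments.setdefault(best_id, []).append(read)
    return assignments


if __name__ == "__main__":
    ref_db = {"ref_1": "ACGTACGTAC"}
    sample = ["ACGTACGTAC", "ACGTACGTAA", "TTTTTTTTTT", "TTTTTTTTTA"]
    for otu, members in open_reference_cluster(sample, ref_db).items():
        print(otu, members)
```

Because the reference centroids keep their identifiers across runs, reads from different projects that match the same reference OTU receive the same label, which is what makes results comparable between sequencing runs.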
Identifiers
pubmed: 33129268
doi: 10.1186/s12859-020-03829-3
pii: 10.1186/s12859-020-03829-3
pmc: PMC7603665
Chemical substances
RNA, Ribosomal, 16S
0
RNA, Ribosomal, 23S
0
Publication types
Journal Article
Languages
eng
Citation subsets
IM
Pagination
492