OGRE: Overlap Graph-based metagenomic Read clustEring.
Journal
Bioinformatics (Oxford, England)
ISSN: 1367-4811
Abbreviated title: Bioinformatics
Country: England
NLM ID: 9808944
Publication information
Publication date:
17 05 2021
History:
received: 13 12 2019
revised: 19 08 2020
accepted: 25 08 2020
pubmed: 2 9 2020
medline: 9 6 2021
entrez: 2 9 2020
Status:
ppublish
Abstract
The microbes that live in an environment can be identified from their combined genomic material, also referred to as the metagenome. Sequencing a metagenome can result in large volumes of sequencing reads. A promising way to reduce the size of metagenomic datasets is to cluster reads into groups based on their overlaps. Clustering reads is valuable for facilitating downstream analyses, including computationally intensive strain-aware assembly. As current read clustering approaches cannot handle the large datasets arising from high-throughput metagenome sequencing, a novel read clustering approach is needed. In this article, we propose OGRE, an Overlap Graph-based Read clustEring procedure for high-throughput sequencing data, with a focus on shotgun metagenomes. We show that for small datasets OGRE outperforms other read binners in terms of the number of species included in a cluster, also referred to as cluster purity, and the fraction of all reads that is placed in one of the clusters. Furthermore, OGRE is able to process metagenomic datasets that are too large for other read binners into clusters with high cluster purity. OGRE is the only method that can successfully cluster reads in species-specific clusters for large metagenomic datasets without running into computation time or memory issues. Code is made available on GitHub (https://github.com/Marleen1/OGRE). Supplementary data are available at Bioinformatics online.
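The general idea of clustering reads by their overlaps can be illustrated with a small sketch. This is not the OGRE implementation, and the abstract does not specify how OGRE finds overlaps or forms clusters. The sketch assumes exact suffix-prefix matching over all read pairs and treats each connected component of the resulting overlap graph as one cluster. The names `suffix_prefix_overlap`, `cluster_reads`, and the `min_len` parameter are hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def suffix_prefix_overlap(a: str, b: str, min_len: int) -> int:
    """Return the length of the longest suffix of `a` that equals a prefix
    of `b`, or 0 if no such overlap of at least `min_len` bases exists."""
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if k > 0 and a[-k:] == b[:k]:
            return k
    return 0


def cluster_reads(reads, min_len=3):
    """Group read indices into connected components of the overlap graph.

    Assumption: two reads share an edge when one read's suffix exactly
    matches the other's prefix over at least `min_len` bases. The all-pairs
    loop is quadratic and only suitable for toy inputs.
    """
    parent = list(range(len(reads)))

    def find(i):
        # Union-find lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(reads)):
        for j in range(len(reads)):
            if i != j and suffix_prefix_overlap(reads[i], reads[j], min_len):
                parent[find(i)] = find(j)  # merge the two components

    clusters = defaultdict(list)
    for i in range(len(reads)):
        clusters[find(i)].append(i)
    return sorted(clusters.values())


def species_per_cluster(clusters, species):
    """Number of distinct species labels in each cluster (1 means pure)."""
    return [len({species[i] for i in cluster}) for cluster in clusters]


# Toy reads with made-up species labels, for illustration only.
reads = ["ACGTAC", "GTACGG", "ACGGTT", "TTTTCC", "TCCAAA", "GGGGGG"]
species = ["A", "A", "A", "B", "B", "A"]
clusters = cluster_reads(reads, min_len=3)
print(clusters)
print(species_per_cluster(clusters, species))
```

On the toy input, reads 0 to 2 chain together through overlaps, reads 3 and 4 form a second group, and read 5 stays unclustered. Real metagenomic data would need an indexed overlap search and tolerance for sequencing errors, which this sketch leaves out.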
Identifiers
pubmed: 32871010
pii: 5900259
doi: 10.1093/bioinformatics/btaa760
pmc: PMC8128468
Publication types
Journal Article
Research Support, Non-U.S. Gov't
Languages
eng
Citation subsets
IM
Pagination
905-912
Copyright information
© The Author(s) 2020. Published by Oxford University Press.