OGRE: Overlap Graph-based metagenomic Read clustEring.


Journal

Bioinformatics (Oxford, England)
ISSN: 1367-4811
Abbreviated title: Bioinformatics
Country: England
NLM ID: 9808944

Publication information

Publication date:
2021-05-17
History:
received: 2019-12-13
revised: 2020-08-19
accepted: 2020-08-25
pubmed: 2020-09-02
medline: 2021-06-09
entrez: 2020-09-02
Status: ppublish

Abstract

The microbes that live in an environment can be identified from the combined genomic material, also referred to as the metagenome. Sequencing a metagenome can result in large volumes of sequencing reads. A promising approach to reducing the size of metagenomic datasets is to cluster reads into groups based on their overlaps. Clustering reads is valuable for facilitating downstream analyses, including computationally intensive strain-aware assembly. As current read clustering approaches cannot handle the large datasets arising from high-throughput metagenome sequencing, a novel read clustering approach is needed. In this article, we propose OGRE, an Overlap Graph-based Read clustEring procedure for high-throughput sequencing data, with a focus on shotgun metagenomes. We show that for small datasets OGRE outperforms other read binners in terms of the number of species included in a cluster, also referred to as cluster purity, and the fraction of all reads that is placed in one of the clusters. Furthermore, OGRE is able to process metagenomic datasets that are too large for other read binners into clusters with high cluster purity. OGRE is the only method that can successfully cluster reads into species-specific clusters for large metagenomic datasets without running into computation time or memory issues. Code is made available on GitHub (https://github.com/Marleen1/OGRE). Supplementary data are available at Bioinformatics online.
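To make the two evaluation measures in the abstract concrete, the following is a minimal sketch and not OGRE's actual procedure. It assumes the pairwise read overlaps are already known, uses plain connected components of the overlap graph as a stand-in for the clustering step, and assumes each read has a known species label. The function names (cluster_reads, species_per_cluster, fraction_clustered) and the min_cluster_size parameter are hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def cluster_reads(num_reads, overlaps, min_cluster_size=2):
    """Group reads into connected components of the overlap graph.

    overlaps: iterable of (i, j) read-index pairs assumed to overlap.
    Components smaller than min_cluster_size are left unclustered.
    NOTE: connected components are an illustrative stand-in, not OGRE's method.
    """
    parent = list(range(num_reads))

    def find(x):
        # Union-find lookup with path halving to keep trees shallow.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for i, j in overlaps:
        root_i, root_j = find(i), find(j)
        if root_i != root_j:
            parent[root_j] = root_i

    groups = defaultdict(list)
    for read in range(num_reads):
        groups[find(read)].append(read)
    return [sorted(g) for g in groups.values() if len(g) >= min_cluster_size]


def species_per_cluster(clusters, species_of):
    """Number of distinct species in each cluster (1 means a pure cluster)."""
    return [len({species_of[read] for read in cluster}) for cluster in clusters]


def fraction_clustered(clusters, num_reads):
    """Fraction of all reads that ended up in some cluster."""
    return sum(len(cluster) for cluster in clusters) / num_reads
```

With seven toy reads, overlaps (0,1), (1,2), (3,4), (4,5) and species labels A, A, A, B, B, C, C, this sketch yields two clusters: one containing a single species and one mixing two species, with six of the seven reads clustered. A real pipeline would need to compute overlaps from the sequences and split mixed components further, which this sketch does not attempt.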

Identifiers

pubmed: 32871010
pii: 5900259
doi: 10.1093/bioinformatics/btaa760
pmc: PMC8128468

Publication types

Journal Article
Research Support, Non-U.S. Gov't

Languages

eng

Citation subsets

IM

Pagination

905-912

Copyright information

© The Author(s) 2020. Published by Oxford University Press.

Authors

Marleen Balvert (M)

Life Sciences & Health, Centrum Wiskunde & Informatica, Amsterdam 1098 XG, The Netherlands.
Theoretical Biology & Bioinformatics, Utrecht University, Utrecht 3512 JE, The Netherlands.
Department of Econometrics & Operations Research, Tilburg University, Tilburg 5000 LE, The Netherlands.

Xiao Luo (X)

Life Sciences & Health, Centrum Wiskunde & Informatica, Amsterdam 1098 XG, The Netherlands.

Ernestina Hauptfeld (E)

Theoretical Biology & Bioinformatics, Utrecht University, Utrecht 3512 JE, The Netherlands.
Laboratory of Microbiology, Wageningen University & Research, Wageningen 6700 HB, The Netherlands.

Alexander Schönhuth (A)

Life Sciences & Health, Centrum Wiskunde & Informatica, Amsterdam 1098 XG, The Netherlands.
Theoretical Biology & Bioinformatics, Utrecht University, Utrecht 3512 JE, The Netherlands.

Bas E Dutilh (BE)

Theoretical Biology & Bioinformatics, Utrecht University, Utrecht 3512 JE, The Netherlands.
