Targeted domain assembly for fast functional profiling of metagenomic datasets with S3A.


Journal

Bioinformatics (Oxford, England)
ISSN: 1367-4811
Abbreviated title: Bioinformatics
Country: England
NLM ID: 9808944

Publication information

Publication date:
01 07 2020
History:
received: 16 10 2019
revised: 11 04 2020
accepted: 17 04 2020
pubmed: 25 4 2020
medline: 29 12 2020
entrez: 25 4 2020
Status: ppublish

Abstract

Understanding the ever-increasing number of metagenomic sequences accumulating in our databases demands approaches that rapidly 'explore' the content of multiple and/or large metagenomic datasets with respect to specific domain targets, avoiding full domain annotation and full assembly. S3A is a fast and accurate domain-targeted assembler designed for rapid functional profiling. It is based on a novel construction and a fast traversal of the Overlap-Layout-Consensus graph, designed to reconstruct coding regions from domain-annotated metagenomic sequence reads. S3A relies on high-quality domain annotation to efficiently assemble metagenomic sequences and on a new confidence measure for the fast evaluation of overlapping reads. Its implementation is highly generic and can be applied to any type of annotation. On simulated data, S3A achieves a level of accuracy similar to that of classical metagenomic assembly tools while enabling faster and sensitive profiling of domains of interest. When studying a few dozen functional domains (a typical scenario), S3A is up to an order of magnitude faster than general-purpose metagenomic assemblers, thus enabling the analysis of a larger number of datasets in the same amount of time. S3A opens new avenues for the fast exploration of the rapidly increasing number of metagenomic datasets, which are themselves growing in size. S3A is available at http://www.lcqb.upmc.fr/S3A_ASSEMBLER/. Supplementary data are available at Bioinformatics online.
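
To make the idea of domain-targeted assembly on an overlap graph more concrete, here is a minimal Python sketch. It is not S3A's algorithm: the `Read` record, the function names, and the `min_len` threshold are all hypothetical, and a plain minimum exact-overlap length stands in for the confidence measure mentioned in the abstract, which is not described here. The sketch only shows the general pattern: keep the reads annotated with a target domain, link reads whose suffix matches another read's prefix, and walk the resulting graph to produce a contig.

```python
from collections import namedtuple

# Hypothetical record: a read plus the domain label given by an upstream annotator.
Read = namedtuple("Read", ["name", "seq", "domain"])

def overlap_len(a, b, min_len):
    """Longest suffix of a equal to a prefix of b; 0 if shorter than min_len."""
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0

def build_overlap_graph(reads, min_len):
    """Directed graph: edges[i] lists (j, overlap) for every read i -> read j."""
    edges = {i: [] for i in range(len(reads))}
    for i, a in enumerate(reads):
        for j, b in enumerate(reads):
            if i != j:
                k = overlap_len(a.seq, b.seq, min_len)
                if k:
                    edges[i].append((j, k))
    return edges

def greedy_contig(reads, edges):
    """Start at a read with no incoming edge and follow the longest overlap."""
    has_incoming = {j for outs in edges.values() for j, _ in outs}
    starts = [i for i in edges if i not in has_incoming]
    current = starts[0] if starts else 0
    contig, visited = reads[current].seq, {current}
    while True:
        candidates = [(k, j) for j, k in edges[current] if j not in visited]
        if not candidates:
            return contig
        k, j = max(candidates)          # longest overlap wins (toy criterion)
        contig += reads[j].seq[k:]      # append only the non-overlapping tail
        visited.add(j)
        current = j

def assemble_domain(reads, target_domain, min_len=3):
    """Assemble only the reads annotated with target_domain."""
    selected = [r for r in reads if r.domain == target_domain]
    if not selected:
        return ""
    return greedy_contig(selected, build_overlap_graph(selected, min_len))

# Toy input: three overlapping reads for domain "A", one unrelated read.
reads = [
    Read("r1", "ATGGCGT", "A"),
    Read("r2", "GCGTACC", "A"),
    Read("r3", "TACCTGA", "A"),
    Read("r4", "TTTTTTT", "B"),
]
print(assemble_domain(reads, "A"))  # ATGGCGTACCTGA
```

Filtering before building the graph is what keeps the quadratic overlap step small in this toy: only reads carrying the target annotation are compared, which mirrors the abstract's point about avoiding full assembly.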

Identifiers

pubmed: 32330240
pii: 5824791
doi: 10.1093/bioinformatics/btaa272
pmc: PMC7332565

Publication types

Journal Article; Research Support, Non-U.S. Gov't

Languages

eng

Citation subsets

IM

Pagination

3975-3981

Copyright information

© The Author(s) 2020. Published by Oxford University Press.

Authors

Laurent David (L)

Sorbonne Université, CNRS, IBPS, Laboratoire de Biologie Computationnelle et Quantitative (LCQB), UMR 7238.

Riccardo Vicedomini (R)

Sorbonne Université, CNRS, IBPS, Laboratoire de Biologie Computationnelle et Quantitative (LCQB), UMR 7238.
Sorbonne Université, CNRS, Institut des Sciences du Calcul et des Données (ISCD).

Hugues Richard (H)

Sorbonne Université, CNRS, IBPS, Laboratoire de Biologie Computationnelle et Quantitative (LCQB), UMR 7238.
Bioinformatics Unit (MF1), Robert Koch Institute, Berlin 13353, Germany.

Alessandra Carbone (A)

Sorbonne Université, CNRS, IBPS, Laboratoire de Biologie Computationnelle et Quantitative (LCQB), UMR 7238.
Institut Universitaire de France, Paris 75005, France.
