ContigExtender: a new approach to improving de novo sequence assembly for viral metagenomics data.


Journal

BMC bioinformatics
ISSN: 1471-2105
Abbreviated title: BMC Bioinformatics
Country: England
NLM ID: 100965194

Publication information

Publication date:
12 Mar 2021
History:
received: 01 10 2019
accepted: 21 02 2021
entrez: 12 3 2021
pubmed: 13 3 2021
medline: 27 3 2021
Status: epublish

Abstract

BACKGROUND
Metagenomics is the study of microbial genomes for pathogen detection and discovery in human clinical, animal, and environmental samples via Next-Generation Sequencing (NGS). Metagenome de novo sequence assembly is a crucial analytical step in which longer contigs, ideally whole chromosomes/genomes, are formed from shorter NGS reads. However, the contigs generated by de novo assembly are often highly fragmented and rarely longer than a few kilobase pairs (kb). Therefore, a time-consuming extension process is routinely performed on the de novo assembled contigs.
RESULTS
To facilitate this process, we propose a new tool for metagenome contig extension after de novo assembly. ContigExtender employs a novel recursive extending strategy that explores multiple extending paths to achieve highly accurate longer contigs. We demonstrate that ContigExtender outperforms existing tools on synthetic, animal, and human metagenomics datasets.
CONCLUSIONS
A novel software tool, ContigExtender, has been developed to assist and enhance the performance of metagenome de novo assembly. ContigExtender effectively extends contigs from a variety of sources and can be incorporated into most viral metagenomics analysis pipelines for a wide variety of applications, including pathogen detection and viral discovery.

Identifiers

pubmed: 33706720
doi: 10.1186/s12859-021-04038-2
pii: 10.1186/s12859-021-04038-2
pmc: PMC7953547

Publication types

Journal Article

Languages

eng

Citation subsets

IM

Pagination

119

Grants

Agency: National Heart, Lung, and Blood Institute
ID: R01-HL-105770

Authors

Zachary Deng (Z)

Vitalant Research Institute, San Francisco, CA, 94118, USA. dengzac@gmail.com.
Department of Laboratory Medicine, University of California at San Francisco, San Francisco, CA, 94107, USA. dengzac@gmail.com.

Eric Delwart (E)

Vitalant Research Institute, San Francisco, CA, 94118, USA. delwarte@medicine.ucsf.edu.
Department of Laboratory Medicine, University of California at San Francisco, San Francisco, CA, 94107, USA. delwarte@medicine.ucsf.edu.
