De novo transcriptome assembly: A comprehensive cross-species comparison of short-read RNA-Seq assemblers.

MeSH terms: Animals; Arabidopsis; Contig Mapping / methods; Escherichia coli; Humans; Mice; Sequence Analysis, RNA / methods; Software; Transcriptome

Keywords: RNA-Seq; assembly; comparison; de novo; transcriptomics

Journal

GigaScience

ISSN: 2047-217X

Abbreviated title: Gigascience

Country: United States

NLM ID: 101596872

Publication information

Publication date:
1 May 2019

History:

received: 14 August 2018

revised: 21 December 2018

accepted: 9 March 2019

entrez: 12 May 2019

pubmed: 12 May 2019

medline: 24 December 2019

Status: ppublish

Abstract

In recent years, massively parallel complementary DNA sequencing (RNA sequencing [RNA-Seq]) has emerged as a fast, cost-effective, and robust technology for studying entire transcriptomes in various ways. In particular, for non-model organisms and in the absence of an appropriate reference genome, RNA-Seq is used to reconstruct the transcriptome de novo. Although de novo transcriptome assembly of non-model organisms has become increasingly common and new tools are frequently being developed, there is still a knowledge gap about which assembly software should be used to build a comprehensive de novo assembly. Here, we present a large-scale comparative study in which 10 de novo assembly tools are applied to 9 RNA-Seq data sets spanning different kingdoms of life. Overall, we built >200 single assemblies and evaluated their performance on a combination of 20 biological-based and reference-free metrics. Our study is accompanied by a comprehensive and extensible Electronic Supplement that summarizes all data sets, assembly execution instructions, and evaluation results. Trinity, SPAdes, and Trans-ABySS, followed by Bridger and SOAPdenovo-Trans, generally outperformed the other tools compared. Moreover, we observed species-specific differences in the performance of each assembler. No tool delivered the best results for all data sets. We recommend a careful choice and normalization of evaluation metrics to select the best assembly results as a critical step in the reconstruction of a comprehensive de novo transcriptome assembly.
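The abstract recommends normalizing evaluation metrics before choosing among assemblies, but this record does not state which normalization scheme the study used. The sketch below therefore assumes a simple approach: each metric is min-max scaled across assemblies (flipped when smaller is better) and the scaled values are summed into one score per assembly. The function names, the metric names `completeness_pct` and `fragmented_pct`, and all values are hypothetical placeholders, not data or methods from the study.

```python
from typing import Dict, List, Tuple


def min_max_normalize(values: List[float], higher_is_better: bool = True) -> List[float]:
    """Scale values to [0, 1] so that 1.0 is always the best value for this metric."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A metric that does not vary cannot discriminate between assemblies,
        # so every assembly gets the same (maximal) contribution.
        return [1.0 for _ in values]
    scaled = [(v - lo) / (hi - lo) for v in values]
    # For metrics where smaller is better, flip the scale.
    return scaled if higher_is_better else [1.0 - s for s in scaled]


def rank_assemblies(
    metrics: Dict[str, Dict[str, float]],
    higher_is_better: Dict[str, bool],
) -> List[Tuple[str, float]]:
    """Sum min-max-normalized metrics per assembly and sort best-first."""
    names = list(metrics)
    scores = {name: 0.0 for name in names}
    for metric, direction in higher_is_better.items():
        column = [metrics[name][metric] for name in names]
        for name, value in zip(names, min_max_normalize(column, direction)):
            scores[name] += value
    # sorted() is stable, so tied assemblies keep their input order.
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical assemblies and metric values, for illustration only.
example = {
    "asm_A": {"completeness_pct": 80.0, "fragmented_pct": 10.0},
    "asm_B": {"completeness_pct": 90.0, "fragmented_pct": 20.0},
    "asm_C": {"completeness_pct": 70.0, "fragmented_pct": 5.0},
}
directions = {"completeness_pct": True, "fragmented_pct": False}

for name, score in rank_assemblies(example, directions):
    print(f"{name}\t{score:.3f}")
```

With these made-up inputs, asm_A ranks first (about 1.167) even though it is best on neither metric, while asm_B and asm_C tie at 1.000. Normalizing first keeps a metric with large raw values from dominating the sum; an equal-weight sum is itself an assumption, and other weightings could change the ranking.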


Identifiers

DOI: 10.1093/gigascience/giz039; PMID: 31077315; PMC: PMC6511074; PII: 5488105

Publication types

Comparative Study; Journal Article; Research Support, Non-U.S. Gov't

Language

English




Authors

Martin Hölzer

Manja Marz
