De novo transcriptome assembly: A comprehensive cross-species comparison of short-read RNA-Seq assemblers.
RNA-Seq
assembly
comparison
de novo
transcriptomics
Journal
GigaScience
ISSN: 2047-217X
Abbreviated title: Gigascience
Country: United States
NLM ID: 101596872
Publication information
Publication date: 1 May 2019
History:
received: 14 Aug 2018
revised: 21 Dec 2018
accepted: 9 Mar 2019
entrez: 12 May 2019
pubmed: 12 May 2019
medline: 24 Dec 2019
Status:
ppublish
Abstract
BACKGROUND
In recent years, massively parallel complementary DNA sequencing (RNA sequencing [RNA-Seq]) has emerged as a fast, cost-effective, and robust technology to study entire transcriptomes in various ways. In particular, for non-model organisms and in the absence of an appropriate reference genome, RNA-Seq is used to reconstruct the transcriptome de novo. Although de novo transcriptome assembly of non-model organisms has been on the rise recently and new tools are developed frequently, there is still a knowledge gap about which assembly software should be used to build a comprehensive de novo assembly.
RESULTS
Here, we present a large-scale comparative study in which 10 de novo assembly tools are applied to 9 RNA-Seq data sets spanning different kingdoms of life. Overall, we built >200 single assemblies and evaluated their performance on a combination of 20 biology-based and reference-free metrics. Our study is accompanied by a comprehensive and extensible Electronic Supplement that summarizes all data sets, assembly execution instructions, and evaluation results. Trinity, SPAdes, and Trans-ABySS, followed by Bridger and SOAPdenovo-Trans, generally outperformed the other tools compared. Moreover, we observed species-specific differences in the performance of each assembler. No tool delivered the best results for all data sets.
CONCLUSIONS
We recommend a careful choice and normalization of evaluation metrics to select the best assembly results as a critical step in the reconstruction of a comprehensive de novo transcriptome assembly.
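The recommendation to normalize evaluation metrics before comparing assemblies can be illustrated with a minimal sketch. It assumes a simple min-max scaling of each metric to the range 0 to 1 followed by an unweighted sum per assembler; this scheme, the function names, the metric names, and all numbers are hypothetical illustrations and are not taken from the study.

```python
from typing import Dict, FrozenSet, List, Tuple

def normalize_metric(values: Dict[str, float],
                     higher_is_better: bool = True) -> Dict[str, float]:
    """Min-max scale one metric across assemblers to the range [0, 1]."""
    lo, hi = min(values.values()), max(values.values())
    if hi == lo:
        # All assemblers tie on this metric, so each gets the full score.
        return {name: 1.0 for name in values}
    scaled = {name: (v - lo) / (hi - lo) for name, v in values.items()}
    if not higher_is_better:
        # Invert so that 1.0 always means "best" regardless of direction.
        scaled = {name: 1.0 - s for name, s in scaled.items()}
    return scaled

def rank_assemblies(metrics: Dict[str, Dict[str, float]],
                    lower_is_better: FrozenSet[str] = frozenset()
                    ) -> List[Tuple[str, float]]:
    """Sum normalized metric scores per assembler, best first."""
    totals: Dict[str, float] = {}
    for metric_name, values in metrics.items():
        scaled = normalize_metric(values, metric_name not in lower_is_better)
        for assembler, score in scaled.items():
            totals[assembler] = totals.get(assembler, 0.0) + score
    return sorted(totals.items(), key=lambda kv: kv[1], reverse=True)

# Hypothetical metric values for three placeholder assemblers.
example = {
    "mapping_rate": {"assembler_A": 90.0, "assembler_B": 80.0, "assembler_C": 70.0},
    "complete_orthologs": {"assembler_A": 500.0, "assembler_B": 900.0, "assembler_C": 700.0},
    "misassemblies": {"assembler_A": 10.0, "assembler_B": 30.0, "assembler_C": 20.0},
}
ranking = rank_assemblies(example, lower_is_better=frozenset({"misassemblies"}))
print(ranking)
```

Without the scaling step, a metric with a large numeric range (such as a count in the hundreds) would dominate one expressed as a percentage, which is one reason the choice of normalization can change which assembly is ranked best.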
Identifiers
pubmed: 31077315
pii: 5488105
doi: 10.1093/gigascience/giz039
pmc: PMC6511074
Publication types
Comparative Study
Journal Article
Research Support, Non-U.S. Gov't
Languages
eng
Citation subsets
IM
Copyright information
© The Author(s) 2019. Published by Oxford University Press.