Impact of sequencing depth and technology on de novo RNA-Seq assembly.


Journal

BMC genomics
ISSN: 1471-2164
Abbreviated title: BMC Genomics
Country: England
NLM ID: 100965258

Publication information

Publication date:
23 Jul 2019
History:
received: 4 Mar 2019
accepted: 9 Jul 2019
entrez: 25 Jul 2019
pubmed: 25 Jul 2019
medline: 18 Dec 2019
Status: epublish

Abstract

BACKGROUND: RNA-Seq data is inherently nonuniform for different transcripts because of differences in gene expression. This makes it challenging to decide how much data should be generated from each sample. How much should one spend to recover the less expressed transcripts? The sequencing technology used is another consideration, as there are inevitably biases against certain sequences. To investigate these effects, we first looked at high-depth libraries from a set of well-annotated organisms to ascertain the impact of sequencing depth on de novo assembly. We then looked at libraries sequenced from the Universal Human Reference RNA (UHRR) to compare the performance of Illumina HiSeq and MGI DNBseq™ technologies.
RESULTS: On the issue of sequencing depth, the amount of exomic sequence assembled plateaued using data sets of approximately 2 to 8 Gbp. However, the amount of genomic sequence assembled did not plateau for many of the analyzed organisms. Most of the unannotated genomic sequences are single-exon transcripts whose biological significance will be questionable for some users. On the issue of sequencing technology, both of the analyzed platforms recovered a similar number of full-length transcripts. The missing "gap" regions in the HiSeq assemblies were often attributed to higher GC contents, but this may be an artefact of library preparation and not of sequencing technology.
CONCLUSIONS: Increasing sequencing depth beyond modest data sets of less than 10 Gbp recovers a plethora of single-exon transcripts undocumented in genome annotations. DNBseq™ is a viable alternative to HiSeq for de novo RNA-Seq assembly.
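
The plateau described in the results can be illustrated with a small sketch. This is not the authors' method: the function name `plateau_depth`, the 2% relative-gain threshold, and all numeric values below are assumptions chosen only to show how one might read a plateau off a depth-versus-assembled-sequence curve.

```python
def plateau_depth(depths_gbp, assembled_mbp, min_gain=0.02):
    """Return the smallest sampled depth after which every further step
    adds less than `min_gain` (relative) assembled sequence, or None.

    Assumes all assembled amounts are positive (hypothetical helper).
    """
    points = sorted(zip(depths_gbp, assembled_mbp))
    for i in range(len(points) - 1):
        # Relative gain of every step from point i onward.
        gains = [
            (points[j + 1][1] - points[j][1]) / points[j][1]
            for j in range(i, len(points) - 1)
        ]
        if all(g < min_gain for g in gains):
            return points[i][0]
    return None  # no plateau within the sampled depths


# Made-up illustrative values, NOT data from the paper.
depths = [0.5, 1, 2, 4, 8, 16]                       # Gbp of reads
exomic = [20.0, 30.0, 38.0, 40.0, 40.4, 40.6]        # levels off
genomic = [25.0, 40.0, 60.0, 85.0, 115.0, 150.0]     # keeps growing

print(plateau_depth(depths, exomic))
print(plateau_depth(depths, genomic))
```

With these invented curves the first call reports a plateau at a sampled depth, while the second returns `None`, mirroring the qualitative contrast the abstract draws between exomic and genomic sequence.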


Identifiers

pubmed: 31337347
doi: 10.1186/s12864-019-5965-x
pii: 10.1186/s12864-019-5965-x
pmc: PMC6651908

Publication types

Comparative Study; Journal Article

Languages

eng

Citation subsets

IM

Pagination

604

Grants

Agency: Alberta Innovates - Technology Futures
ID: RES0010334

Authors

Jordan Patterson (J)

Department of Medicine, University of Alberta, Edmonton, AB, T6G 2E1, Canada.

Eric J Carpenter (EJ)

Department of Biological Sciences, University of Alberta, Edmonton, AB, T6G 2E9, Canada.

Zhenzhen Zhu (Z)

MGI, BGI-Shenzhen, Shenzhen, 518083, China.

Dan An (D)

MGI, BGI-Shenzhen, Shenzhen, 518083, China.

Xinming Liang (X)

MGI, BGI-Shenzhen, Shenzhen, 518083, China.

Chunyu Geng (C)

MGI, BGI-Shenzhen, Shenzhen, 518083, China.

Radoje Drmanac (R)

MGI, BGI-Shenzhen, Shenzhen, 518083, China.

Gane Ka-Shu Wong (GK)

Department of Medicine, University of Alberta, Edmonton, AB, T6G 2E1, Canada. gane@ualberta.ca.
Department of Biological Sciences, University of Alberta, Edmonton, AB, T6G 2E9, Canada. gane@ualberta.ca.
