Impact of sequencing depth and technology on de novo RNA-Seq assembly.

MeSH terms: Animals; Arabidopsis; Exons; Gene Library; Humans; Molecular Sequence Annotation; Open Reading Frames; Oryza; RNA-Seq / methods

Keywords: RNA-Seq assembly; Sequencing depth; Sequencing technology

Journal

BMC genomics

ISSN: 1471-2164

Abbreviated title: BMC Genomics

Country: England

NLM ID: 100965258

Publication information

Publication date:
23 Jul 2019

History:

received: 04 Mar 2019

accepted: 09 Jul 2019

entrez: 25 Jul 2019

pubmed: 25 Jul 2019

medline: 18 Dec 2019

Status: epublish

Abstract

BACKGROUND

RNA-Seq data is inherently nonuniform across transcripts because of differences in gene expression. This makes it challenging to decide how much data should be generated from each sample: how much should one spend to recover the less-expressed transcripts? The sequencing technology used is another consideration, as there are inevitably biases against certain sequences. To investigate these effects, we first looked at high-depth libraries from a set of well-annotated organisms to ascertain the impact of sequencing depth on de novo assembly. We then looked at libraries sequenced from the Universal Human Reference RNA (UHRR) to compare the performance of Illumina HiSeq and MGI DNBseq™ technologies.

RESULTS

On the issue of sequencing depth, the amount of exomic sequence assembled plateaued using data sets of approximately 2 to 8 Gbp. However, the amount of genomic sequence assembled did not plateau for many of the analyzed organisms. Most of the unannotated genomic sequences are single-exon transcripts whose biological significance will be questionable for some users. On the issue of sequencing technology, both of the analyzed platforms recovered a similar number of full-length transcripts. The missing "gap" regions in the HiSeq assemblies were often attributed to higher GC content, but this may be an artefact of library preparation and not of sequencing technology.

CONCLUSIONS

Increasing sequencing depth beyond modest data sets of less than 10 Gbp recovers a plethora of single-exon transcripts undocumented in genome annotations. DNBseq™ is a viable alternative to HiSeq for de novo RNA-Seq assembly.
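The two quantities the abstract reasons about, a depth at which assembled sequence stops growing and the GC content of regions missing from an assembly, can be made concrete with a small sketch. It assumes, hypothetically, that one has already measured assembled bases at several subsampled depths and extracted the sequences of covered and "gap" regions. The function names `gc_fraction` and `plateau_depth`, the 5% relative-gain threshold, and the toy numbers are illustrative assumptions, not the authors' pipeline, criteria, or data.

```python
from typing import Optional, Sequence


def gc_fraction(seq: str) -> float:
    """Fraction of G and C among unambiguous bases (A, C, G, T).

    Ambiguity codes such as N are excluded from the denominator.
    Returns 0.0 for a sequence with no A/C/G/T bases.
    """
    s = seq.upper()
    acgt = sum(s.count(base) for base in "ACGT")
    if acgt == 0:
        return 0.0
    return (s.count("G") + s.count("C")) / acgt


def plateau_depth(depths_gbp: Sequence[float],
                  assembled_bp: Sequence[float],
                  rel_gain: float = 0.05) -> Optional[float]:
    """Hypothetical saturation rule (an assumption, not the paper's method).

    Returns the first sampled depth after which the next, deeper data set
    adds less than `rel_gain` (as a fraction) to the assembled bases, or
    None if every step still exceeds that threshold.
    """
    points = sorted(zip(depths_gbp, assembled_bp))
    for (d0, a0), (_, a1) in zip(points, points[1:]):
        if a0 > 0 and (a1 - a0) / a0 < rel_gain:
            return d0
    return None


if __name__ == "__main__":
    # Toy numbers, invented for illustration only.
    depths = [0.5, 1, 2, 4, 8, 16]
    exonic_bp = [20e6, 28e6, 33e6, 34.5e6, 35.2e6, 35.5e6]
    genomic_bp = [25e6, 40e6, 55e6, 70e6, 85e6, 100e6]
    print(plateau_depth(depths, exonic_bp))   # 2
    print(plateau_depth(depths, genomic_bp))  # None
    print(round(gc_fraction("GGCCATNN"), 3))  # 0.667
```

Comparing `gc_fraction` over "gap" regions with its value over fully assembled regions is one simple way to probe a GC-content effect; the `rel_gain` threshold is arbitrary, and changing it shifts the depth reported as a plateau.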

Identifiers

DOI: 10.1186/s12864-019-5965-x

PMID: 31337347

PMC: PMC6651908

PII: 10.1186/s12864-019-5965-x

Publication types

Comparative Study; Journal Article

Languages

eng

Citation subsets

Pagination

604

Grants

Agency: Alberta Innovates - Technology Futures

ID: RES0010334

Authors

Jordan Patterson (J)

Eric J Carpenter (EJ)

Zhenzhen Zhu (Z)

Dan An (D)

Xinming Liang (X)

Chunyu Geng (C)

Radoje Drmanac (R)

Gane Ka-Shu Wong (GK)

Similar articles

[Redispensing of expensive oral anticancer medicines: a practical application].

Smoking Cessation and Incident Cardiovascular Disease.

Evaluation of Low-Value Services Across Major Medicare Advantage Insurers and Traditional Medicare.

Effectiveness of Virtual Yoga for Chronic Low Back Pain: A Randomized Clinical Trial.

MeSH classifications