Moving Just Enough Deep Sequencing Data to Get the Job Done.

Keywords: FASTQ; RNA-Seq; data transfers; high-throughput DNA sequencing

Journal

Bioinformatics and biology insights
ISSN: 1177-9322
Abbreviated title: Bioinform Biol Insights
Country: United States
NLM ID: 101467187

Publication information

Publication date:
2019
History:
received: 2019-04-23
accepted: 2019-05-21
entrez: 2019-06-26
pubmed: 2019-06-27
medline: 2019-06-27
Status: epublish

Abstract

As the size of high-throughput DNA sequence datasets continues to grow, the cost of transferring and storing the datasets may prevent their processing in all but the largest data centers or commercial cloud providers. To lower this cost, it should be possible to process only a subset of the original data while still preserving the biological information of interest. Using 4 high-throughput DNA sequence datasets of differing sequencing depth from 2 species as use cases, we demonstrate the effect of processing partial datasets on the number of detected RNA transcripts using an RNA-Seq workflow. We used transcript detection to decide on a cutoff point. We then physically transferred the minimal partial dataset and compared it with the transfer of the full dataset; the partial transfer reduced the total transfer time by approximately 25%. These results suggest that as sequencing datasets get larger, one way to speed up analysis is to simply transfer the minimal amount of data that still sufficiently detects biological signal. All results were generated using public datasets from NCBI and publicly available open source software.
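
The idea of using transcript detection to pick a cutoff point can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' workflow: the abstract does not say how the cutoff was chosen, so the rule used here (the smallest data fraction whose detected-transcript count reaches a chosen share of the full-dataset count), the function name `minimal_fraction`, the `threshold` parameter, and the example numbers are all hypothetical.

```python
from typing import Sequence


def minimal_fraction(
    fractions: Sequence[float],
    detected: Sequence[int],
    threshold: float = 0.95,
) -> float:
    """Return the smallest data fraction whose detected-transcript count
    reaches `threshold` times the count seen with the full dataset.

    Assumption (not from the paper): `fractions` is sorted ascending and
    ends at 1.0, which stands for the full dataset.
    """
    if not fractions or len(fractions) != len(detected):
        raise ValueError("fractions and detected must be non-empty and equal length")
    if list(fractions) != sorted(fractions) or fractions[-1] != 1.0:
        raise ValueError("fractions must be ascending and end at 1.0")

    full_count = detected[-1]  # transcripts detected with all of the data
    for frac, count in zip(fractions, detected):
        if count >= threshold * full_count:
            return frac
    return 1.0  # fall back to transferring the full dataset


# Made-up numbers for illustration only; these are not results from the paper.
example_fractions = [0.25, 0.5, 0.75, 1.0]
example_detected = [700, 900, 980, 1000]
print(minimal_fraction(example_fractions, example_detected))
```

With these toy inputs the function selects the 0.75 fraction, because that is the first point at which at least 95% of the full-dataset transcript count is detected. A stricter or looser `threshold` shifts the cutoff, which is the trade-off between transfer cost and retained biological signal that the abstract describes.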

Identifiers

pubmed: 31236009
doi: 10.1177/1177932219856359
pii: 10.1177_1177932219856359
pmc: PMC6572328

Publication types

Journal Article

Languages

eng

Pagination

1177932219856359

Conflict of interest statement

Declaration of Conflicting Interests: The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Authors

Nicholas Mills (N)

Holcombe Department of Electrical and Computer Engineering, Clemson University, Clemson, SC, USA.

Ethan M Bensman (EM)

School of Computing, Clemson University, Clemson, SC, USA.

William L Poehlman (WL)

Department of Genetics and Biochemistry, Clemson University, Clemson, SC, USA.

Walter B Ligon (WB)

Holcombe Department of Electrical and Computer Engineering, Clemson University, Clemson, SC, USA.

F Alex Feltus (FA)

Department of Genetics and Biochemistry, Clemson University, Clemson, SC, USA.

MeSH classifications