iRODS metadata management for a cancer genome analysis workflow.
Data consistency
Data security
Genome analysis
High performance computing (HPC)
Metadata management
Next generation sequencing (NGS)
Workflow integration
iRODS
Journal
BMC bioinformatics
ISSN: 1471-2105
Abbreviated title: BMC Bioinformatics
Country: England
NLM ID: 100965194
Publication information
Publication date:
15 Jan 2019
History:
received: 31 Jul 2018
accepted: 10 Dec 2018
entrez: 17 Jan 2019
pubmed: 17 Jan 2019
medline: 13 Feb 2019
Status:
epublish
Abstract
BACKGROUND
The massive amounts of data generated by next-generation sequencing (NGS) methods pose challenges for data security, storage and metadata management. Although a broad range of data analysis pipelines exists, these challenges remain largely unaddressed to date.
RESULTS
We describe the integration of the open-source metadata management system iRODS (Integrated Rule-Oriented Data System) with a cancer genome analysis pipeline in a high-performance computing environment. The system allows for customized metadata attributes as well as fine-grained protection rules and is augmented by a user-friendly front end for metadata input. This results in a robust, efficient end-to-end workflow that accounts for data security, central storage and unified metadata information.
CONCLUSIONS
Integrating iRODS with an NGS data analysis pipeline is a suitable method for addressing the challenges of data security, storage and metadata management in NGS environments.
Identifiers
pubmed: 30646845
doi: 10.1186/s12859-018-2576-5
pii: 10.1186/s12859-018-2576-5
pmc: PMC6334444
Publication types
Journal Article
Languages
eng
Citation subsets
IM
Pagination
29