GenPipes: an open-source framework for distributed and scalable genomic analyses.
Keywords: bioinformatics; frameworks; genomics; pipeline; workflow; workflow management systems
Journal: GigaScience
ISSN: 2047-217X
Abbreviated title: Gigascience
Country: United States
NLM ID: 101596872
Publication information
Publication date: 2019 Jun 1
History:
received: 2018 Jun 2
revised: 2018 Sep 28
accepted: 2019 Mar 10
entrez: 2019 Jun 12
pubmed: 2019 Jun 12
medline: 2020 Feb 6
Status: ppublish
Abstract
BACKGROUND
With the decreasing cost of sequencing and the rapid developments in genomics technologies and protocols, the need for validated bioinformatics software that enables efficient large-scale data processing is growing.
FINDINGS
Here we present GenPipes, a flexible Python-based framework that facilitates the development and deployment of multi-step workflows optimized for high-performance computing clusters and the cloud. GenPipes already implements 12 validated and scalable pipelines for various genomics applications, including RNA sequencing, chromatin immunoprecipitation sequencing, DNA sequencing, methylation sequencing, Hi-C, capture Hi-C, metagenomics, and Pacific Biosciences long-read assembly. The software is available under a GPLv3 open-source license and is continuously updated to follow recent advances in genomics and bioinformatics. The framework has already been configured on several servers, and a Docker image is also available to facilitate additional installations.
CONCLUSIONS
GenPipes offers genomics researchers a simple way to analyze different types of data that can be customized to their needs and resources, as well as the flexibility to create their own workflows.
Identifiers
pubmed: 31185495
pii: 5513895
doi: 10.1093/gigascience/giz037
pmc: PMC6559338
Publication types
Journal Article
Research Support, Non-U.S. Gov't
Languages
eng
Citation subsets
IM
Grants
Agency: CIHR
ID: MOP-115090
Country: Canada
Copyright information
© The Author(s) 2019. Published by Oxford University Press.