HaTSPiL: A modular pipeline for high-throughput sequencing data analysis.
Journal
PloS one
ISSN: 1932-6203
Abbreviated title: PLoS One
Country: United States
NLM ID: 101285081
Publication information
Publication date:
2019
History:
received: 2019-03-25
accepted: 2019-08-30
entrez: 2019-10-16
pubmed: 2019-10-16
medline: 2020-03-11
Status:
epublish
Abstract
Next-generation sequencing (NGS) methods are widely adopted for a broad range of scientific purposes, from pure research to health-related studies. The decreasing cost per analysis has led to large volumes of generated data and to the subsequent improvement of the software used to analyse them. As a consequence, many approaches have been developed to chain different software tools in order to obtain reliable and reproducible workflows. However, the wide range of applications for NGS makes it challenging to manage many different workflows without losing reliability. Here we present a high-throughput sequencing pipeline (HaTSPiL), a Python-powered CLI tool designed to handle different approaches to data analysis with a high level of reliability. The software relies on barcoding filenames with a human-readable naming convention that contains all the information about the sample that the software needs to automatically choose between workflows and parameters. HaTSPiL is highly modular and customisable, allowing users to extend its features for any specific need. HaTSPiL is licensed as Free Software under the MIT license and is available at https://github.com/dodomorandi/hatspil.
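The idea of encoding sample information in a filename and using it to select a workflow can be illustrated with a minimal Python sketch. The naming convention, the field codes, the workflow names and the functions `parse_filename` and `choose_workflow` below are all hypothetical and chosen only for illustration; they are not HaTSPiL's actual barcode format or API, which are documented in the repository.

```python
import re
from dataclasses import dataclass

# Hypothetical naming convention (NOT HaTSPiL's real barcode format):
#   <project>-<sample>-<tissue>-<analyte>.fastq
# tissue:  T = tumour, N = normal
# analyte: D = DNA, R = RNA
FILENAME_PATTERN = re.compile(
    r"^(?P<project>[A-Za-z0-9]+)-(?P<sample>[A-Za-z0-9]+)"
    r"-(?P<tissue>[TN])-(?P<analyte>[DR])\.fastq$"
)

# Hypothetical mapping from the analyte code to a workflow name.
WORKFLOWS = {"D": "variant_calling", "R": "expression_quantification"}


@dataclass(frozen=True)
class SampleInfo:
    """Sample metadata recovered from a filename."""
    project: str
    sample: str
    tissue: str
    analyte: str


def parse_filename(filename: str) -> SampleInfo:
    """Extract the sample metadata encoded in the filename."""
    match = FILENAME_PATTERN.match(filename)
    if match is None:
        raise ValueError(f"filename does not follow the convention: {filename!r}")
    return SampleInfo(**match.groupdict())


def choose_workflow(info: SampleInfo) -> str:
    """Pick a workflow name based only on the parsed metadata."""
    return WORKFLOWS[info.analyte]


if __name__ == "__main__":
    info = parse_filename("PROJ1-S042-T-D.fastq")
    print(info, choose_workflow(info))
```

Because the workflow is derived from the filename alone, a malformed name fails early with a `ValueError` instead of silently running the wrong analysis.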
Identifiers
pubmed: 31613890
doi: 10.1371/journal.pone.0222512
pii: PONE-D-19-08543
pmc: PMC6793853
Chemical substances
DNA
9007-49-2
Publication types
Journal Article
Research Support, Non-U.S. Gov't
Languages
eng
Citation subsets
IM
Pagination
e0222512
Conflict of interest statement
The authors have declared that no competing interests exist.