GalaxyTrakr: a distributed analysis tool for public health whole genome sequence data accessible to non-bioinformaticians.
Biosurveillance
Food safety
Galaxy
GenomeTrakr
Genomic surveillance
Public health
Whole genome sequencing
Journal
BMC genomics
ISSN: 1471-2164
Abbreviated title: BMC Genomics
Country: England
NLM ID: 100965258
Publication information
Publication date:
10 Feb 2021
History:
received: 03 Nov 2020
accepted: 22 Jan 2021
entrez: 11 Feb 2021
pubmed: 12 Feb 2021
medline: 15 May 2021
Status:
epublish
Abstract
BACKGROUND
Processing and analyzing whole genome sequencing (WGS) data is computationally intensive: a single Illumina MiSeq WGS run produces ~1 million 250-base-pair reads for each of 24 samples. This poses significant obstacles for smaller laboratories, or laboratories not affiliated with larger projects, which may lack the dedicated bioinformatics staff or computing power needed to use genomic data effectively to protect public health. Building on the success of the cloud-based Galaxy bioinformatics platform (http://galaxyproject.org), already known for its user-friendliness and powerful WGS analytical tools, the Center for Food Safety and Applied Nutrition (CFSAN) at the U.S. Food and Drug Administration (FDA) created a customized 'instance' of the Galaxy environment, called GalaxyTrakr (https://www.galaxytrakr.org), for use by laboratory scientists performing food-safety regulatory research. The goal was to enable laboratories outside the FDA internal network to (1) perform quality assessments of sequence data, (2) identify links between clinical isolates and positive food/environmental samples, including those in the National Center for Biotechnology Information Sequence Read Archive (https://www.ncbi.nlm.nih.gov/sra/), and (3) explore new methodologies such as metagenomics. GalaxyTrakr hosts a variety of free, adaptable tools and provides the data storage and computing power to run them. These tools support coordinated analytic methods and consistent interpretation of results across laboratories. Users can create and share tools for their specific needs and can analyze sequence data generated both locally and elsewhere.
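As a rough illustration of the data volume behind the first sentence above, the quoted figures can be multiplied out. This is a back-of-envelope sketch only, assuming each of the ~1 million reads per sample is counted once at its full 250 bases; paired-end mates, read trimming, and FASTQ header/quality-score overhead are ignored. The constant and function names are hypothetical, chosen for illustration.

```python
# Back-of-envelope estimate of raw bases in one MiSeq run, using the
# approximate figures quoted in the abstract (illustrative assumptions).
READS_PER_SAMPLE = 1_000_000   # "~1 million" reads per sample (assumed exact)
READ_LENGTH_BP = 250           # 250-base-pair reads
SAMPLES_PER_RUN = 24           # 24 samples per run


def bases_per_run(reads: int, read_len: int, samples: int) -> int:
    """Total sequenced bases, counting every read once at full length."""
    return reads * read_len * samples


total = bases_per_run(READS_PER_SAMPLE, READ_LENGTH_BP, SAMPLES_PER_RUN)
print(f"{total:,} bases (~{total / 1e9:.1f} Gbp)")
# prints: 6,000,000,000 bases (~6.0 Gbp)
```

Under these assumptions a single run yields on the order of 6 billion bases before any analysis, which is the scale a laboratory without dedicated computing resources would otherwise have to store and process locally.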
RESULTS
In its first full year (2018), GalaxyTrakr processed over 85,000 jobs and grew from 25 to 250 users, representing 53 different public and state health laboratories, academic institutions, international health laboratories, and federal organizations. By mid-2020, it had grown to 600 registered users and had processed over 450,000 analytical jobs. To illustrate how laboratories are making use of this resource, we describe how six institutions use GalaxyTrakr to quickly analyze and review their data. Instructions for participating in GalaxyTrakr are provided.
CONCLUSIONS
GalaxyTrakr advances food safety by providing reliable and harmonized WGS analyses for public health laboratories and promoting collaboration across laboratories with differing resources. Anticipated enhancements to this resource will include workflows for additional foodborne pathogens, viruses, and parasites, as well as new tools and services.
Identifiers
pubmed: 33568057
doi: 10.1186/s12864-021-07405-8
pii: 10.1186/s12864-021-07405-8
pmc: PMC7877046
Publication types
Journal Article
Languages
eng
Citation subsets
IM
Pagination
114