The DataHarmonizer: a tool for faster data harmonization, validation, aggregation and analysis of pathogen genomics contextual information.
contextual data
data management
genomic surveillance
harmonization
metadata
Journal
Microbial genomics
ISSN: 2057-5858
Abbreviated title: Microb Genom
Country: England
NLM ID: 101671820
Publication information
Publication date: 01 2023
History:
entrez: 7 2 2023
pubmed: 8 2 2023
medline: 9 2 2023
Status: ppublish
Abstract
Pathogen genomics is a critical tool for public health surveillance, infection control, outbreak investigations as well as research. To be useful, pathogen genomics data must be interpreted using contextual data (metadata). Contextual data include sample metadata, laboratory methods, patient demographics, clinical outcomes and epidemiological information. However, the variability in how contextual information is captured by different authorities and how it is encoded in different databases poses challenges for data interpretation, integration and use/re-use. The DataHarmonizer is a template-driven spreadsheet application for harmonizing, validating and transforming genomics contextual data into submission-ready formats for public or private repositories. The tool's web browser-based JavaScript environment enables validation, and its offline functionality and local installation increase data security. The DataHarmonizer was developed to address the data sharing needs that arose during the COVID-19 pandemic, and was used by members of the Canadian COVID Genomics Network (CanCOGeN) to harmonize SARS-CoV-2 contextual data for national surveillance and for public repository submission. In order to support coordination of international surveillance efforts, we have partnered with the Public Health Alliance for Genomic Epidemiology to also provide a template conforming to its SARS-CoV-2 contextual data specification for use worldwide. Templates are also being developed for One Health and foodborne pathogens. Overall, the DataHarmonizer tool improves the effectiveness and fidelity of contextual data capture as well as its subsequent usability. Harmonization of contextual information across authorities, platforms and systems globally improves interoperability and reusability of data for concerted public health and research initiatives to fight the current pandemic and future public health emergencies.
While the DataHarmonizer was initially developed for the COVID-19 pandemic, its expansion to other data management applications and pathogens is already underway.
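As an illustration only, the template-driven harmonization and validation described in the abstract could be sketched as below. This is a minimal sketch under stated assumptions: the field names, picklist values, synonym map and the `harmonize`/`validate` functions are hypothetical and are not taken from the DataHarmonizer itself, which runs as a JavaScript application in the browser.

```python
import re

# Hypothetical template: field names and rules are illustrative assumptions,
# not the DataHarmonizer's actual schema.
TEMPLATE = {
    "sample_id": {"required": True, "pattern": r"[A-Za-z0-9_-]+"},
    "collection_date": {"required": True, "pattern": r"\d{4}-\d{2}-\d{2}"},
    "host_age_bin": {"required": False, "picklist": {"0-9", "10-19", "20-29"}},
}

# Hypothetical synonym map used to harmonize free-text values to template terms.
SYNONYMS = {"host_age_bin": {"0 to 9": "0-9", "10 to 19": "10-19"}}


def harmonize(row):
    """Return a copy of the row with whitespace trimmed and known synonyms mapped."""
    out = {}
    for field, value in row.items():
        value = value.strip()
        out[field] = SYNONYMS.get(field, {}).get(value, value)
    return out


def validate(row):
    """Return a list of error messages for one row; an empty list means valid."""
    errors = []
    for field, rule in TEMPLATE.items():
        value = row.get(field, "")
        if not value:
            # Empty values are only an error for required fields.
            if rule.get("required"):
                errors.append(f"{field}: missing required value")
            continue
        if "pattern" in rule and not re.fullmatch(rule["pattern"], value):
            errors.append(f"{field}: '{value}' does not match the expected format")
        if "picklist" in rule and value not in rule["picklist"]:
            errors.append(f"{field}: '{value}' is not in the picklist")
    return errors
```

A caller would typically run `harmonize` on each spreadsheet row first and then pass the result to `validate`, so that values differing only in spelling or spacing are mapped to template terms before they are checked.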
Identifiers
pubmed: 36748616
doi: 10.1099/mgen.0.000908
pmc: PMC9973856
Publication types
Journal Article
Research Support, Non-U.S. Gov't
Languages
eng
Citation subsets
IM
Grants
Agency: CIHR
ID: PJT-159456
Country: Canada