Making Metadata Machine-Readable as the First Step to Providing Findable, Accessible, Interoperable, and Reusable Population Health Data: Framework Development and Implementation Study.

DDI Data Documentation Initiative FAIR data principles JSON-LD JavaScript Object Notation for Linked Data OMOP CDM Observational Medical Outcomes Partnership Common Data Model data models data science machine-readable metadata metadata standardization

Journal

Online journal of public health informatics
ISSN: 1947-2579
Titre abrégé: Online J Public Health Inform
Pays: Canada
ID NLM: 101536954

Informations de publication

Date de publication:
01 Aug 2024
Historique:
received: 11 01 2024
accepted: 19 05 2024
revised: 07 05 2024
medline: 1 8 2024
pubmed: 1 8 2024
entrez: 1 8 2024
Statut: epublish

Résumé

Metadata describe and provide context for other data, playing a pivotal role in enabling findability, accessibility, interoperability, and reusability (FAIR) data principles. By providing comprehensive and machine-readable descriptions of digital resources, metadata empower both machines and human users to seamlessly discover, access, integrate, and reuse data or content across diverse platforms and applications. However, the limited accessibility and machine-interpretability of existing metadata for population health data hinder effective data discovery and reuse. To address these challenges, we propose a comprehensive framework using standardized formats, vocabularies, and protocols to render population health data machine-readable, significantly enhancing their FAIRness and enabling seamless discovery, access, and integration across diverse platforms and research applications. The framework implements a 3-stage approach. The first stage is Data Documentation Initiative (DDI) integration, which involves leveraging the DDI Codebook metadata and documentation of detailed information for data and associated assets, while ensuring transparency and comprehensiveness. The second stage is Observational Medical Outcomes Partnership (OMOP) Common Data Model (CDM) standardization. In this stage, the data are harmonized and standardized into the OMOP CDM, facilitating unified analysis across heterogeneous data sets. The third stage involves the integration of Schema.org and JavaScript Object Notation for Linked Data (JSON-LD), in which machine-readable metadata are generated using Schema.org entities and embedded within the data using JSON-LD, boosting discoverability and comprehension for both machines and human users. We demonstrated the implementation of these 3 stages using the Integrated Disease Surveillance and Response (IDSR) data from Malawi and Kenya. The implementation of our framework significantly enhanced the FAIRness of population health data, resulting in improved discoverability through seamless integration with platforms such as Google Dataset Search. The adoption of standardized formats and protocols streamlined data accessibility and integration across various research environments, fostering collaboration and knowledge sharing. Additionally, the use of machine-interpretable metadata empowered researchers to efficiently reuse data for targeted analyses and insights, thereby maximizing the overall value of population health resources. The JSON-LD codes are accessible via a GitHub repository and the HTML code integrated with JSON-LD is available on the Implementation Network for Sharing Population Information from Research Entities website. The adoption of machine-readable metadata standards is essential for ensuring the FAIRness of population health data. By embracing these standards, organizations can enhance diverse resource visibility, accessibility, and utility, leading to a broader impact, particularly in low- and middle-income countries. Machine-readable metadata can accelerate research, improve health care decision-making, and ultimately promote better health outcomes for populations worldwide.

Sections du résumé

BACKGROUND BACKGROUND
Metadata describe and provide context for other data, playing a pivotal role in enabling findability, accessibility, interoperability, and reusability (FAIR) data principles. By providing comprehensive and machine-readable descriptions of digital resources, metadata empower both machines and human users to seamlessly discover, access, integrate, and reuse data or content across diverse platforms and applications. However, the limited accessibility and machine-interpretability of existing metadata for population health data hinder effective data discovery and reuse.
OBJECTIVE OBJECTIVE
To address these challenges, we propose a comprehensive framework using standardized formats, vocabularies, and protocols to render population health data machine-readable, significantly enhancing their FAIRness and enabling seamless discovery, access, and integration across diverse platforms and research applications.
METHODS METHODS
The framework implements a 3-stage approach. The first stage is Data Documentation Initiative (DDI) integration, which involves leveraging the DDI Codebook metadata and documentation of detailed information for data and associated assets, while ensuring transparency and comprehensiveness. The second stage is Observational Medical Outcomes Partnership (OMOP) Common Data Model (CDM) standardization. In this stage, the data are harmonized and standardized into the OMOP CDM, facilitating unified analysis across heterogeneous data sets. The third stage involves the integration of Schema.org and JavaScript Object Notation for Linked Data (JSON-LD), in which machine-readable metadata are generated using Schema.org entities and embedded within the data using JSON-LD, boosting discoverability and comprehension for both machines and human users. We demonstrated the implementation of these 3 stages using the Integrated Disease Surveillance and Response (IDSR) data from Malawi and Kenya.
RESULTS RESULTS
The implementation of our framework significantly enhanced the FAIRness of population health data, resulting in improved discoverability through seamless integration with platforms such as Google Dataset Search. The adoption of standardized formats and protocols streamlined data accessibility and integration across various research environments, fostering collaboration and knowledge sharing. Additionally, the use of machine-interpretable metadata empowered researchers to efficiently reuse data for targeted analyses and insights, thereby maximizing the overall value of population health resources. The JSON-LD codes are accessible via a GitHub repository and the HTML code integrated with JSON-LD is available on the Implementation Network for Sharing Population Information from Research Entities website.
CONCLUSIONS CONCLUSIONS
The adoption of machine-readable metadata standards is essential for ensuring the FAIRness of population health data. By embracing these standards, organizations can enhance diverse resource visibility, accessibility, and utility, leading to a broader impact, particularly in low- and middle-income countries. Machine-readable metadata can accelerate research, improve health care decision-making, and ultimately promote better health outcomes for populations worldwide.

Identifiants

pubmed: 39088253
pii: v16i1e56237
doi: 10.2196/56237
doi:

Types de publication

Journal Article

Langues

eng

Pagination

e56237

Informations de copyright

©David Amadi, Sylvia Kiwuwa-Muyingo, Tathagata Bhattacharjee, Amelia Taylor, Agnes Kiragga, Michael Ochola, Chifundo Kanjala, Arofan Gregory, Keith Tomlin, Jim Todd, Jay Greenfield, INSPIRE INSPIRE Network. Originally published in the Online Journal of Public Health Informatics (https://ojphi.jmir.org/), 01.08.2024.

Auteurs

David Amadi (D)

Department of Population Health, Faculty of Epidemiology and Population Health, London School of Hygiene and Tropical Medicine, London, United Kingdom.

Sylvia Kiwuwa-Muyingo (S)

African Population and Health Research Center, Nairobi, Kenya.

Tathagata Bhattacharjee (T)

Department of Population Health, Faculty of Epidemiology and Population Health, London School of Hygiene and Tropical Medicine, London, United Kingdom.

Amelia Taylor (A)

Malawi University of Business and Applied Science, Blantyre, Malawi.

Agnes Kiragga (A)

African Population and Health Research Center, Nairobi, Kenya.

Michael Ochola (M)

African Population and Health Research Center, Nairobi, Kenya.

Chifundo Kanjala (C)

Independent researcher, Lilongwe, Malawi.

Arofan Gregory (A)

Committee on Data (CODATA), Paris, France.

Keith Tomlin (K)

Department of Population Health, Faculty of Epidemiology and Population Health, London School of Hygiene and Tropical Medicine, London, United Kingdom.

Jim Todd (J)

Department of Population Health, Faculty of Epidemiology and Population Health, London School of Hygiene and Tropical Medicine, London, United Kingdom.

Jay Greenfield (J)

Committee on Data (CODATA), Paris, France.

Classifications MeSH