Linked Entity Attribute Pair (LEAP): A Harmonization Framework for Data Pooling.
Journal
JCO clinical cancer informatics
ISSN: 2473-4276
Abbreviated title: JCO Clin Cancer Inform
Country: United States
NLM ID: 101708809
Publication information
Publication date:
08 2020
History:
entrez: 7 8 2020
pubmed: 7 8 2020
medline: 1 9 2021
Status:
ppublish
Abstract
As data-sharing projects become increasingly frequent, so does the need to map data elements between multiple classification systems. A generic, robust, shareable architecture will result in increased efficiency and transparency of the mapping process, while upholding the integrity of the data. The American Association for Cancer Research's Genomics Evidence Neoplasia Information Exchange (GENIE) collects clinical and genomic data for precision cancer medicine. As part of its commitment to open science, GENIE has partnered with the National Cancer Institute's Genomic Data Commons (GDC) as a secondary repository. After initial efforts to submit data from GENIE to GDC failed, we realized the need for a solution to allow for the iterative mapping of data elements between dynamic classification systems. We developed the Linked Entity Attribute Pair (LEAP) database framework to store and manage the term mappings used to submit data from GENIE to GDC. After creating and populating the LEAP framework, we identified 195 mappings from GENIE to GDC requiring remediation and observed a 28% reduction in effort to resolve these issues, as well as a reduction in inadvertent errors. These results led to a decrease in the time to map between OncoTree, the cancer type ontology used by GENIE, and the International Classification of Diseases for Oncology, 3rd Edition, used by GDC, from several months to less than 1 week. The LEAP framework provides a streamlined mapping process among various classification systems and allows for reusability so that efforts to create or adjust mappings are straightforward. The ability of the framework to track changes over time streamlines the process to map data elements across various dynamic classification systems.
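The abstract does not describe the LEAP schema itself, so the following is only a minimal sketch of the general idea it summarizes: storing term mappings between two classification systems so that a remapping closes out the old entry instead of overwriting it, which keeps a history of changes over time. The class names (`TermMapping`, `MappingStore`), the date-range fields, and the sample codes (OncoTree `LUAD` mapped first to ICD-O-3 `8000/3`, then refined to `8140/3`) are hypothetical illustrations chosen for this example, not data or design taken from the paper.

```python
from dataclasses import dataclass, replace
from datetime import date
from typing import List, Optional


@dataclass(frozen=True)
class TermMapping:
    """One source-term -> target-term mapping valid over a date range.

    Hypothetical record layout for illustration only; not the LEAP schema.
    """
    source_system: str
    source_code: str
    target_system: str
    target_code: str
    valid_from: date
    valid_to: Optional[date] = None  # None means the mapping is still in effect


class MappingStore:
    """Append-only store: remapping a term closes the old row rather than deleting it."""

    def __init__(self) -> None:
        self._rows: List[TermMapping] = []

    def _matches(self, row: TermMapping, source_system: str, source_code: str,
                 target_system: str) -> bool:
        return (row.source_system, row.source_code, row.target_system) == (
            source_system, source_code, target_system)

    def set_mapping(self, source_system: str, source_code: str, target_system: str,
                    target_code: str, effective: date) -> None:
        # Close any still-open mapping for this source term as of `effective`.
        for i, row in enumerate(self._rows):
            if self._matches(row, source_system, source_code, target_system) and row.valid_to is None:
                self._rows[i] = replace(row, valid_to=effective)
        # Record the new mapping; older rows are kept as history.
        self._rows.append(TermMapping(source_system, source_code, target_system,
                                      target_code, effective))

    def lookup(self, source_system: str, source_code: str, target_system: str,
               as_of: date) -> Optional[str]:
        """Return the target code in effect on `as_of`, or None if unmapped then."""
        for row in self._rows:
            if (self._matches(row, source_system, source_code, target_system)
                    and row.valid_from <= as_of
                    and (row.valid_to is None or as_of < row.valid_to)):
                return row.target_code
        return None

    def history(self, source_system: str, source_code: str,
                target_system: str) -> List[TermMapping]:
        """Every mapping ever recorded for a source term, oldest first."""
        rows = [r for r in self._rows
                if self._matches(r, source_system, source_code, target_system)]
        return sorted(rows, key=lambda r: r.valid_from)


# Illustrative usage with hypothetical mapping dates.
store = MappingStore()
store.set_mapping("OncoTree", "LUAD", "ICD-O-3", "8000/3", date(2019, 1, 1))
# A later remediation replaces the generic code with a more specific one.
store.set_mapping("OncoTree", "LUAD", "ICD-O-3", "8140/3", date(2020, 1, 1))

print(store.lookup("OncoTree", "LUAD", "ICD-O-3", date(2019, 6, 1)))  # 8000/3
print(store.lookup("OncoTree", "LUAD", "ICD-O-3", date(2020, 6, 1)))  # 8140/3
print(len(store.history("OncoTree", "LUAD", "ICD-O-3")))              # 2
```

Keeping superseded rows with an end date, rather than updating them in place, is one simple way to make mappings both reusable and auditable: a submission prepared against an earlier version of either classification system can still be reproduced by looking up codes as of that date.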
Identifiers
pubmed: 32755461
doi: 10.1200/CCI.20.00037
pmc: PMC7469618
Publication types
Journal Article
Languages
eng
Citation subsets
IM
Pagination
691-699
Grants
Agency: NCI NIH HHS
ID: P30 CA008748
Country: United States
Agency: NCI NIH HHS
ID: P30 CA012197
Country: United States