Harmonizing units and values of quantitative data elements in a very large nationally pooled electronic health record (EHR) dataset.
SARS-CoV-2
data accuracy
data collection
electronic health records
reference standards
Journal
Journal of the American Medical Informatics Association : JAMIA
ISSN: 1527-974X
Abbreviated title: J Am Med Inform Assoc
Country: England
NLM ID: 9430800
Publication information
Publication date:
14 06 2022
History:
received: 21 12 2021
revised: 25 03 2022
accepted: 08 04 2022
pubmed: 19 04 2022
medline: 18 06 2022
entrez: 18 04 2022
Status:
ppublish
Abstract
The goals of this study were to harmonize data from electronic health records (EHRs) into common units and to impute units that were missing. We used the National COVID Cohort Collaborative (N3C) table of laboratory measurement data, which contains over 3.1 billion patient records and over 19 000 unique measurement concepts in the Observational Medical Outcomes Partnership (OMOP) common-data-model format from 55 data partners. We grouped ontologically similar OMOP concepts together for 52 variables relevant to COVID-19 research and developed a unit-harmonization pipeline comprising (1) selecting a canonical unit for each measurement variable, (2) arriving at a formula for conversion, (3) obtaining clinical review of each formula, (4) applying the formula to convert data values in each unit into the target canonical unit, and (5) removing any harmonized value that fell outside the accepted value range for the variable. Where all of a data partner's results for a lab test were missing units, we compared that partner's values with the pooled values of all data partners using the Kolmogorov-Smirnov test. Of the concepts without missing values, we harmonized 88.1% of the values, and we imputed units for 78.2% of records where units were absent (41% of contributors' records lacked units). The harmonization and inference methods developed herein can serve as a resource for initiatives aiming to extract insight from heterogeneous EHR collections. Unit inference relies on properties unique to centralized data. The pipeline we developed for the pooled N3C data enables use of measurements that would otherwise be unavailable for analysis.
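As a rough illustration of steps (4) and (5) and of the unit-inference idea, the following minimal Python sketch converts values to a canonical unit, drops out-of-range results, and guesses a missing unit by comparing distributions. It is not the N3C implementation. The variable (body weight), the names `TO_KG`, `PLAUSIBLE_KG`, `harmonize`, `ks_statistic`, and `infer_unit`, and the plausible range are all hypothetical. Choosing the candidate unit with the smallest Kolmogorov-Smirnov statistic is also an assumption; the abstract states only that the test was used.

```python
from bisect import bisect_right

# Hypothetical conversion factors to the canonical unit (kg) for one variable.
TO_KG = {"kg": 1.0, "g": 0.001, "lb": 0.45359237}
# Illustrative accepted range in kg; not taken from the paper.
PLAUSIBLE_KG = (0.2, 700.0)


def harmonize(value, unit, factors=TO_KG, limits=PLAUSIBLE_KG):
    """Convert a value to the canonical unit.

    Returns None when the unit is unknown or the converted value
    falls outside the accepted range (steps 4 and 5).
    """
    factor = factors.get(unit)
    if factor is None:
        return None
    converted = value * factor
    low, high = limits
    return converted if low <= converted <= high else None


def ks_statistic(sample_a, sample_b):
    """Two-sample Kolmogorov-Smirnov statistic.

    This is the largest gap between the two empirical CDFs,
    evaluated at every observed point.
    """
    a, b = sorted(sample_a), sorted(sample_b)
    return max(
        abs(bisect_right(a, x) / len(a) - bisect_right(b, x) / len(b))
        for x in a + b
    )


def infer_unit(unitless_values, pooled_canonical, factors=TO_KG):
    """Guess the unit of values that arrived without one.

    Each candidate unit is tried in turn; the one whose converted values
    lie closest to the pooled canonical-unit distribution is returned.
    """
    scores = {
        unit: ks_statistic([v * f for v in unitless_values], pooled_canonical)
        for unit, f in factors.items()
    }
    return min(scores, key=scores.get)


# Usage: pooled values already in kg, and one partner's unitless values.
pooled_kg = [60, 65, 70, 75, 80, 85, 90]
unitless = [132, 143, 154, 165, 176, 187, 198]
guessed_unit = infer_unit(unitless, pooled_kg)
```

In practice a pipeline of this kind would likely also require the best match to clear a significance or distance threshold before accepting an inferred unit, so that poorly matching distributions are left unimputed.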
Identifiers
pubmed: 35435957
pii: 6569865
doi: 10.1093/jamia/ocac054
pmc: PMC9196692
Publication types
Journal Article
Research Support, N.I.H., Extramural
Languages
eng
Citation subsets
IM
Pagination
1172-1182
Grants
Agency: NIGMS NIH HHS; ID: U54 GM104938; Country: United States
Agency: NCATS NIH HHS; ID: UL1 TR002649; Country: United States
Agency: NCATS NIH HHS; ID: UL1 TR003167; Country: United States
Agency: NCATS NIH HHS; ID: UL1 TR001433; Country: United States
Agency: NCATS NIH HHS; ID: UL1 TR001422; Country: United States
Agency: NCATS NIH HHS; ID: UL1 TR001860; Country: United States
Agency: NIGMS NIH HHS; ID: U54 GM104942; Country: United States
Agency: NCATS NIH HHS; ID: UL1 TR001420; Country: United States
Agency: NCATS NIH HHS; ID: UL1 TR002243; Country: United States
Agency: NCATS NIH HHS; ID: UL1 TR001445; Country: United States
Agency: NCATS NIH HHS; ID: UL1 TR003096; Country: United States
Agency: NCATS NIH HHS; ID: UL1 TR002537; Country: United States
Agency: NCATS NIH HHS; ID: UL1 TR001412; Country: United States
Agency: NCATS NIH HHS; ID: UL1 TR001872; Country: United States
Agency: NCATS NIH HHS; ID: UL1 TR001878; Country: United States
Agency: NCATS NIH HHS; ID: UL1 TR002529; Country: United States
Agency: NCATS NIH HHS; ID: UL1 TR001863; Country: United States
Agency: NCATS NIH HHS; ID: UL1 TR002494; Country: United States
Agency: NCATS NIH HHS; ID: UL1 TR002736; Country: United States
Agency: NIGMS National Institute of General Medical Sciences; ID: 5U54GM104942-04
Agency: NIGMS NIH HHS; ID: U54 GM115516; Country: United States
Agency: NCATS NIH HHS; ID: UL1 TR002369; Country: United States
Agency: NCATS NIH HHS; ID: UL1 TR002541; Country: United States
Agency: NCATS NIH HHS; ID: UL1 TR002001; Country: United States
Agency: NCATS NIH HHS; ID: UL1 TR002538; Country: United States
Agency: NIGMS NIH HHS; ID: U54 GM115458; Country: United States
Agency: NCATS NIH HHS; ID: UL1 TR001442; Country: United States
Agency: NCATS NIH HHS; ID: UL1 TR002535; Country: United States
Agency: NCATS NIH HHS; ID: UL1 TR001866; Country: United States
Agency: NCATS NIH HHS; ID: UL1 TR001449; Country: United States
Agency: NCATS NIH HHS; ID: UL1 TR001453; Country: United States
Agency: NCATS NIH HHS; ID: UL1 TR002489; Country: United States
Agency: NIGMS NIH HHS; ID: U54 GM104940; Country: United States
Agency: NCATS NIH HHS; ID: UL1 TR003107; Country: United States
Agency: NCATS NIH HHS; ID: UL1 TR003015; Country: United States
Agency: NCATS NIH HHS; ID: UL1 TR002733; Country: United States
Agency: NCATS NIH HHS; ID: U24 TR002306; Country: United States
Agency: NCATS NIH HHS; ID: UL1 TR002003; Country: United States
Agency: NCATS NIH HHS; ID: UL1 TR001876; Country: United States
Agency: NCATS NIH HHS; ID: UL1 TR001436; Country: United States
Agency: NCATS NIH HHS; ID: UL1 TR002378; Country: United States
Agency: NCATS NIH HHS; ID: UL1 TR002384; Country: United States
Agency: NCATS NIH HHS; ID: UL1 TR002553; Country: United States
Agency: NCATS NIH HHS; ID: UL1 TR002389; Country: United States
Agency: NCATS NIH HHS; ID: UL1 TR001414; Country: United States
Agency: NIGMS NIH HHS; ID: U54 GM104941; Country: United States
Agency: NCATS NIH HHS; ID: UL1 TR002014; Country: United States
Agency: NIH HHS; Country: United States
Agency: NCATS NIH HHS; ID: UL1 TR002550; Country: United States
Agency: NCATS NIH HHS; ID: UL1 TR002319; Country: United States
Agency: NCATS NIH HHS; ID: UL1 TR001855; Country: United States
Agency: NCATS NIH HHS; ID: UL1 TR001425; Country: United States
Agency: NCATS NIH HHS; ID: UL1 TR002373; Country: United States
Agency: NCATS NIH HHS; ID: UL1 TR002240; Country: United States
Agency: NCATS NIH HHS; ID: UL1 TR002556; Country: United States
Agency: NCATS NIH HHS; ID: UL1 TR003017; Country: United States
Agency: N3C Attribution & Publication Policy v1.2-2020-08-25b
Agency: NCATS NIH HHS; ID: UL1 TR001439; Country: United States
Agency: NCATS NIH HHS; ID: UL1 TR001998; Country: United States
Agency: NCATS NIH HHS; ID: UL1 TR001873; Country: United States
Agency: NCATS NIH HHS; ID: UL1 TR001881; Country: United States
Agency: NCATS NIH HHS; ID: UL1 TR002645; Country: United States
Agency: NCATS NIH HHS; ID: UL1 TR001450; Country: United States
Agency: National Center for Advancing Translational Sciences Institute; ID: U24TR002306
Agency: NCATS NIH HHS; ID: UL1 TR002366; Country: United States
Agency: NIGMS NIH HHS; ID: U54 GM115428; Country: United States
Agency: NCATS NIH HHS; ID: UL1 TR002345; Country: United States
Agency: NCATS NIH HHS; ID: UL1 TR002377; Country: United States
Agency: NIGMS NIH HHS; ID: U54 GM115677; Country: United States
Agency: NCATS NIH HHS; ID: UL1 TR002544; Country: United States
Agency: NCATS NIH HHS; ID: UL1 TR003098; Country: United States
Agency: NCATS NIH HHS; ID: UL1 TR001430; Country: United States
Agency: NCATS NIH HHS; ID: UL1 TR003142; Country: United States
Copyright information
© The Author(s) 2022. Published by Oxford University Press on behalf of the American Medical Informatics Association.