Design and Refinement of a Data Quality Assessment Workflow for a Large Pediatric Research Network.

Keywords: CDRN; Checks; Data Quality; Electronic Health Records; GitHub; Issues

Journal

EGEMS (Washington, DC)
ISSN: 2327-9214
Abbreviated title: EGEMS (Wash DC)
Country: England
NLM ID: 101629606

Publication information

Publication date:
01 Aug 2019
History:
entrez: 19 Sep 2019
pubmed: 19 Sep 2019
medline: 19 Sep 2019
Status: epublish

Abstract

Clinical data research networks (CDRNs) aggregate electronic health record data from multiple hospitals to enable large-scale research. A critical operation toward building a CDRN is conducting continual evaluations to optimize data quality. The key challenges include determining the assessment coverage on big datasets, handling data variability over time, and facilitating communication with data teams. This study presents the evolution of a systematic workflow for data quality assessment in CDRNs. Using a specific CDRN as a use case, the workflow was iteratively developed and packaged into a toolkit. The resultant toolkit comprises 685 data quality checks to identify data quality issues, procedures to reconcile findings against a history of known issues, and a GitHub-based reporting mechanism for organized tracking. During the first two years of network development, the toolkit assisted in discovering over 800 data characteristics and resolving over 1400 programming errors. Longitudinal analysis indicated that the variability in time to resolution (15-day mean, 24-day IQR) is due to the underlying cause of the issue, the perceived importance of the domain, and the complexity of assessment. In the absence of a formalized data quality framework, CDRNs continue to face challenges in data management and query fulfillment. The proposed data quality toolkit was empirically validated on a particular network and is publicly available for other networks. While the toolkit is user-friendly and effective, the usage statistics indicated that the data quality process is very time-intensive and that sufficient resources should be dedicated to investigating problems and optimizing data for research.
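As an illustration only, the two mechanical steps named in the abstract (reconciling current findings against a history of known issues, and summarizing time to resolution as a mean and IQR) might be sketched as follows. This is not the toolkit's code: the `Issue` fields, the function names, and the sample data are all hypothetical, and the quartile method is simply the default of Python's `statistics.quantiles`.

```python
from dataclasses import dataclass
from datetime import date
from statistics import mean, quantiles


@dataclass(frozen=True)
class Issue:
    """One data quality finding. All field names are hypothetical."""
    check_id: str  # identifier of the check that fired
    site: str      # contributing hospital
    column: str    # table.column the check examined


def reconcile(current, known):
    """Split current findings into new issues and previously known ones."""
    known_set = set(known)
    new = [i for i in current if i not in known_set]
    recurring = [i for i in current if i in known_set]
    return new, recurring


def resolution_stats(opened_closed):
    """Return (mean, IQR) in days for a list of (opened, closed) date pairs."""
    days = [(closed - opened).days for opened, closed in opened_closed]
    q1, _, q3 = quantiles(days, n=4)  # needs at least two data points
    return mean(days), q3 - q1


if __name__ == "__main__":
    # Made-up example data, not taken from the study.
    known = [Issue("c1", "siteA", "visit.start_date")]
    current = known + [Issue("c2", "siteB", "drug.dose")]
    new, recurring = reconcile(current, known)
    print(len(new), "new;", len(recurring), "already known")

    closed_issues = [
        (date(2019, 1, 1), date(2019, 1, 6)),
        (date(2019, 1, 1), date(2019, 1, 11)),
        (date(2019, 1, 1), date(2019, 1, 16)),
        (date(2019, 1, 1), date(2019, 1, 31)),
    ]
    print(resolution_stats(closed_issues))
```

Only the new findings would need to be reported to a data team in such a scheme; recurring ones are already tracked.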

Identifiers

pubmed: 31531382
doi: 10.5334/egems.294
pmc: PMC6676917

Publication types

Journal Article

Languages

eng

Pagination

36

Conflict of interest statement

The authors have no competing interests to declare.

Authors

Ritu Khare (R)

The Children's Hospital of Philadelphia, US.

Levon H Utidjian (LH)

The Children's Hospital of Philadelphia, US.

Hanieh Razzaghi (H)

The Children's Hospital of Philadelphia, US.

Victoria Soucek (V)

Seattle Children's Hospital, US.

Evanette Burrows (E)

The Children's Hospital of Philadelphia, US.

Daniel Eckrich (D)

Nemours Children's Health System, US.

Richard Hoyt (R)

Nationwide Children's Hospital, US.

Harris Weinstein (H)

The Children's Hospital of Philadelphia, US.

Matthew W Miller (MW)

The Children's Hospital of Philadelphia, US.

David Soler (D)

The Children's Hospital of Philadelphia, US.

Joshua Tucker (J)

The Children's Hospital of Philadelphia, US.

L Charles Bailey (LC)

The Children's Hospital of Philadelphia, US.

MeSH classifications