Combining population-based administrative health records and electronic medical records for disease surveillance.
Administrative data
Electronic medical records
Misclassification bias
Prevalence
Statistical model
Journal
BMC medical informatics and decision making
ISSN: 1472-6947
Abbreviated title: BMC Med Inform Decis Mak
Country: England
NLM ID: 101088682
Publication information
Publication date: 2 July 2019
History:
received: 22 January 2019
accepted: 20 June 2019
entrez: 4 July 2019
pubmed: 4 July 2019
medline: 18 December 2019
Status: epublish
Abstract
BACKGROUND
Administrative health records (AHRs) and electronic medical records (EMRs) are two key sources of population-based data for disease surveillance, but misclassification errors in the data can bias disease estimates. Methods that combine information from error-prone data sources can build on the strengths of AHRs and EMRs. We compared bias and error for four data-combining methods and applied them to estimate hypertension prevalence.
METHODS
Our study included four methods: rule-based OR and AND methods, which identify disease cases from either or both data sources, respectively; a rule-based sensitivity-specificity adjusted (RSSA) method, which corrects for inaccuracies using a deterministic rule; and a probabilistic-based sensitivity-specificity adjusted (PSSA) method, which corrects for error using a statistical model. Computer simulation was used to estimate relative bias (RB) and mean square error (MSE) under varying conditions of population disease prevalence, correlation amongst data sources, and amount of misclassification error. AHRs and EMRs for Manitoba, Canada, were used to estimate hypertension prevalence using validated case definitions and multiple disease markers.
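To make the rule-based methods concrete, the Python sketch below applies the OR and AND rules to two simulated binary data sources and reports relative bias (RB) and mean square error (MSE) over repeated samples. It is a minimal illustration under stated assumptions, not the authors' code: the two sources are assumed to misclassify independently given true disease status, and because the exact RSSA rule is not described in this record, the classic Rogan-Gladen correction applied to the OR-combined indicator stands in for a sensitivity-specificity adjustment. All function names and parameter values (10% prevalence, the sensitivity/specificity pairs) are hypothetical.

```python
import numpy as np


def or_rule(a, b):
    """Rule-based OR: a person is a case if either source flags them."""
    return np.logical_or(a, b)


def and_rule(a, b):
    """Rule-based AND: a person is a case only if both sources flag them."""
    return np.logical_and(a, b)


def rogan_gladen(apparent, se, sp):
    """Correct an apparent prevalence for known Se/Sp, clipped to [0, 1]."""
    return float(np.clip((apparent + sp - 1.0) / (se + sp - 1.0), 0.0, 1.0))


def relative_bias(estimates, truth):
    """RB = (mean estimate - true value) / true value."""
    return (float(np.mean(estimates)) - truth) / truth


def mean_square_error(estimates, truth):
    """MSE = mean squared deviation of the estimates from the true value."""
    return float(np.mean((np.asarray(estimates) - truth) ** 2))


def simulate(prev=0.10, se=(0.75, 0.70), sp=(0.95, 0.97),
             n=10_000, reps=200, seed=1):
    """Return {method: (RB, MSE)} for OR, AND and an adjusted OR ("ADJ")."""
    rng = np.random.default_rng(seed)
    # Se/Sp of the OR-combined indicator; valid only under conditional independence.
    se_or = 1.0 - (1.0 - se[0]) * (1.0 - se[1])
    sp_or = sp[0] * sp[1]
    est = {"OR": [], "AND": [], "ADJ": []}
    for _ in range(reps):
        disease = rng.random(n) < prev
        # Each source misclassifies independently given true disease status.
        src_a = np.where(disease, rng.random(n) < se[0], rng.random(n) < 1 - sp[0])
        src_b = np.where(disease, rng.random(n) < se[1], rng.random(n) < 1 - sp[1])
        p_or = or_rule(src_a, src_b).mean()
        est["OR"].append(p_or)
        est["AND"].append(and_rule(src_a, src_b).mean())
        est["ADJ"].append(rogan_gladen(p_or, se_or, sp_or))
    return {m: (relative_bias(v, prev), mean_square_error(v, prev))
            for m, v in est.items()}


if __name__ == "__main__":
    for method, (rb, err) in simulate().items():
        print(f"{method:>3}: RB = {rb:+.3f}, MSE = {err:.5f}")
```

Under these assumptions the OR rule overestimates and the AND rule underestimates prevalence, while the adjusted estimate is close to unbiased. Once the sources' errors are correlated, the combined Se/Sp formula used for the adjustment no longer holds.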
RESULTS
The OR method had the lowest RB and MSE when population disease prevalence was 10%, and the RSSA method had the lowest RB and MSE when population prevalence increased to 20%. As the correlation between data sources increased, the OR method had the lowest RB and MSE. Estimates of hypertension prevalence for AHRs and EMRs alone were 30.9% (95% CI: 30.6-31.2) and 24.9% (95% CI: 24.6-25.2), respectively. The estimates were 21.4% (95% CI: 21.1-21.7) for the AND method, 34.4% (95% CI: 34.1-34.8) for the OR method, 32.2% (95% CI: 31.8-32.6) for the RSSA method, and ranged from 34.3% (95% CI: 34.1-34.5) to 35.9% (95% CI: 35.7-36.1) for the PSSA method, depending on the statistical model.
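The PSSA estimates depend on the statistical model chosen, and this record does not say which models were used. As one widely used probabilistic option, the Python sketch below fits a two-class latent class model by expectation-maximization (EM), estimating prevalence together with each marker's sensitivity and specificity. It assumes binary disease markers that are conditionally independent given true disease status and at least three markers for identifiability; the function name, starting values, and tolerances are hypothetical choices, not the authors' method.

```python
import numpy as np


def latent_class_em(y, n_iter=2000, tol=1e-10, seed=0):
    """Two-class latent class model for binary markers, fitted by EM.

    y: (n, k) array of 0/1 marker results (k >= 3 for identifiability).
    Returns (prevalence, sensitivities, specificities).
    Assumes markers are conditionally independent given true disease status.
    """
    y = np.asarray(y, dtype=float)
    _, k = y.shape
    rng = np.random.default_rng(seed)
    eps = 1e-6
    prev = 0.5
    # Start with Se and Sp above 0.5 so the "diseased" class is the one
    # with more positive markers (avoids label switching).
    se = rng.uniform(0.6, 0.9, k)
    sp = rng.uniform(0.6, 0.9, k)
    for _ in range(n_iter):
        se = np.clip(se, eps, 1 - eps)
        sp = np.clip(sp, eps, 1 - eps)
        # E-step: log-likelihood of each person's marker pattern in each class.
        log_d = np.log(prev) + y @ np.log(se) + (1 - y) @ np.log(1 - se)
        log_n = np.log(1 - prev) + y @ np.log(1 - sp) + (1 - y) @ np.log(sp)
        # Posterior probability of disease, computed from the log-odds.
        w = 1.0 / (1.0 + np.exp(np.clip(log_n - log_d, -700, 700)))
        # M-step: weighted proportions give updated prevalence, Se and Sp.
        new_prev = min(max(float(w.mean()), eps), 1 - eps)
        se = (w @ y) / w.sum()
        sp = ((1 - w) @ (1 - y)) / (1 - w).sum()
        done = abs(new_prev - prev) < tol
        prev = new_prev
        if done:
            break
    return prev, se, sp
```

On simulated data that satisfy the conditional-independence assumption, the fitted prevalence should land close to the true value. When markers are correlated within the diseased or non-diseased group, that assumption fails and the estimates can be biased, which is relevant to the conclusion about average correlations amongst disease markers.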
CONCLUSIONS
The OR and AND methods are influenced by correlation amongst the data sources, while the RSSA method depends on the accuracy of prior sensitivity and specificity estimates. The PSSA method performed well when population prevalence was high and average correlations amongst disease markers were low. This study will guide researchers in selecting a data-combining method that best suits their data characteristics.
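Why correlation amongst data sources matters for the OR and AND methods can be seen from a short calculation. The Python sketch below gives the expected apparent prevalence under each rule for two binary sources with known sensitivity and specificity, plus a conditional covariance between the sources' results among people with (`cov_d`) and without (`cov_n`) the disease. The function name and this parameterization are our own illustrative assumptions, not taken from the paper.

```python
def expected_or_and(prev, se1, se2, sp1, sp2, cov_d=0.0, cov_n=0.0):
    """Expected apparent prevalence from the OR and AND rules.

    cov_d / cov_n: conditional covariance between the two sources' results
    among people with / without the disease (0 = conditional independence).
    Returns (p_or, p_and).
    """
    fp1, fp2 = 1 - sp1, 1 - sp2  # false-positive rates
    # Reject covariances that would give impossible joint probabilities.
    for a, b, cov, label in ((se1, se2, cov_d, "cov_d"), (fp1, fp2, cov_n, "cov_n")):
        both = a * b + cov
        if not max(0.0, a + b - 1.0) <= both <= min(a, b):
            raise ValueError(f"{label}={cov} gives an impossible joint probability")
    both_pos_d = se1 * se2 + cov_d     # P(both positive | diseased)
    both_pos_n = fp1 * fp2 + cov_n     # P(both positive | not diseased)
    either_pos_d = se1 + se2 - both_pos_d
    either_pos_n = fp1 + fp2 - both_pos_n
    p_and = prev * both_pos_d + (1 - prev) * both_pos_n
    p_or = prev * either_pos_d + (1 - prev) * either_pos_n
    return p_or, p_and


if __name__ == "__main__":
    print(expected_or_and(0.2, 0.8, 0.7, 0.95, 0.95))
    print(expected_or_and(0.2, 0.8, 0.7, 0.95, 0.95, cov_d=0.1, cov_n=0.02))
```

With 20% prevalence, the independent case gives 26.6% (OR) and 11.4% (AND). Adding positive covariance (`cov_d=0.1`, `cov_n=0.02`) lowers the OR estimate to 23.0% and raises the AND estimate to 15.0%. Because OR + AND always equals the sum of the two single-source prevalences, correlation moves the two rules in opposite directions by the same amount, `prev * cov_d + (1 - prev) * cov_n`.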
Identifiers
pubmed: 31266516
doi: 10.1186/s12911-019-0845-5
pii: 10.1186/s12911-019-0845-5
pmc: PMC6604278
Publication types
Journal Article
Research Support, Non-U.S. Gov't
Languages
eng
Citation subsets
IM
Pagination: 120
Grants
Agency: CIHR
ID: 143293
Country: Canada