Methods to improve the quality of smoking records in a primary care EMR database: exploring multiple imputation and pattern-matching algorithms.
Electronic medical records
Primary health care
Public health informatics
Smoking
Journal
BMC medical informatics and decision making
ISSN: 1472-6947
Abbreviated title: BMC Med Inform Decis Mak
Country: England
NLM ID: 101088682
Publication information
Publication date: 14 Mar 2020
History:
received: 21 Nov 2019
accepted: 28 Feb 2020
entrez: 16 Mar 2020
pubmed: 17 Mar 2020
medline: 6 Oct 2020
Status:
epublish
Abstract
BACKGROUND
Primary care electronic medical record (EMR) data are emerging as a useful source for secondary uses, such as disease surveillance, health outcomes research, and practice improvement. These data capture clinical details about patients' health status, as well as behavioural risk factors, such as smoking. While the importance of documenting smoking status in a healthcare setting is recognized, the quality of smoking data captured in EMRs is variable. This study was designed to test methods aimed at improving the quality of patient smoking information in a primary care EMR database.
METHODS
EMR data from community primary care settings, extracted by two regional practice-based research networks in Alberta, Canada, were used. Patients with at least one encounter in the previous 2 years (2016-2018) and with hypertension according to a validated definition were included (n = 48,377). Multiple imputation was tested under two different assumptions about the missing data (smoking status missing at random, and smoking status missing not at random). A third method tested a novel pattern-matching algorithm developed to augment smoking information in the primary care EMR database. External validity was examined by comparing the proportions of smoking categories generated by each method with those from a general population survey.
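To make the idea of pattern matching on smoking fields concrete, the following is a minimal Python sketch of how free-text EMR entries might be mapped to smoking categories. It is not the study's algorithm: the record does not describe the actual rules, so the regular expressions, the category labels, and the function names `classify` and `summarize` are all hypothetical.

```python
import re
from collections import Counter

# Hypothetical rules, checked in order. These are illustrative assumptions,
# NOT the patterns used in the study. Order matters: "non-smoker" and
# "ex-smoker" both contain "smoker", so they are tested before "current".
PATTERNS = [
    ("never", re.compile(r"\b(?:never(?: a)? smoke[rd]|non-?\s?smoker)\b")),
    ("former", re.compile(r"\b(?:ex-?\s?smoker|former smoker|quit smoking|stopped smoking)\b")),
    ("current", re.compile(r"\b(?:current smoker|smokes|smoker|\d+\s*(?:cigs?|cigarettes|ppd))\b")),
]


def classify(text):
    """Return 'never', 'former', 'current', or 'missing' for one free-text entry."""
    if not text:
        return "missing"
    lowered = text.lower()
    for label, pattern in PATTERNS:
        if pattern.search(lowered):
            return label
    # No rule fired: the entry stays uninterpretable, i.e. missing.
    return "missing"


def summarize(entries):
    """Return the proportion of entries assigned to each category."""
    counts = Counter(classify(entry) for entry in entries)
    total = len(entries)
    return {label: count / total for label, count in counts.items()}


if __name__ == "__main__":
    sample = ["Ex-smoker, quit 2010", "Non-smoker", "smokes 10 cigs/day", "see chart", None]
    print(summarize(sample))
```

A rule-based approach like this only recovers information that is already present somewhere in the record, so entries that match no rule remain missing. Multiple imputation, by contrast, fills every gap with model-based estimates.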
RESULTS
Among those with hypertension, 40.8% (n = 19,743) had smoking information that was either not recorded or not interpretable, and was therefore considered missing. Those with missing smoking data differed statistically from those without by demographics, clinical features, and type of EMR system used in the clinic. Both multiple imputation methods produced fully complete smoking status information, with the proportion of current smokers estimated at 25.3% (data missing at random) and 12.5% (data missing not at random). The pattern-matching algorithm classified 18.2% of patients as current smokers, similar to the population-based survey (18.9%), but still left smoking information missing for 23.6% of patients. The algorithm was estimated to be 93.8% accurate overall, but accuracy varied by smoking status category.
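The reported figures can be checked and compared with a few lines of arithmetic. The Python sketch below recomputes the missing-data percentage from the stated counts and lists how far each method's current-smoker estimate sits from the survey value. It takes the percentages exactly as reported and makes no assumption about their denominators; the variable names are illustrative.

```python
# Counts and percentages as reported in the abstract.
total_patients = 48_377
missing_smoking = 19_743
survey_current = 18.9  # % current smokers in the general population survey

current_smoker_estimates = {
    "imputation, missing at random": 25.3,
    "imputation, missing not at random": 12.5,
    "pattern-matching algorithm": 18.2,
}

# Share of patients whose smoking status was missing before any correction.
missing_pct = round(100 * missing_smoking / total_patients, 1)

# Difference from the survey benchmark, in percentage points.
differences = {
    method: round(estimate - survey_current, 1)
    for method, estimate in current_smoker_estimates.items()
}

if __name__ == "__main__":
    print(f"missing before correction: {missing_pct}%")
    for method, diff in differences.items():
        print(f"{method}: {diff:+.1f} points vs survey")
```

The two imputation estimates fall 6.4 percentage points above and below the survey value, while the algorithm's estimate is 0.7 points below it.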
CONCLUSION
Multiple imputation and algorithmic pattern-matching can both be used to improve EMR data after extraction, but the recommended method depends on the purpose of the secondary use (e.g. practice improvement or epidemiological analyses).
Identifiers
pubmed: 32171301
doi: 10.1186/s12911-020-1068-5
pii: 10.1186/s12911-020-1068-5
pmc: PMC7071570
Publication types
Journal Article
Research Support, Non-U.S. Gov't
Languages
eng
Citation subsets
IM
Pagination
56
Grants
Agency: CIHR
Country: Canada