The validity of electronic health data for measuring smoking status: a systematic review and meta-analysis.

Keywords: Algorithms; Electronic health records; Review; Routinely collected health data; Validation study

Journal

BMC medical informatics and decision making

ISSN: 1472-6947

Abbreviated title: BMC Med Inform Decis Mak

Country: England

NLM ID: 101088682

Publication information

Publication date:
02 Feb 2024

History:

received: 19 Jun 2023

accepted: 03 Jan 2024

medline: 03 Feb 2024

pubmed: 03 Feb 2024

entrez: 02 Feb 2024

Status: epublish

Abstract

BACKGROUND

Smoking is a risk factor for many chronic diseases. Multiple smoking status ascertainment algorithms have been developed for population-based electronic health databases such as administrative databases and electronic medical records (EMRs). Evidence syntheses of algorithm validation studies have often focused on chronic diseases rather than risk factors. We conducted a systematic review and meta-analysis of smoking status ascertainment algorithms to describe the characteristics and validity of these algorithms.

METHODS

The Preferred Reporting Items for Systematic Reviews and Meta-Analyses guidelines were followed. We searched articles published from 1990 to 2022 in EMBASE, MEDLINE, Scopus, and Web of Science with key terms such as validity, administrative data, electronic health records, smoking, and tobacco use. The extracted information, including article characteristics, algorithm characteristics, and validity measures, was descriptively analyzed. Sources of heterogeneity in validity measures were estimated using a meta-regression model. Risk of bias (ROB) in the reviewed articles was assessed using the Quality Assessment of Diagnostic Accuracy Studies-2 tool.

RESULTS

The initial search yielded 2086 articles; 57 were selected for review and 116 algorithms were identified. Almost three-quarters (71.6%) of algorithms were based on EMR data. The algorithms were primarily constructed using diagnosis codes for smoking-related conditions, although prescription medication codes for smoking treatments were also adopted. About half of the algorithms were developed using machine-learning models. The pooled estimates of positive predictive value, sensitivity, and specificity were 0.843, 0.672, and 0.918 respectively. Algorithm sensitivity and specificity were highly variable and ranged from 3 to 100% and 36 to 100%, respectively. Model-based algorithms had significantly greater sensitivity (p = 0.006) than rule-based algorithms. Algorithms for EMR data had higher sensitivity than algorithms for administrative data (p = 0.001). The ROB was low in most of the articles (76.3%) that underwent the assessment.

CONCLUSIONS

Multiple algorithms using different data sources and methods have been proposed to ascertain smoking status in electronic health data. Many algorithms had low sensitivity and positive predictive value, but the data source influenced their validity. Algorithms based on machine-learning models for multiple linked data sources have improved validity.
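The three validity measures reported in the abstract (sensitivity, specificity, and positive predictive value) can be illustrated with a minimal Python sketch. It assumes the usual 2x2 comparison of an algorithm's classification against a reference standard such as self-report or chart review. The names `ConfusionCounts` and `validity_measures` and the example counts are hypothetical; they are not taken from the review or from any of the reviewed studies.

```python
import math
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class ConfusionCounts:
    """2x2 counts: algorithm result versus reference standard (hypothetical layout)."""
    tp: int  # algorithm says smoker, reference says smoker
    fp: int  # algorithm says smoker, reference says non-smoker
    fn: int  # algorithm says non-smoker, reference says smoker
    tn: int  # algorithm says non-smoker, reference says non-smoker


def _ratio(numerator: int, denominator: int) -> float:
    """Return numerator/denominator, or NaN when the denominator is zero."""
    return numerator / denominator if denominator else math.nan


def validity_measures(c: ConfusionCounts) -> Dict[str, float]:
    """Compute sensitivity, specificity, and positive predictive value."""
    return {
        # Share of true smokers that the algorithm identifies.
        "sensitivity": _ratio(c.tp, c.tp + c.fn),
        # Share of true non-smokers that the algorithm leaves unflagged.
        "specificity": _ratio(c.tn, c.tn + c.fp),
        # Share of algorithm-flagged smokers who really smoke.
        "ppv": _ratio(c.tp, c.tp + c.fp),
    }


# Illustrative counts only; they do not come from any reviewed study.
example = ConfusionCounts(tp=60, fp=10, fn=40, tn=190)
measures = validity_measures(example)
for name, value in measures.items():
    print(f"{name}: {value:.3f}")
```

With these made-up counts the algorithm misses 40 of 100 true smokers while rarely mislabelling non-smokers, which is the same qualitative pattern as the pooled estimates above: high specificity, lower sensitivity.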

Identifiers

DOI: 10.1186/s12911-024-02416-3

PMID: 38308231

PII: 10.1186/s12911-024-02416-3

Publication types

Journal Article

Languages

eng

Authors

Md Ashiqul Haque (MA)

Muditha Lakmali Bodawatte Gedara (MLB)

Nathan Nickel (N)

Maxime Turgeon (M)

Lisa M Lix (LM)
