Generate Analysis-Ready Data for Real-world Evidence: Tutorial for Harnessing Electronic Health Records With Advanced Informatic Technologies.

data curation; electronic health records; medical informatics; randomized controlled trials; real-world evidence; reproducibility

Journal

Journal of Medical Internet Research
ISSN: 1438-8871
Abbreviated title: J Med Internet Res
Country: Canada
NLM ID: 100959882

Publication information

Publication date:
25 05 2023
History:
received: 11 01 2023
accepted: 05 04 2023
revised: 31 03 2023
medline: 29 5 2023
pubmed: 25 5 2023
entrez: 25 5 2023
Status: epublish

Abstract

Although randomized controlled trials (RCTs) are the gold standard for establishing the efficacy and safety of a medical treatment, real-world evidence (RWE) generated from real-world data has been vital in postapproval monitoring and is being promoted for the regulatory process of experimental therapies. An emerging source of real-world data is electronic health records (EHRs), which contain detailed information on patient care in both structured (eg, diagnosis codes) and unstructured (eg, clinical notes and images) forms. Despite the granularity of the data available in EHRs, the critical variables required to reliably assess the relationship between a treatment and clinical outcome are challenging to extract. To address this fundamental challenge and accelerate the reliable use of EHRs for RWE, we introduce an integrated data curation and modeling pipeline consisting of 4 modules that leverage recent advances in natural language processing, computational phenotyping, and causal modeling techniques with noisy data. Module 1 consists of techniques for data harmonization. We use natural language processing to recognize clinical variables from RCT design documents and map the extracted variables to EHR features with description matching and knowledge networks. Module 2 then develops techniques for cohort construction using advanced phenotyping algorithms to both identify patients with diseases of interest and define the treatment arms. Module 3 introduces methods for variable curation, including a list of existing tools to extract baseline variables from different sources (eg, codified, free text, and medical imaging) and end points of various types (eg, death, binary, temporal, and numerical). Finally, module 4 presents validation and robust modeling methods, and we propose a strategy to create gold-standard labels for EHR variables of interest to validate data curation quality and perform subsequent causal modeling for RWE. 
Beyond the workflow itself, we develop a reporting guideline for RWE that covers the information needed for transparent reporting and reproducibility of results. Moreover, our pipeline is highly data driven, enhancing study data with a rich variety of publicly available information and knowledge sources. We showcase the pipeline and provide guidance on deploying the relevant tools by revisiting the emulation of the Clinical Outcomes of Surgical Therapy Study Group Trial on laparoscopy-assisted colectomy versus open colectomy in patients with early-stage colon cancer, drawing on existing literature on EHR emulation of RCTs together with our own studies with the Mass General Brigham EHR.
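
The description matching mentioned for module 1 (mapping variables extracted from RCT documents to EHR features) can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes a plain token-overlap (Jaccard) score stands in for the matching step, and the function names, the threshold, the variable names, and the feature codes (`PROC_A`, `PROC_B`, `LAB_C`) are all hypothetical.

```python
import re


def tokenize(text):
    """Lowercase a description and split it into a set of alphanumeric tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def jaccard(a, b):
    """Jaccard similarity between two token sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def match_variables(rct_variables, ehr_features, threshold=0.3):
    """Map each RCT variable to the best-scoring EHR feature code.

    Returns None for a variable when no feature reaches the threshold.
    The threshold value is an arbitrary assumption for this sketch.
    """
    mapping = {}
    for var in rct_variables:
        var_tokens = tokenize(var)
        best_code, best_score = None, 0.0
        for code, description in ehr_features.items():
            score = jaccard(var_tokens, tokenize(description))
            if score > best_score:
                best_code, best_score = code, score
        mapping[var] = best_code if best_score >= threshold else None
    return mapping


# Hypothetical inputs for illustration only; not taken from the article.
rct_variables = ["laparoscopic colectomy", "open colectomy", "body mass index"]
ehr_features = {
    "PROC_A": "Laparoscopic partial colectomy",
    "PROC_B": "Open partial colectomy",
    "LAB_C": "Serum creatinine",
}
mapping = match_variables(rct_variables, ehr_features)
```

Token overlap is used here only because it is easy to inspect. The abstract also mentions knowledge networks, which this sketch does not attempt to model, so a variable with no lexically similar feature (such as "body mass index" above) is simply left unmapped.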

Identifiers

pubmed: 37227772
pii: v25i1e45662
doi: 10.2196/45662
pmc: PMC10251230

Publication types

Journal Article; Research Support, N.I.H., Extramural; Research Support, Non-U.S. Gov't

Languages

eng

Citation subsets

IM

Pagination

e45662

Grants

Agency: NIAMS NIH HHS
ID: P30 AR072577
Country: United States
Agency: NINDS NIH HHS
ID: K99 NS114850
Country: United States

Copyright information

©Jue Hou, Rachel Zhao, Jessica Gronsbell, Yucong Lin, Clara-Lea Bonzel, Qingyi Zeng, Sinian Zhang, Brett K Beaulieu-Jones, Griffin M Weber, Thomas Jemielita, Shuyan Sabrina Wan, Chuan Hong, Tianrun Cai, Jun Wen, Vidul Ayakulangara Panickan, Kai-Li Liaw, Katherine Liao, Tianxi Cai. Originally published in the Journal of Medical Internet Research (https://www.jmir.org), 25.05.2023.

Authors

Jue Hou (J)

Division of Biostatistics, School of Public Health, University of Minnesota, Minneapolis, MN, United States.

Rachel Zhao (R)

Department of Medicine, University of British Columbia, Vancouver, BC, Canada.

Jessica Gronsbell (J)

Department of Statistical Sciences, University of Toronto, Toronto, ON, Canada.

Yucong Lin (Y)

Institute of Engineering Medicine, Beijing Institute of Technology, Beijing, China.

Clara-Lea Bonzel (CL)

Department of Biomedical Informatics, Harvard Medical School, Boston, MA, United States.

Qingyi Zeng (Q)

Division of Biostatistics, School of Public Health, University of Minnesota, Minneapolis, MN, United States.

Sinian Zhang (S)

School of Statistics, Renmin University of China, Beijing, China.

Brett K Beaulieu-Jones (BK)

Department of Medicine, University of Chicago, Chicago, IL, United States.

Griffin M Weber (GM)

Department of Biomedical Informatics, Harvard Medical School, Boston, MA, United States.

Thomas Jemielita (T)

Merck & Co, Inc, Rahway, NJ, United States.

Shuyan Sabrina Wan (SS)

Merck & Co, Inc, Rahway, NJ, United States.

Chuan Hong (C)

Department of Biostatistics & Bioinformatics, Duke University, Durham, NC, United States.

Tianrun Cai (T)

Department of Biomedical Informatics, Harvard Medical School, Boston, MA, United States.

Jun Wen (J)

Department of Biomedical Informatics, Harvard Medical School, Boston, MA, United States.

Vidul Ayakulangara Panickan (V)

Department of Biomedical Informatics, Harvard Medical School, Boston, MA, United States.

Kai-Li Liaw (KL)

Merck & Co, Inc, Rahway, NJ, United States.

Katherine Liao (K)

Department of Biomedical Informatics, Harvard Medical School, Boston, MA, United States.
Division of Rheumatology, Inflammation, and Immunity, Department of Medicine, Brigham and Women's Hospital, Harvard Medical School, Boston, MA, United States.

Tianxi Cai (T)

Department of Biomedical Informatics, Harvard Medical School, Boston, MA, United States.
Department of Biostatistics, Harvard TH Chan School of Public Health, Boston, MA, United States.
