Enhancing an enterprise data warehouse for research with data extracted using natural language processing.
Keywords: ETL; Natural language processing; data service; electronic health records; enterprise data warehouse for research; rule-based; smoking behavior
Journal: Journal of clinical and translational science
ISSN: 2059-8661
Abbreviated title: J Clin Transl Sci
Country: England
NLM ID: 101689953
Publication information
Publication date: 2023
History:
received: 11 March 2023
revised: 14 May 2023
accepted: 31 May 2023
medline: 17 July 2023
pubmed: 17 July 2023
entrez: 17 July 2023
Status: epublish
Abstract
This study aims to develop a generalizable architecture for enhancing an enterprise data warehouse for research (EDW4R) with results from a natural language processing (NLP) model, so that discrete data derived from clinical notes can be made broadly available for research use without the need for NLP expertise. The study also quantifies the additional value that information extracted from clinical narratives brings to the EDW4R. Clinical notes written during one month at an academic health center were used to evaluate the performance of an existing NLP model and to quantify its value added to the structured data. Manual review was used to assess performance. The architecture for enhancing the EDW4R is described in detail to enable reproducibility. Two weeks were needed to enhance the EDW4R with data from 250 million clinical notes. NLP increased data availability by 16% and 39% for two variables. The architecture is highly generalizable to new NLP models. The positive predictive value (PPV) measured by an independent team indicated NLP performance only slightly lower than that reported by the NLP developers. The NLP added significant value to data already available in structured format. Given this added value, it is important to enhance EDW4R with NLP-extracted data so that research teams without NLP expertise can also benefit from NLP models.
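The abstract reports two quantities, the increase in data availability contributed by NLP and the PPV obtained through manual review, without stating how they are computed. The following Python sketch shows one plausible way to compute them. It assumes that "increase in data availability" is measured relative to the patients who already have the variable in structured data, and that PPV is the share of NLP-positive extractions confirmed by a reviewer. The function names and sample values are hypothetical illustrations, not the study's code or data.

```python
"""Hypothetical sketch: value added by NLP-derived data over structured data.

Assumptions (not from the study): availability increase is relative to the
structured-data baseline; PPV is computed over NLP-positive extractions that
were manually reviewed. All IDs and values are illustrative only.
"""


def percent_increase(structured_ids: set[str], nlp_ids: set[str]) -> float:
    """Percent increase in patients with a value once NLP output is added.

    Baseline = patients with the variable in structured EDW data.
    Enhanced = union of structured and NLP-derived patients, so patients
    found by both sources are counted once.
    """
    if not structured_ids:
        raise ValueError("structured baseline must be non-empty")
    enhanced = structured_ids | nlp_ids
    return 100.0 * (len(enhanced) - len(structured_ids)) / len(structured_ids)


def positive_predictive_value(reviewed: list[tuple[str, bool]]) -> float:
    """PPV = true positives / all NLP-positive extractions that were reviewed.

    Each tuple is (note_id, reviewer_confirmed) for one NLP-positive extraction.
    """
    if not reviewed:
        raise ValueError("no reviewed extractions")
    true_positives = sum(1 for _, confirmed in reviewed if confirmed)
    return true_positives / len(reviewed)


if __name__ == "__main__":
    # Illustrative patients with a smoking status in each source.
    structured = {"p1", "p2", "p3", "p4", "p5"}
    from_nlp = {"p4", "p5", "p6", "p7"}
    print(f"{percent_increase(structured, from_nlp):.1f}%")  # prints 40.0%

    # Illustrative manual-review outcomes for NLP-positive notes.
    review = [("n1", True), ("n2", True), ("n3", False), ("n4", True)]
    print(positive_predictive_value(review))  # prints 0.75
```

Taking the set union avoids double-counting patients whose value appears in both structured data and clinical notes, which is what distinguishes genuinely new information from redundant extraction.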
Identifiers
pubmed: 37456264
doi: 10.1017/cts.2023.575
pii: S2059866123005757
pmc: PMC10346024
Publication types
Journal Article
Languages
eng
Pagination
e149
Copyright information
© The Author(s) 2023.
Conflict of interest statement
The authors have no conflicts of interest to declare.