Natural Language Processing of Clinical Notes on Chronic Diseases: Systematic Review.

Keywords: cancer; chronic diseases; clinical notes; deep learning; diabetes; electronic health records; heart disease; lung disease; machine learning; natural language processing; stroke

Journal

JMIR medical informatics
ISSN: 2291-9694
Abbreviated title: JMIR Med Inform
Country: Canada
NLM ID: 101645109

Publication information

Publication date:
27 Apr 2019
History:
received: 17 Sep 2018
accepted: 24 Mar 2019
revised: 4 Mar 2019
entrez: 9 May 2019
pubmed: 9 May 2019
medline: 9 May 2019
Status: epublish

Abstract sections

BACKGROUND
Novel approaches that complement and go beyond evidence-based medicine are required in the domain of chronic diseases, given the growing incidence of such conditions in the worldwide population. A promising avenue is the secondary use of electronic health records (EHRs), where patient data are analyzed to conduct clinical and translational research. Methods based on machine learning to process EHRs are resulting in improved understanding of patient clinical trajectories and chronic disease risk prediction, creating a unique opportunity to derive previously unknown clinical insights. However, a wealth of clinical histories remains locked behind clinical narratives in free-form text. Consequently, unlocking the full potential of EHR data is contingent on the development of natural language processing (NLP) methods to automatically transform clinical text into structured clinical data that can guide clinical decisions and potentially delay or prevent disease onset.
OBJECTIVE
The goal of the research was to provide a comprehensive overview of the development and uptake of NLP methods applied to free-text clinical notes related to chronic diseases, including the investigation of challenges faced by NLP methodologies in understanding clinical narratives.
METHODS
Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guidelines were followed and searches were conducted in 5 databases using "clinical notes," "natural language processing," and "chronic disease" and their variations as keywords to maximize coverage of the articles.
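As a rough illustration of how a keyword search over several concepts and their variations can be assembled, the sketch below joins the variations of each concept with OR and the concepts with AND. The variation lists and the function name `build_query` are assumptions made for this example; the record does not list the exact search strings or the database syntax that was used.

```python
from typing import Dict, List

# Hypothetical keyword variations for the three concepts named in the abstract.
# The actual variations used in the review are not given in this record.
CONCEPTS: Dict[str, List[str]] = {
    "clinical notes": ["clinical notes", "clinical narratives", "free text"],
    "natural language processing": ["natural language processing", "NLP", "text mining"],
    "chronic disease": ["chronic disease", "chronic diseases", "chronic condition"],
}


def build_query(concepts: Dict[str, List[str]]) -> str:
    """Join variations of one concept with OR, and the concepts with AND."""
    groups = []
    for variations in concepts.values():
        quoted = " OR ".join(f'"{term}"' for term in variations)
        groups.append(f"({quoted})")
    return " AND ".join(groups)


query = build_query(CONCEPTS)
print(query)
```

Widening the OR groups increases coverage at the cost of more articles to screen, which is consistent with the large gap between articles considered and articles included.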
RESULTS
Of the 2652 articles considered, 106 met the inclusion criteria. Review of the included papers resulted in identification of 43 chronic diseases, which were then further classified into 10 disease categories using the International Classification of Diseases, 10th Revision. The majority of studies focused on diseases of the circulatory system (n=38) while endocrine and metabolic diseases were fewest (n=14). This was due to the structure of clinical records related to metabolic diseases, which typically contain much more structured data, compared with medical records for diseases of the circulatory system, which focus more on unstructured data and consequently have seen a stronger focus on NLP. The review has shown that there is a significant increase in the use of machine learning methods compared to rule-based approaches; however, deep learning methods remain emergent (n=3). Consequently, the majority of works focus on classification of disease phenotype with only a handful of papers addressing extraction of comorbidities from the free text or integration of clinical notes with structured data. There is a notable use of relatively simple methods, such as shallow classifiers (or combination with rule-based methods), due to the interpretability of predictions, which still represents a significant issue for more complex methods. Finally, scarcity of publicly available data may also have contributed to insufficient development of more advanced methods, such as extraction of word embeddings from clinical notes.
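To make the appeal of simple, interpretable methods concrete, the following is a minimal sketch of a rule-based phenotype flagger with a crude negation check. It is not taken from any of the reviewed studies: the term lists, the negation cues, and the function name `find_phenotypes` are all assumptions chosen for illustration, and a real system would need far richer vocabularies and context handling.

```python
import re
from typing import Dict, List

# Hypothetical term patterns per phenotype (illustrative only).
PHENOTYPE_TERMS: Dict[str, List[str]] = {
    "diabetes": [r"\bdiabetes\b", r"\bdiabetic\b", r"\bt2dm\b"],
    "heart failure": [r"\bheart failure\b", r"\bchf\b"],
}

# Hypothetical negation cues; a mention is treated as negated when a cue
# appears earlier in the same sentence.
NEGATION_CUES = re.compile(r"\b(no|not|denies|without|negative for)\b")


def find_phenotypes(note: str) -> Dict[str, bool]:
    """Return, for each phenotype, whether the note contains an affirmed mention."""
    sentences = re.split(r"(?<=[.!?])\s+", note.lower())
    results: Dict[str, bool] = {}
    for name, patterns in PHENOTYPE_TERMS.items():
        affirmed = False
        for sentence in sentences:
            for pattern in patterns:
                match = re.search(pattern, sentence)
                if match is None:
                    continue
                # Only the text before the mention is checked for negation cues.
                preceding = sentence[: match.start()]
                if NEGATION_CUES.search(preceding) is None:
                    affirmed = True
        results[name] = affirmed
    return results


example_note = (
    "Patient has a history of type 2 diabetes. "
    "Denies chest pain. No evidence of heart failure."
)
flags = find_phenotypes(example_note)
print(flags)
```

Every decision in such a system can be traced to a specific pattern and cue, which is the kind of interpretability the abstract contrasts with more complex models.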
CONCLUSIONS
Efforts are still required to improve (1) progression of clinical NLP methods from extraction toward understanding; (2) recognition of relations among entities rather than entities in isolation; (3) temporal extraction to understand past, current, and future clinical events; (4) exploitation of alternative sources of clinical knowledge; and (5) availability of large-scale, de-identified clinical corpora.

Identifiers

pubmed: 31066697
pii: v7i2e12239
doi: 10.2196/12239
pmc: PMC6528438

Publication types

Journal Article; Review

Languages

eng

Pagination

e12239

Copyright information

©Seyedmostafa Sheikhalishahi, Riccardo Miotto, Joel T Dudley, Alberto Lavelli, Fabio Rinaldi, Venet Osmani. Originally published in JMIR Medical Informatics (http://medinform.jmir.org), 27.04.2019.

Authors

Seyedmostafa Sheikhalishahi (S)

eHealth Research Group, Fondazione Bruno Kessler Research Institute, Trento, Italy.
Department of Information Engineering and Computer Science, University of Trento, Trento, Italy.

Riccardo Miotto (R)

Institute for Next Generation Healthcare, Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY, United States.

Joel T Dudley (JT)

Institute for Next Generation Healthcare, Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY, United States.

Alberto Lavelli (A)

NLP Research Group, Fondazione Bruno Kessler Research Institute, Trento, Italy.

Fabio Rinaldi (F)

Institute of Computational Linguistics, University of Zurich, Zurich, Switzerland.

Venet Osmani (V)

eHealth Research Group, Fondazione Bruno Kessler Research Institute, Trento, Italy.

MeSH classifications