Assessing the Performance of Clinical Natural Language Processing Systems: Development of an Evaluation Methodology.

Keywords: clinical natural language processing; electronic health records; gold standard; natural language processing; reference standard; sample size

Journal

JMIR Medical Informatics

ISSN: 2291-9694

Abbreviated title: JMIR Med Inform

Country: Canada

NLM ID: 101645109

Publication information

Publication date:
23 Jul 2021

History:

received: 20 May 2020

revised: 31 Jul 2020

accepted: 17 Jun 2021

entrez: 23 Jul 2021

pubmed: 24 Jul 2021

medline: 24 Jul 2021

Status: epublish

Abstract

BACKGROUND

Clinical natural language processing (cNLP) systems are of crucial importance because of their increasing capability to extract clinically important information from the free text contained in electronic health records (EHRs). Converting a nonstructured representation of a patient's clinical history into a structured format enables medical doctors to generate clinical knowledge at a level that was not possible before. In turn, interpreting the insights provided by cNLP systems has great potential for driving decisions about clinical practice. However, carrying out robust evaluations of cNLP systems is a complex task that is hindered by a lack of standard guidance on how to approach them systematically.

OBJECTIVE

Our objective was to offer natural language processing (NLP) experts a methodology for the evaluation of cNLP systems to assist them in carrying out this task. Following the proposed phases helps ensure that the performance metrics of their own cNLP systems are robust and representative.

METHODS

The proposed evaluation methodology comprised five phases: (1) the definition of the target population, (2) the statistical document collection, (3) the design of the annotation guidelines and annotation project, (4) the external annotations, and (5) the cNLP system performance evaluation. We presented the application of all phases to evaluate the performance of a cNLP system called "EHRead Technology" (developed by Savana, an international medical company), applied in a study on patients with asthma. As part of the evaluation methodology, we introduced the Sample Size Calculator for Evaluations (SLiCE), a software tool that calculates the number of documents needed to achieve a statistically useful and resourceful gold standard.
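The abstract does not describe how SLiCE computes its sample size, so the following is only a minimal sketch of the general idea, assuming the textbook normal-approximation formula for estimating a proportion, n = z² · p(1 − p) / d². The function name `documents_needed` and its parameters (`expected_metric`, `half_width`, `frequency`) are hypothetical and are not taken from the SLiCE library.

```python
import math
from statistics import NormalDist


def documents_needed(
    expected_metric: float,
    half_width: float,
    frequency: float = 1.0,
    confidence: float = 0.95,
) -> int:
    """Estimate how many documents a gold standard needs (illustrative only).

    expected_metric: anticipated value of the metric (e.g. precision), in (0, 1).
    half_width: desired half-width of the confidence interval, in (0, 1).
    frequency: assumed fraction of documents in which the variable occurs.
    confidence: confidence level of the interval.
    """
    if not 0.0 < expected_metric < 1.0:
        raise ValueError("expected_metric must lie strictly between 0 and 1")
    if not 0.0 < half_width < 1.0:
        raise ValueError("half_width must lie strictly between 0 and 1")
    if not 0.0 < frequency <= 1.0:
        raise ValueError("frequency must lie in (0, 1]")
    if not 0.0 < confidence < 1.0:
        raise ValueError("confidence must lie strictly between 0 and 1")

    # Two-sided critical value of the standard normal distribution.
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)

    # Number of evaluable instances required by the normal approximation.
    instances = math.ceil(z**2 * expected_metric * (1 - expected_metric) / half_width**2)

    # Scale up because only a fraction of documents contain the variable.
    return math.ceil(instances / frequency)


if __name__ == "__main__":
    # Worst case (metric of 0.5) with a +/- 0.05 interval at 95% confidence.
    print(documents_needed(0.5, 0.05))
```

Under these assumptions, a rarer variable or a narrower target interval increases the number of documents to annotate, which is the trade-off a sample size calculator is meant to make explicit.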

RESULTS

Applying the proposed evaluation methodology to a real use-case study of patients with asthma revealed the benefit of the different phases for cNLP system evaluations. Using SLiCE to adjust the number of documents needed made it possible to create a meaningful and resourceful gold standard. In the presented use case, with as few as 519 EHRs, it was possible to evaluate the performance of the cNLP system and obtain performance metrics for the primary variable within the expected CIs.
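To illustrate what obtaining performance metrics with CIs can look like in practice, the sketch below computes precision, recall, and F1 from counts against a gold standard and attaches a Wilson score interval to precision and recall. The choice of the Wilson interval, the function names, and the counts in the usage line are assumptions made for illustration; they are not the study's method or data.

```python
import math
from statistics import NormalDist


def wilson_interval(successes: int, trials: int, confidence: float = 0.95) -> tuple[float, float]:
    """Wilson score confidence interval for a binomial proportion."""
    if trials <= 0 or not 0 <= successes <= trials:
        raise ValueError("need trials > 0 and 0 <= successes <= trials")
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    p = successes / trials
    denom = 1 + z**2 / trials
    centre = (p + z**2 / (2 * trials)) / denom
    margin = z * math.sqrt(p * (1 - p) / trials + z**2 / (4 * trials**2)) / denom
    return max(0.0, centre - margin), min(1.0, centre + margin)


def evaluate(tp: int, fp: int, fn: int, confidence: float = 0.95) -> dict:
    """Precision, recall, and F1 from counts against a gold standard.

    tp: system detections confirmed by the gold standard.
    fp: system detections absent from the gold standard.
    fn: gold standard annotations the system missed.
    """
    if tp + fp == 0 or tp + fn == 0:
        raise ValueError("precision or recall is undefined for these counts")
    return {
        "precision": tp / (tp + fp),
        "precision_ci": wilson_interval(tp, tp + fp, confidence),
        "recall": tp / (tp + fn),
        "recall_ci": wilson_interval(tp, tp + fn, confidence),
        "f1": 2 * tp / (2 * tp + fp + fn),
    }


if __name__ == "__main__":
    # Made-up counts, used only to show the call.
    for name, value in evaluate(tp=90, fp=10, fn=10).items():
        print(name, value)
```

Comparing the width of each interval with the width targeted when the gold standard was sized is one way to check whether the metrics fall within the expected CIs.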

CONCLUSIONS

We showed that our evaluation methodology can offer guidance to NLP experts on how to approach the evaluation of their cNLP systems. By following the five phases, NLP experts can ensure the robustness of their evaluation and avoid unnecessary investment of human and financial resources. In addition to this theoretical guidance, we offer SLiCE as an easy-to-use, open-source Python library.

Identifiers

DOI: 10.2196/20492

PMID: 34297002

PMC: PMC8367121

PII: v9i7e20492

Publication types

Journal Article

Languages

eng

Pagination

e20492

Copyright information

©Lea Canales, Sebastian Menke, Stephanie Marchesseau, Ariel D’Agostino, Carlos del Rio-Bermudez, Miren Taberna, Jorge Tello. Originally published in JMIR Medical Informatics (https://medinform.jmir.org), 23.07.2021.

Authors

Lea Canales

Sebastian Menke

Stephanie Marchesseau

Ariel D'Agostino

Carlos Del Rio-Bermudez

Miren Taberna

Jorge Tello

MeSH classifications