[Construction of text resources for automatic identification of clinical information in unstructured narratives].

Construcción de recursos de texto para la identificación automática de información clínica en narrativas no estructuradas.

Chile; Electronic Data Processing; Humans

Journal

Revista medica de Chile

ISSN: 0717-6163

Abbreviated title: Rev Med Chil

Country: Chile

NLM ID: 0404312

Publication information

Publication date:
Jul 2021

History:

received: 07 04 2020

accepted: 28 04 2021

entrez: 9 11 2021

pubmed: 10 11 2021

medline: 11 11 2021

Status: ppublish

Abstract

BACKGROUND: A significant proportion of the clinical record is in free-text format, which makes it difficult to extract key information and to make secondary use of patient data. Automatic detection of information within narratives first requires humans, following specific protocols and rules, to identify the medical entities of interest.

AIM: To build a linguistic resource of annotated medical entities on texts produced in Chilean hospitals.

MATERIAL AND METHODS: A clinical corpus was constructed using 150 referrals from public hospitals. Three annotators identified six types of medical entity: clinical findings, diagnoses, body parts, medications, abbreviations, and family members. An annotation scheme was designed, and an iterative approach was applied to train the annotators. The F1-Score metric was used to track the annotators' agreement over the course of their training.

RESULTS: An average F1-Score of 0.73 was observed at the beginning of the project. After the training period, it increased to 0.87. Annotation of clinical findings and body parts showed significant discrepancy, while abbreviations, medications, and family members showed high agreement.

CONCLUSIONS: A linguistic resource with annotated medical entities on texts produced in Chilean hospitals was built and made available, working with annotators with medicine-related backgrounds. The iterative annotation approach allowed us to improve performance metrics. The corpus and annotation protocols will be released to the research community.
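The abstract reports agreement between annotators as an F1-Score but does not say how entity matches were counted or how the scores were averaged. The following is only a minimal sketch of one common way to do it, assuming exact matching on span boundaries and entity label, and an unweighted mean over all annotator pairs. The function names, annotator names, label strings, and toy spans are hypothetical and are not taken from the article.

```python
from itertools import combinations


def pairwise_f1(first, second):
    """F1 between two sets of (start, end, label) annotations.

    Assumption: an entity counts as agreed only if both annotators
    marked the same span boundaries with the same label.
    """
    if not first and not second:
        return 1.0  # nothing annotated by either: treated as full agreement
    matched = len(first & second)
    if matched == 0:
        return 0.0
    precision = matched / len(second)
    recall = matched / len(first)
    return 2 * precision * recall / (precision + recall)


def average_agreement(annotations):
    """Unweighted mean of pairwise F1 over every pair of annotators."""
    pairs = list(combinations(sorted(annotations), 2))
    scores = [pairwise_f1(annotations[a], annotations[b]) for a, b in pairs]
    return sum(scores) / len(scores)


# Toy annotations for one document (character offsets are made up).
toy = {
    "annotator_1": {(0, 5, "Diagnosis"), (10, 15, "Medication"), (20, 24, "Body_Part")},
    "annotator_2": {(0, 5, "Diagnosis"), (10, 15, "Medication"), (30, 34, "Abbreviation")},
    "annotator_3": {(0, 5, "Diagnosis")},
}

mean_f1 = average_agreement(toy)
```

Because F1 is symmetric in precision and recall, it does not matter which annotator of a pair is treated as the reference. Relaxed matching (for example, counting overlapping spans as agreement) would give higher scores than this exact-match version.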

Identifiers

DOI: 10.4067/s0034-98872021000701014

PMID: 34751303

PII: S0034-98872021000701014

Publication types

Journal Article

Languages

spa

Pagination

1014-1022

Authors

Pablo Báez (P)

Fabián Villena (F)

Karen Zúñiga (K)

Natalia Jones (N)

Gustavo Fernández (G)

Manuel Durán (M)

Jocelyn Dunstan (J)
