[Construction of text resources for automatic identification of clinical information in unstructured narratives].

Construcción de recursos de texto para la identificación automática de información clínica en narrativas no estructuradas.

Journal

Revista medica de Chile
ISSN: 0717-6163
Abbreviated title: Rev Med Chil
Country: Chile
NLM ID: 0404312

Publication information

Publication date:
Jul 2021
History:
received: 07 04 2020
accepted: 28 04 2021
entrez: 9 11 2021
pubmed: 10 11 2021
medline: 11 11 2021
Status: ppublish

Abstract

Background: A significant proportion of the clinical record is in free-text format, which makes it difficult to extract key information and to make secondary use of patient data. Automatic detection of information within narratives initially requires humans, following specific protocols and rules, to identify the medical entities of interest. Aim: To build a linguistic resource of annotated medical entities from texts produced in Chilean hospitals. Material and Methods: A clinical corpus was constructed from 150 referrals from public hospitals. Three annotators identified six types of medical entities: clinical findings, diagnoses, body parts, medications, abbreviations, and family members. An annotation scheme was designed, and an iterative approach was applied to train the annotators. The F1-score was used to track agreement among the annotators during their training. Results: The average F1-score was 0.73 at the beginning of the project and increased to 0.87 after the training period. Annotation of clinical findings and body parts showed significant discrepancy, while abbreviations, medications, and family members showed high agreement. Conclusions: A linguistic resource with annotated medical entities on texts produced in Chilean hospitals was built and made available, working with annotators from the medical field. The iterative annotation approach improved the agreement metrics. The corpus and annotation protocols will be released to the research community.
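
The abstract does not state how the F1-score between annotators was computed. As a minimal sketch, assuming exact matching on span boundaries and entity label and a plain average over annotator pairs (both are assumptions, not the authors' documented protocol), pairwise agreement could be calculated as follows. The function names, label strings, and toy annotations are hypothetical.

```python
from itertools import combinations

# Assumed representation: each annotation is a tuple
# (document_id, start_offset, end_offset, entity_label).

def pairwise_f1(a, b):
    """F1 between two annotation sets, counting only exact span-and-label matches."""
    if not a and not b:
        return 1.0  # two empty sets agree trivially
    matches = len(a & b)
    # With one set treated as reference, precision = matches/len(b) and
    # recall = matches/len(a); their harmonic mean simplifies to this form.
    return 2 * matches / (len(a) + len(b))

def average_agreement(annotators):
    """Mean pairwise F1 over all annotator pairs (assumed aggregation)."""
    pairs = list(combinations(annotators.values(), 2))
    return sum(pairwise_f1(a, b) for a, b in pairs) / len(pairs)

# Hypothetical toy data for a single referral, not taken from the corpus.
annotator_1 = {("ref1", 0, 5, "Diagnosis"), ("ref1", 10, 18, "Body_Part"),
               ("ref1", 20, 23, "Abbreviation")}
annotator_2 = {("ref1", 0, 5, "Diagnosis"), ("ref1", 10, 18, "Finding"),
               ("ref1", 20, 23, "Abbreviation")}
annotator_3 = {("ref1", 0, 5, "Diagnosis"), ("ref1", 20, 23, "Abbreviation")}

score = average_agreement({"A1": annotator_1, "A2": annotator_2, "A3": annotator_3})
print(f"average pairwise F1: {score:.2f}")
```

Because this form of F1 is symmetric, it does not matter which annotator of a pair is treated as the reference. A relaxed (overlap-based) matching rule would give higher scores than the exact-match rule assumed here.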

Identifiers

pubmed: 34751303
pii: S0034-98872021000701014
doi: 10.4067/s0034-98872021000701014

Publication types

Journal Article

Languages

spa

Citation subsets

IM

Pagination

1014-1022

Authors

Pablo Báez (P)

Centro de Informática Médica y Telemedicina, Facultad de Medicina, Universidad de Chile, Santiago, Chile.

Fabián Villena (F)

Centro de Informática Médica y Telemedicina, Facultad de Medicina, Universidad de Chile, Santiago, Chile.

Karen Zúñiga (K)

Escuela de Medicina, Universidad de Chile, Santiago, Chile.

Natalia Jones (N)

Escuela de Medicina, Universidad de Chile, Santiago, Chile.

Gustavo Fernández (G)

Escuela de Medicina, Universidad de Chile, Santiago, Chile.

Manuel Durán (M)

Centro de Informática Médica y Telemedicina, Facultad de Medicina, Universidad de Chile, Santiago, Chile.

Jocelyn Dunstan (J)

Centro de Informática Médica y Telemedicina, Facultad de Medicina, Universidad de Chile, Santiago, Chile.
