[Construction of text resources for automatic identification of clinical information in unstructured narratives].
Construcción de recursos de texto para la identificación automática de información clínica en narrativas no estructuradas.
Journal
Revista medica de Chile
ISSN: 0717-6163
Abbreviated title: Rev Med Chil
Country: Chile
NLM ID: 0404312
Publication information
Publication date:
Jul 2021
History:
received: 2020-04-07
accepted: 2021-04-28
entrez: 2021-11-09
pubmed: 2021-11-10
medline: 2021-11-11
Status:
ppublish
Abstract
Abstract sections
BACKGROUND
A significant proportion of the clinical record is in free text format, making it difficult to extract key information and make secondary use of patient data. Automatic detection of information within narratives initially requires humans, following specific protocols and rules, to identify medical entities of interest.
AIM
To build a linguistic resource of annotated medical entities on texts produced in Chilean hospitals.
MATERIAL AND METHODS
A clinical corpus was constructed using 150 referrals in public hospitals. Three annotators identified six medical entities: clinical findings, diagnoses, body parts, medications, abbreviations, and family members. An annotation scheme was designed, and an iterative approach to train the annotators was applied. The F1-Score metric was used to assess the progress of the annotators' agreement during their training.
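The abstract does not state how the F1-Score was computed between annotators. As an illustration only, one common approach treats one annotator's entities as the reference and the other's as the prediction, counts exact matches on (start, end, label), and averages the pairwise scores. The span format, the function names `pairwise_f1` and `average_f1`, and the sample annotations below are hypothetical and not taken from the study.

```python
from itertools import combinations


def pairwise_f1(spans_a, spans_b):
    """F1 between two annotators' entity sets (exact span and label match).

    Each span is a (start, end, label) tuple; this format is an assumption.
    """
    a, b = set(spans_a), set(spans_b)
    if not a and not b:
        return 1.0  # assumption: two empty annotations count as full agreement
    matched = len(a & b)
    if matched == 0:
        return 0.0
    precision = matched / len(b)  # b plays the role of the prediction
    recall = matched / len(a)     # a plays the role of the reference
    return 2 * precision * recall / (precision + recall)


def average_f1(annotations):
    """Mean pairwise F1 over all annotator pairs (needs at least two annotators)."""
    pairs = list(combinations(annotations.values(), 2))
    return sum(pairwise_f1(x, y) for x, y in pairs) / len(pairs)


# Hypothetical annotations of one referral by three annotators.
annotations = {
    "annotator_1": {(0, 5, "diagnosis"), (10, 18, "medication"), (20, 25, "body_part")},
    "annotator_2": {(0, 5, "diagnosis"), (10, 18, "medication"), (20, 27, "body_part")},
    "annotator_3": {(0, 5, "diagnosis"), (10, 18, "medication"), (20, 25, "body_part")},
}

score = average_f1(annotations)
```

With exact matching, F1 is symmetric in the two annotators, so the choice of which one acts as the reference does not change the score. A relaxed (overlap-based) match would give higher values for entities whose boundaries are hard to agree on, such as clinical findings and body parts.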
RESULTS
An average F1-Score of 0.73 was observed at the beginning of the project. After the training period, it increased to 0.87. Annotation of clinical findings and body parts showed significant discrepancy, while abbreviations, medications, and family members showed high agreement.
CONCLUSIONS
A linguistic resource with annotated medical entities on texts produced in Chilean hospitals was built and made available, working with annotators related to medicine. The iterative annotation approach allowed us to improve performance metrics. The corpus and annotation protocols will be released to the research community.
Identifiers
pubmed: 34751303
pii: S0034-98872021000701014
doi: 10.4067/s0034-98872021000701014
Publication types
Journal Article
Languages
spa
Citation subsets
IM