De-identifying Norwegian Clinical Text using Resources from Swedish and Danish.


Journal

AMIA ... Annual Symposium proceedings. AMIA Symposium
ISSN: 1942-597X
Abbreviated title: AMIA Annu Symp Proc
Country: United States
NLM ID: 101209213

Publication information

Publication date:
2023
History:
medline: 15 1 2024
pubmed: 15 1 2024
entrez: 15 1 2024
Status: epublish

Abstract

The lack of relevant annotated datasets is a key limitation in applying Natural Language Processing techniques to a broad range of tasks, among them the identification of Protected Health Information (PHI) in Norwegian clinical text. This work explores the possibility of exploiting resources from Swedish, a language very closely related to Norwegian. The Swedish dataset used is annotated with PHI. Different processing and text augmentation techniques are evaluated, along with their impact on the final performance of the model. Augmentation techniques such as the injection and generation of both Norwegian and Scandinavian named entities into the Swedish training corpus were shown to improve de-identification performance on both Danish and Norwegian text. This trend was also confirmed by evaluating the model on a sample of Norwegian gastrosurgical clinical text.
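The entity-injection augmentation mentioned in the abstract can be illustrated with a minimal sketch. It assumes a Swedish training sentence tagged in BIO format and hypothetical pools of Norwegian surrogate entities; the tag labels (`NAME`, `LOC`), the pools, and the function names `bio_spans` and `inject_entities` are illustrative assumptions, not the authors' implementation or the annotation scheme of the Swedish dataset. Each annotated PHI span whose label has a pool is replaced by a randomly drawn surrogate, and the BIO tags are regenerated to match the surrogate's token count. Entity generation is not shown.

```python
import random
from typing import Dict, List, Tuple

Span = Tuple[int, int, str]  # (start, end_exclusive, label)


def bio_spans(tags: List[str]) -> List[Span]:
    """Collect entity spans from a BIO tag sequence."""
    spans: List[Span] = []
    start, label = None, None
    for i, tag in enumerate(tags):
        # An I- tag continuing the current entity extends the open span.
        if tag.startswith("I-") and label == tag[2:]:
            continue
        # Any other tag closes the open span, if there is one.
        if label is not None:
            spans.append((start, i, label))
            start, label = None, None
        # A B- tag (or a stray I- tag) opens a new span.
        if tag.startswith(("B-", "I-")):
            start, label = i, tag[2:]
    if label is not None:
        spans.append((start, len(tags), label))
    return spans


def inject_entities(
    tokens: List[str],
    tags: List[str],
    pools: Dict[str, List[str]],  # hypothetical surrogate pools per label
    rng: random.Random,
) -> Tuple[List[str], List[str]]:
    """Replace each entity span that has a pool with a random surrogate."""
    out_tokens: List[str] = []
    out_tags: List[str] = []
    prev = 0
    for start, end, label in bio_spans(tags):
        # Copy the non-entity tokens preceding this span unchanged.
        out_tokens.extend(tokens[prev:start])
        out_tags.extend(tags[prev:start])
        if label in pools:
            surrogate = rng.choice(pools[label]).split()
            out_tokens.extend(surrogate)
            # Rebuild BIO tags so they match the surrogate's length.
            out_tags.extend([f"B-{label}"] + [f"I-{label}"] * (len(surrogate) - 1))
        else:
            # No surrogate pool for this label: keep the original entity.
            out_tokens.extend(tokens[start:end])
            out_tags.extend(tags[start:end])
        prev = end
    out_tokens.extend(tokens[prev:])
    out_tags.extend(tags[prev:])
    return out_tokens, out_tags


if __name__ == "__main__":
    # Hypothetical Swedish sentence with PHI annotated in BIO format.
    tokens = ["Patienten", "Anna", "Svensson", "kom", "från", "Uppsala", "."]
    tags = ["O", "B-NAME", "I-NAME", "O", "O", "B-LOC", "O"]
    # Hypothetical pools of Norwegian surrogate entities.
    pools = {"NAME": ["Kari Nordmann", "Ola Nordmann"], "LOC": ["Tromsø", "Bergen"]}
    print(inject_entities(tokens, tags, pools, random.Random(42)))
```

Regenerating the tags rather than copying them keeps labels aligned when a surrogate has a different number of tokens than the entity it replaces.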

Identifiers

pubmed: 38222432
pii: 907
pmc: PMC10785939

Publication types

Journal Article

Languages

eng

Citation subsets

IM

Pagination

456-464

Copyright information

©2023 AMIA - All rights reserved.

Authors

Anastasios Lamproudis (A)

Norwegian Centre for E-health Research, Tromsø, Norway.

Sara Mora (S)

Department of Informatics, Bioengineering, Robotics and System engineering (DIBRIS), University of Genoa, Genoa, Italy.

Therese Olsen Svenning (TO)

Norwegian Centre for E-health Research, Tromsø, Norway.

Torbjørn Torsvik (T)

Norwegian Centre for E-health Research, Tromsø, Norway.

Taridzo Chomutare (T)

Norwegian Centre for E-health Research, Tromsø, Norway.
Department of Computer Science, UiT - The Arctic University of Norway, Tromsø, Norway.

Phuong Dinh Ngo (PD)

Norwegian Centre for E-health Research, Tromsø, Norway.
Department of Physics and Technology, UiT - The Arctic University of Norway, Tromsø, Norway.

Hercules Dalianis (H)

Norwegian Centre for E-health Research, Tromsø, Norway.
Department of Computer and Systems Science (DSV), Stockholm University, Kista, Sweden.

MeSH classifications