Annotating German Clinical Documents for De-Identification.
Confidentiality
Data Anonymization
Natural Language Processing
Journal
Studies in health technology and informatics
ISSN: 1879-8365
Abbreviated title: Stud Health Technol Inform
Country: Netherlands
NLM ID: 9214582
Publication information
Publication date:
21 Aug 2019
History:
entrez: 24 Aug 2019
pubmed: 24 Aug 2019
medline: 11 Sep 2019
Status:
ppublish
Abstract
We devised annotation guidelines for the de-identification of German clinical documents and assembled a corpus of 1,106 discharge summaries and transfer letters with 44K annotated protected health information (PHI) items. After three iteration rounds, our annotation team finally reached an inter-annotator agreement of 0.96 on the instance level and 0.97 on the token level of annotation (averaged pair-wise F1 score). To establish a baseline for automatic de-identification on our corpus, we trained a recurrent neural network (RNN) and achieved F1 scores greater than 0.9 on most major PHI categories.
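The agreement figures above are averaged pair-wise F1 scores. The following is a minimal sketch of how such a score could be computed, not the authors' implementation: it assumes exact matching on span boundaries and labels, and the annotator names, spans, and labels are hypothetical toy data.

```python
from itertools import combinations


def f1(a, b):
    """F1 between two annotation sets (symmetric: 2|A∩B| / (|A| + |B|))."""
    if not a and not b:
        return 1.0  # assumption: two empty annotation sets count as full agreement
    return 2 * len(a & b) / (len(a) + len(b))


def avg_pairwise_f1(annotators):
    """Mean F1 over all unordered pairs of annotators."""
    pairs = list(combinations(annotators.values(), 2))
    return sum(f1(a, b) for a, b in pairs) / len(pairs)


def to_tokens(spans):
    """Expand (start, end_exclusive, label) spans into (token_index, label) items."""
    return {(i, label) for start, end, label in spans for i in range(start, end)}


# Hypothetical toy data: (start_token, end_token_exclusive, label)
annotations = {
    "A": {(0, 2, "NAME"), (5, 6, "DATE"), (9, 11, "LOCATION")},
    "B": {(0, 2, "NAME"), (5, 6, "DATE"), (9, 10, "LOCATION")},
    "C": {(0, 2, "NAME"), (5, 6, "DATE")},
}

# Instance level: a span counts only if boundaries and label match exactly.
instance_f1 = avg_pairwise_f1(annotations)

# Token level: partial span overlaps still earn credit for the shared tokens.
token_f1 = avg_pairwise_f1({k: to_tokens(v) for k, v in annotations.items()})

print(f"instance-level F1: {instance_f1:.3f}")
print(f"token-level F1:    {token_f1:.3f}")
```

In this toy example the token-level score is higher than the instance-level score because annotators A and B disagree only on the boundary of one span, which costs a full instance but only a single token.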
Identifiers
pubmed: 31437914
pii: SHTI190212
doi: 10.3233/SHTI190212
Publication types
Journal Article
Languages
eng