Deep Learning Approaches Outperform Conventional Strategies in De-Identification of German Medical Reports.
De-Identification
Deep Learning
ELMo
German Medical Admission Notes
LSTM
Machine Learning
Personal Health Information
Journal
Studies in health technology and informatics
ISSN: 1879-8365
Abbreviated title: Stud Health Technol Inform
Country: Netherlands
NLM ID: 9214582
Publication information
Publication date:
03 Sep 2019
History:
entrez: 05 Sep 2019
pubmed: 05 Sep 2019
medline: 14 Sep 2019
Status:
ppublish
Abstract
One of the major obstacles for research on German medical reports is the lack of de-identified medical corpora. Previous de-identification tasks focused on non-German medical texts, which raised the demand for an in-depth evaluation of de-identification methods on German medical texts. Because supervised machine learning methods have achieved remarkable advances in natural language processing even with limited training data, we evaluated them for the first time on German medical reports, using our annotated data set of 113 medical reports from the cardiology domain. We applied state-of-the-art deep learning methods, using pre-trained models as input to a bidirectional LSTM network, as well as well-established conditional random fields for de-identification of German medical reports. We performed an extensive evaluation of both de-identification and multiclass named entity recognition. Against a baseline of rule-based and out-of-domain machine learning methods, the conditional random field improved the F2-score for de-identification from 70% to 93%, and the neural approach reached an F2-score of 96% while keeping precision and recall balanced. These results show that state-of-the-art machine learning methods can play a crucial role in de-identification of German medical reports.
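The abstract reports results as F2-scores. As a reading aid, the sketch below shows how the general F-beta score is computed from precision and recall (or from true-positive, false-positive, and false-negative counts). With beta = 2, recall counts more than precision, a weighting commonly chosen for de-identification because a missed identifier (false negative) leaks personal health information; the abstract itself does not state the authors' rationale or their exact evaluation procedure, so the function names and the token/entity counting here are illustrative assumptions only.

```python
def f_beta(precision: float, recall: float, beta: float = 2.0) -> float:
    """General F-beta score; beta > 1 weights recall more than precision."""
    if precision < 0 or recall < 0 or beta <= 0:
        raise ValueError("precision and recall must be >= 0 and beta > 0")
    b2 = beta * beta
    denom = b2 * precision + recall
    if denom == 0:
        # Both precision and recall are zero: define the score as 0.
        return 0.0
    return (1 + b2) * precision * recall / denom


def f_beta_from_counts(tp: int, fp: int, fn: int, beta: float = 2.0) -> float:
    """Hypothetical helper: F-beta from raw counts of detected identifiers."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return f_beta(precision, recall, beta)


if __name__ == "__main__":
    # Same numbers, roles swapped: F2 rewards high recall more than high precision.
    print(f_beta(precision=0.5, recall=1.0))  # about 0.833
    print(f_beta(precision=1.0, recall=0.5))  # about 0.556
```

When precision and recall are equal, every F-beta score equals that common value, which is why a high F2 "while keeping precision and recall balanced" is not simply the product of trading precision for recall.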
Identifiers
pubmed: 31483261
pii: SHTI190813
doi: 10.3233/SHTI190813
Publication types
Journal Article
Languages
eng