GRASCCO - The First Publicly Shareable, Multiply-Alienated German Clinical Text Corpus.
Case Reports
Clinical NLP
German Clinical Document Corpus
Journal
Studies in health technology and informatics
ISSN: 1879-8365
Abbreviated title: Stud Health Technol Inform
Country: Netherlands
NLM ID: 9214582
Publication information
Publication date:
17 Aug 2022
History:
entrez: 8 Sep 2022
pubmed: 9 Sep 2022
medline: 11 Sep 2022
Status:
ppublish
Abstract
We describe the creation of GRASCCO, a novel German-language corpus composed of some 60 clinical documents with more than 43,000 tokens. GRASCCO is a synthetic corpus resulting from a series of alienation steps that obfuscate privacy-sensitive information contained in real clinical documents, the true origin of all GRASCCO texts. Therefore, it is publicly shareable without any legal restrictions. We also explore whether this corpus still represents common clinical language use by comparison with a real (non-shareable) clinical corpus we developed as a contribution to the Medical Informatics Initiative in Germany (MII) within the SMITH consortium. We find evidence that such a claim can indeed be made.
Identifiers
pubmed: 36073490
pii: SHTI220805
doi: 10.3233/SHTI220805
Publication types
Journal Article
Languages
eng