GRASCCO - The First Publicly Shareable, Multiply-Alienated German Clinical Text Corpus.

Case Reports; Clinical NLP; German Clinical Document Corpus

Journal

Studies in health technology and informatics
ISSN: 1879-8365
Abbreviated title: Stud Health Technol Inform
Country: Netherlands
NLM ID: 9214582

Publication information

Publication date:
17 Aug 2022
History:
entrez: 8 9 2022
pubmed: 9 9 2022
medline: 11 9 2022
Status: ppublish

Abstract

We describe the creation of GRASCCO, a novel German-language corpus composed of some 60 clinical documents with more than 43,000 tokens. GRASCCO is a synthetic corpus resulting from a series of alienation steps to obfuscate privacy-sensitive information contained in real clinical documents, the true origin of all GRASCCO texts. Therefore, it is publicly shareable without any legal restrictions. We also explore whether this corpus still represents common clinical language use by comparison with a real (non-shareable) clinical corpus we developed as a contribution to the Medical Informatics Initiative in Germany (MII) within the SMITH consortium. We find evidence that such a claim can indeed be made.

Identifiers

pubmed: 36073490
pii: SHTI220805
doi: 10.3233/SHTI220805

Publication types

Journal Article

Languages

eng

Pagination

66-72

Authors

Luise Modersohn (L)

JULIE Lab, Friedrich Schiller University Jena, Germany.
Intelligence and Informatics in Medicine, Medical Center rechts der Isar, Technical University Munich, Germany.
SMITH Consortium of the German Medical Informatics Initiative.
DIFUTURE Consortium of the German Medical Informatics Initiative.

Stefan Schulz (S)

Institute for Medical Informatics, Statistics and Documentation, Medical University of Graz, Austria.

Christina Lohr (C)

JULIE Lab, Friedrich Schiller University Jena, Germany.
SMITH Consortium of the German Medical Informatics Initiative.

Udo Hahn (U)

JULIE Lab, Friedrich Schiller University Jena, Germany.
SMITH Consortium of the German Medical Informatics Initiative.
