Announcement of the German Medical Text Corpus Project (GeMTeX).
German Medical Informatics Initiative
Natural Language Processing
Text Corpus
Journal
Studies in health technology and informatics
ISSN: 1879-8365
Abbreviated title: Stud Health Technol Inform
Country: Netherlands
NLM ID: 9214582
Publication information
Publication date:
18 May 2023
History:
medline: 22 May 2023
pubmed: 19 May 2023
entrez: 19 May 2023
Status:
ppublish
Abstract
The largest publicly funded project to generate a German-language medical text corpus will start in mid-2023. GeMTeX comprises clinical texts from the information systems of six university hospitals. These texts will be made accessible for NLP through the annotation of entities and relations and will be enhanced with additional meta-information. Strong governance provides a stable legal framework for the use of the corpus. State-of-the-art NLP methods are used to build, pre-annotate, and annotate the corpus and to train language models. A community will be built around GeMTeX to ensure its sustainable maintenance, use, and dissemination.
Identifiers
pubmed: 37203512
pii: SHTI230283
doi: 10.3233/SHTI230283
Publication types
Journal Article
Languages
eng