Health-Related Content in Transformer-Based Deep Neural Network Language Models: Exploring Cross-Linguistic Syntactic Bias.
COVID-19
Corpora
Knowledge Reproduction
Language Models
Natural Language Processing
Journal
Studies in health technology and informatics
ISSN: 1879-8365
Abbreviated title: Stud Health Technol Inform
Country: Netherlands
NLM ID: 9214582
Publication information
Publication date: 29 Jun 2022
History:
entrez: 1 Jul 2022
pubmed: 2 Jul 2022
medline: 6 Jul 2022
Status: ppublish
Abstract
This paper explores a methodology for quantifying bias in transformer-based deep neural network language models for Chinese, English, and French. When the models are queried with health-related COVID-19 mythbusters, we observe a bias that is not semantic or encyclopaedic in nature but syntactic, as predicted by theoretical accounts of structural complexity. Our results highlight the need to create health-communication corpora for use as training sets in deep learning.
Identifiers
pubmed: 35773848
pii: SHTI220702
doi: 10.3233/SHTI220702
Publication types
Journal Article
Languages
eng