Information-Restricted Neural Language Models Reveal Different Brain Regions' Sensitivity to Semantics, Syntax, and Context.
LLM
context
encoding models
fMRI
semantics
syntax
Journal
Neurobiology of language (Cambridge, Mass.)
ISSN: 2641-4368
Abbreviated title: Neurobiol Lang (Camb)
Country: United States
NLM ID: 101763589
Publication information
Publication date:
2023
History:
received: 2023-05-23
accepted: 2023-09-28
medline: 2023-12-25
pubmed: 2023-12-25
entrez: 2023-12-25
Status:
epublish
Abstract
A fundamental question in neurolinguistics concerns the brain regions involved in syntactic and semantic processing during speech comprehension, both at the lexical (word processing) and supra-lexical levels (sentence and discourse processing). To what extent are these regions separated or intertwined? To address this question, we introduce a novel approach exploiting neural language models to generate high-dimensional feature sets that separately encode semantic and syntactic information. More precisely, we train a lexical language model, GloVe, and a supra-lexical language model, GPT-2, on a text corpus from which we selectively removed either syntactic or semantic information. We then assess to what extent the features derived from these information-restricted models are still able to predict the fMRI time courses of humans listening to naturalistic text. Furthermore, to determine the windows of integration of brain regions involved in supra-lexical processing, we manipulate the size of contextual information provided to GPT-2. The analyses show that, while most brain regions involved in language comprehension are sensitive to both syntactic and semantic features, the relative magnitudes of these effects vary across these regions. Moreover, regions that are best fitted by semantic or syntactic features are more spatially dissociated in the left hemisphere than in the right one, and the right hemisphere shows sensitivity to longer contexts than the left. The novelty of our approach lies in the ability to control for the information encoded in the models' embeddings by manipulating the training set. These "information-restricted" models complement previous studies that used language models to probe the neural bases of language, and shed new light on its spatial organization.
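The encoding-model comparison described in the abstract can be illustrated with a minimal sketch. It is not the authors' pipeline: it assumes a plain closed-form ridge regression, synthetic data in place of fMRI recordings, no hemodynamic modeling, and a hypothetical "restricted" feature set obtained by simply dropping feature columns (a stand-in for embeddings from a model trained on an information-restricted corpus). All function and variable names are illustrative.

```python
import numpy as np

def fit_ridge(X, Y, alpha):
    """Closed-form ridge regression: W = (X^T X + alpha * I)^-1 X^T Y."""
    n_features = X.shape[1]
    return np.linalg.solve(X.T @ X + alpha * np.eye(n_features), X.T @ Y)

def voxelwise_pearson(Y_true, Y_pred):
    """Pearson correlation between true and predicted time courses, per voxel (column)."""
    yt = Y_true - Y_true.mean(axis=0)
    yp = Y_pred - Y_pred.mean(axis=0)
    denom = np.sqrt((yt ** 2).sum(axis=0) * (yp ** 2).sum(axis=0))
    return (yt * yp).sum(axis=0) / denom

def encoding_score(X_train, Y_train, X_test, Y_test, alpha=1.0):
    """Fit on the training split, return held-out correlation for each voxel."""
    W = fit_ridge(X_train, Y_train, alpha)
    return voxelwise_pearson(Y_test, X_test @ W)

# Synthetic stand-in data (assumption): 300 time points, 10 features, 5 "voxels".
rng = np.random.default_rng(0)
n_train, n_test, n_feat, n_vox = 200, 100, 10, 5
W_true = rng.normal(size=(n_feat, n_vox))
X_full = rng.normal(size=(n_train + n_test, n_feat))
Y = X_full @ W_true + 0.5 * rng.normal(size=(n_train + n_test, n_vox))

# Hypothetical "information-restricted" features: keep only half of the columns.
X_restricted = X_full[:, :5]

scores_full = encoding_score(
    X_full[:n_train], Y[:n_train], X_full[n_train:], Y[n_train:]
)
scores_restricted = encoding_score(
    X_restricted[:n_train], Y[:n_train], X_restricted[n_train:], Y[n_train:]
)

# Per-voxel drop in held-out correlation when information is removed.
score_drop = scores_full - scores_restricted
```

In this toy setting, `score_drop` plays the role of a per-voxel sensitivity measure: a larger drop suggests the voxel's signal depended more on the removed information. A real analysis would also need to account for the hemodynamic response, cross-validate the regularization strength, and test significance across subjects.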
Identifiers
pubmed: 38144237
doi: 10.1162/nol_a_00125
pii: nol_a_00125
pmc: PMC10745090
Publication types
Journal Article
Languages
eng
Pagination
611-636
Copyright information
© 2023 Massachusetts Institute of Technology.
Conflict of interest statement
Competing Interests: The authors have declared that no competing interests exist.