Information-Restricted Neural Language Models Reveal Different Brain Regions' Sensitivity to Semantics, Syntax, and Context.

Keywords: LLM; context; encoding models; fMRI; semantics; syntax

Journal

Neurobiology of language (Cambridge, Mass.)

ISSN: 2641-4368

Abbreviated title: Neurobiol Lang (Camb)

Country: United States

NLM ID: 101763589

Publication information

Publication date:
2023

History:

received: 2023-05-23

accepted: 2023-09-28

medline: 2023-12-25

pubmed: 2023-12-25

entrez: 2023-12-25

Status: epublish

Abstract

A fundamental question in neurolinguistics concerns the brain regions involved in syntactic and semantic processing during speech comprehension, both at the lexical (word processing) and supra-lexical levels (sentence and discourse processing). To what extent are these regions separated or intertwined? To address this question, we introduce a novel approach exploiting neural language models to generate high-dimensional feature sets that separately encode semantic and syntactic information. More precisely, we train a lexical language model, GloVe, and a supra-lexical language model, GPT-2, on a text corpus from which we selectively removed either syntactic or semantic information. We then assess to what extent the features derived from these information-restricted models are still able to predict the fMRI time courses of humans listening to naturalistic text. Furthermore, to determine the windows of integration of brain regions involved in supra-lexical processing, we manipulate the size of contextual information provided to GPT-2. The analyses show that, while most brain regions involved in language comprehension are sensitive to both syntactic and semantic features, the relative magnitudes of these effects vary across these regions. Moreover, regions that are best fitted by semantic or syntactic features are more spatially dissociated in the left hemisphere than in the right one, and the right hemisphere shows sensitivity to longer contexts than the left. The novelty of our approach lies in the ability to control for the information encoded in the models' embeddings by manipulating the training set. These "information-restricted" models complement previous studies that used language models to probe the neural bases of language, and shed new light on its spatial organization.
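The comparison described in the abstract, testing how well different feature sets predict fMRI time courses, can be illustrated with a minimal encoding-model sketch. The abstract does not specify the regression method, so ridge regression is an assumption here. The data are synthetic stand-ins, and the names `ridge_fit` and `voxelwise_r` are hypothetical. Hemodynamic response modeling and the actual GloVe or GPT-2 features are omitted.

```python
import numpy as np

def ridge_fit(X, Y, alpha=1.0):
    """Closed-form ridge regression: W = (X'X + alpha*I)^-1 X'Y."""
    n_features = X.shape[1]
    return np.linalg.solve(X.T @ X + alpha * np.eye(n_features), X.T @ Y)

def voxelwise_r(X_train, Y_train, X_test, Y_test, alpha=1.0):
    """Fit on training scans, return Pearson r per voxel on held-out scans."""
    W = ridge_fit(X_train, Y_train, alpha)
    pred = X_test @ W
    pred_c = pred - pred.mean(axis=0)
    true_c = Y_test - Y_test.mean(axis=0)
    denom = np.linalg.norm(pred_c, axis=0) * np.linalg.norm(true_c, axis=0)
    return (pred_c * true_c).sum(axis=0) / denom

# Synthetic stand-in data (assumption: not taken from the study).
rng = np.random.default_rng(0)
n_scans, n_feat, n_vox = 400, 20, 5
semantic = rng.standard_normal((n_scans, n_feat))   # stand-in "semantic" features
syntactic = rng.standard_normal((n_scans, n_feat))  # stand-in "syntactic" features

# Simulated voxels driven by the semantic features only, plus noise.
weights = rng.standard_normal((n_feat, n_vox))
bold = semantic @ weights + 0.5 * rng.standard_normal((n_scans, n_vox))

split = 300  # first 300 scans for training, the rest held out
r_sem = voxelwise_r(semantic[:split], bold[:split], semantic[split:], bold[split:])
r_syn = voxelwise_r(syntactic[:split], bold[:split], syntactic[split:], bold[split:])

print("semantic features, r per voxel: ", r_sem.round(2))
print("syntactic features, r per voxel:", r_syn.round(2))
```

In this toy setup the simulated voxels depend only on the semantic stand-in features, so those features should predict held-out activity far better than the syntactic stand-ins. Comparing per-voxel scores across feature sets in this way is one plausible reading of how relative sensitivity to each type of information could be quantified.
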

Identifiers

DOI: 10.1162/nol_a_00125 PMID: 38144237 PMC: PMC10745090

pubmed: 38144237

doi: 10.1162/nol_a_00125

pii: nol_a_00125

pmc: PMC10745090

Publication types

Journal Article

Languages

eng

Pagination

611-636

Conflict of interest statement

Competing Interests: The authors have declared that no competing interests exist.

Authors

Alexandre Pasquiou (A)

Yair Lakretz (Y)

Bertrand Thirion (B)

Christophe Pallier (C)

MeSH classifications