Artificial Neural Network Language Models Predict Human Brain Responses to Language Even After a Developmentally Realistic Amount of Training.

ANN-neural data alignment; artificial neural network; development; human behavior; language network

Journal

Neurobiology of language (Cambridge, Mass.)
ISSN: 2641-4368
Abbreviated title: Neurobiol Lang (Camb)
Country: United States
NLM ID: 101763589

Publication information

Publication date:
2024
History:
received: 29 03 2023
accepted: 09 01 2024
medline: 22 4 2024
pubmed: 22 4 2024
entrez: 22 4 2024
Status: epublish

Abstract

Artificial neural networks have emerged as computationally plausible models of human language processing. A major criticism of these models is that the amount of training data they receive far exceeds that of humans during language learning. Here, we use two complementary approaches to ask how the models' ability to capture human fMRI responses to sentences is affected by the amount of training data. First, we evaluate GPT-2 models trained on 1 million, 10 million, 100 million, or 1 billion words against an fMRI benchmark. We consider the 100-million-word model to be developmentally plausible in terms of the amount of training data given that this amount is similar to what children are estimated to be exposed to during the first 10 years of life. Second, we test the performance of a GPT-2 model trained on a 9-billion-token dataset to reach state-of-the-art next-word prediction performance on the human benchmark at different stages during training. Across both approaches, we find that (i) the models trained on a developmentally plausible amount of data already achieve near-maximal performance in capturing fMRI responses to sentences. Further, (ii) lower perplexity (a measure of next-word prediction performance) is associated with stronger alignment with human data, suggesting that models that have received enough training to achieve sufficiently high next-word prediction performance also acquire representations of sentences that are predictive of human fMRI responses. In tandem, these findings establish that although

Identifiers

pubmed: 38645622
doi: 10.1162/nol_a_00137
pii: nol_a_00137
pmc: PMC11025646

Publication types

Journal Article

Languages

eng

Pagination

43-63

Copyright information

© 2024 Massachusetts Institute of Technology.

Conflict of interest statement

Competing Interests: The authors have declared that no competing interests exist.

Authors

Eghbal A Hosseini (EA)

Department of Brain and Cognitive Sciences, Massachusetts Institute of Technology, Cambridge, MA, USA.
McGovern Institute for Brain Research, Massachusetts Institute of Technology, Cambridge, MA, USA.

Martin Schrimpf (M)

The MIT Quest for Intelligence Initiative, Cambridge, MA, USA.
Swiss Federal Institute of Technology, Lausanne, Switzerland.

Yian Zhang (Y)

Computer Science Department, Stanford University, Stanford, CA, USA.

Samuel Bowman (S)

Center for Data Science, New York University, New York, NY, USA.
Department of Linguistics, New York University, New York, NY, USA.
Department of Computer Science, New York University, New York, NY, USA.

Noga Zaslavsky (N)

Department of Brain and Cognitive Sciences, Massachusetts Institute of Technology, Cambridge, MA, USA.
McGovern Institute for Brain Research, Massachusetts Institute of Technology, Cambridge, MA, USA.
K. Lisa Yang Integrative Computational Neuroscience (ICoN) Center, Massachusetts Institute of Technology, Cambridge, MA, USA.
Department of Language Science, University of California, Irvine, CA, USA.

Evelina Fedorenko (E)

Department of Brain and Cognitive Sciences, Massachusetts Institute of Technology, Cambridge, MA, USA.
McGovern Institute for Brain Research, Massachusetts Institute of Technology, Cambridge, MA, USA.
The MIT Quest for Intelligence Initiative, Cambridge, MA, USA.
Speech and Hearing Bioscience and Technology Program, Harvard University, Boston, MA, USA.

MeSH classifications