Neural Generative Models and the Parallel Architecture of Language: A Critical Review and Outlook.

Keywords: Enriched composition; GPT‐3 prompting; Neural large language models; Parallel architecture; Semantic composition; Statistical learning; Syntax‐semantics interface

Journal

Topics in cognitive science

ISSN: 1756-8765

Abbreviated title: Top Cogn Sci

Country: United States

NLM ID: 101506764

Publication information

Publication date:
18 Apr 2024

History:

revised: 15 03 2024

received: 31 08 2023

accepted: 21 03 2024

medline: 18 4 2024

pubmed: 18 4 2024

entrez: 18 4 2024

Status: ahead of print

Abstract

According to the parallel architecture, syntactic and semantic information processing are two separate streams that interact selectively during language comprehension. While considerable effort is put into psycho- and neurolinguistics to understand the interchange of processing mechanisms in human comprehension, the nature of this interaction in recent neural Large Language Models remains elusive. In this article, we revisit influential linguistic and behavioral experiments and evaluate the ability of a large language model, GPT-3, to perform these tasks. The model can solve semantic tasks autonomously from syntactic realization in a manner that resembles human behavior. However, the outcomes present a complex and variegated picture, leaving open the question of how Language Models could learn structured conceptual representations.

Identifiers

DOI: 10.1111/tops.12733 PMID: 38635667

Publication types

Journal Article

Languages

eng

Citation subsets

Grants

Agency: Research Grants Council, University Grants Committee

ID: PolyU 15612222x

Agency: PROCORE France/Hong Kong Joint Research Scheme

ID: F-PolyU501/21

Agency: European Commission

Copyright information

References

Andreas, J. (2022). Language models as agent models. In Findings of the Association for Computational Linguistics: EMNLP 2022 (pp. 5769–5779). Abu Dhabi, United Arab Emirates: Association for Computational Linguistics.

Baggio, G. (2018). Meaning in the brain. MIT Press.

Baggio, G. (2021). Compositionality in a parallel architecture for language processing. Cognitive Science, 45(5), e12949.

Bommasani, R., Davis, K., & Cardie, C. (2020). Interpreting pretrained contextualized representations via reductions to static embeddings. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics (pp. 4758–4781).

Bommasani, R., Hudson, D. A., Adeli, E., Altman, R., Arora, S., von Arx, S., … & Liang, P. (2021). On the opportunities and risks of foundation models. ArXiv: 2108.07258.

Brown, T. B., Mann, B., Ryder, N., Subbiah, M., Kaplan, J. D., Dhariwal, P., Neelakantan, A., Shyam, P., Sastry, G., Askell, A., Agarwal, S., Herbert‐Voss, A., Krueger, G., Henighan, T., Child, R., Ramesh, A., Ziegler, D., Wu, J., Winter, C., Hesse, C., Chen, M., Sigler, E., Litwin, M., Gray, S., Chess, B., Clark, J., Berner, C., McCandlish, S., Radford, A., Sutskever, I., & Amodei, D. (2020). Language models are few‐shot learners. In Advances in neural information processing systems (Vol. 33, pp. 1877–1901).

Buijtelaar, L., & Pezzelle, S. (2023). A psycholinguistic analysis of BERT's representations of compounds. In Proceedings of the 17th Conference of the European Chapter of the Association for Computational Linguistics (pp. 2222–2233).

Bybee, J. (2010). Language, usage and cognition. Cambridge University Press.

Chang, T. A., & Bergen, B. K. (2024). Language model behavior: A comprehensive survey. Computational Linguistics, 50, 1–58.

Cong, Y., Chersoni, E., Hsu, Y., & Lenci, A. (2023). Are language models sensitive to semantic attraction? A study on surprisal. In Proceedings of the 12th Joint Conference on Lexical and Computational Semantics (pp. 141–148).

Culicover, P. W., & Jackendoff, R. (2005). Simpler syntax. Oxford University Press.

Culicover, P. W., & Jackendoff, R. (2006). The simpler syntax hypothesis. Trends in Cognitive Sciences, 10(9), 413–418.

Dankers, V., Lucas, C., & Titov, I. (2022). Can transformer be too compositional? Analysing idiom processing in neural machine translation. In Proceedings of ACL (pp. 3608–3626).

Elman, J. L., Bates, E. A., Johnson, M. H., Karmiloff‐Smith, A., Parisi, D., & Plunkett, K. (1996). Rethinking innateness. A connectionist perspective on development. MIT Press.

Goldberg, A. E. (2019). Explain me this. Creativity, competition, and the partial productivity of constructions. Princeton University Press.

Goldberg, Y. (2019). Assessing BERT's syntactic abilities. ArXiv: 1901.05287.

Goldstein, A., Zada, Z., Buchnik, E., Schain, M., Price, A., Aubrey, B., Nastase, S. A., Feder, A., Emanuel, D., Cohen, A., Jansen, A., Gazula, H., Choe, G., Rao, A., Kim, C., Casto, C., Fanda, L., Doyle, W., Friedman, D., Dugan, P., Melloni, L., Reichart, R., Devore, S., Flinker, A., Hasenfratz, L., Levy, O., Hassidim, A., Brenner, M., Matias, Y., Norman, K. A., Devinsky, O., & Hasson, U. (2022). Shared computational principles for language processing in humans and deep language models. Nature Neuroscience, 25(3), 369–380.

Glavaš, G., & Vulić, I. (2021). Is supervised syntactic parsing beneficial for language understanding tasks? An empirical investigation. In Proceedings of EACL (pp. 3090–3104).

Hale, J. (2001). A probabilistic Earley parser as a psycholinguistic model. In Proceedings of NAACL.

Hegel, G. W. F. (1979). Phenomenology of spirit (A. V. Miller, Trans.). Oxford University Press.

Hewitt, J., & Manning, C. D. (2019). A structural probe for finding syntax in word representations. In Proceedings of NAACL‐HLT (pp. 4129–4138).

Hu, J., Floyd, S., Jouravlev, O., Fedorenko, E., & Gibson, E. (2023). A fine‐grained comparison of pragmatic language understanding in humans and language models. In Proceedings of ACL (pp. 4194–4213).

Jackendoff, R. (2007). A parallel architecture perspective on language processing. Brain Research, 1146, 2–22.

Jackendoff, R. (1997). The architecture of the language faculty. MIT Press.

Kauf, C., Chersoni, E., Lenci, A., Fedorenko, E., & Ivanova, A. A. (2024). Comparing Plausibility Estimates in Base and Instruction‐Tuned Large Language Models. arXiv preprint arXiv:2403.14859.

Kauf, C., Ivanova, A. A., Rambelli, G., Chersoni, E., She, J. S., Chowdhury, Z., Fedorenko, E., & Lenci, A. (2023). Event knowledge in large language models: The gap between the impossible and the unlikely. Cognitive Science, 47(11), e13386.

Kim, A., & Osterhout, L. (2005). The independence of combinatory semantic processing: Evidence from event‐related potentials. Journal of Memory and Language, 52(2), 205–225.

LeCun, Y., Bengio, Y., & Hinton, G. (2015). Deep learning. Nature, 521(7553), 436–444.

Lenci, A. (2023). Understanding natural language understanding systems. A critical analysis. ArXiv: 2303.04229.

Lenci, A., & Sahlgren, M. (2023). Distributional semantics. Cambridge University Press.

Levy, R. (2008). Expectation‐based syntactic comprehension. Cognition, 106(3), 1126–1177.

Li, B., Zhu, Z., Thomas, G., Rudzicz, F., & Xu, Y. (2022). Neural reality of argument structure constructions. In Proceedings of ACL (pp. 7410–7423).

Lin, Y., Yi, C. T., & Frank, R. (2019). Open Sesame: Getting inside BERT's linguistic knowledge. In Proceedings of the Second BlackboxNLP Workshop on Analyzing and Interpreting Neural Networks for NLP (pp. 241–253).

Linzen, T., & Baroni, M. (2021). Syntactic structure from deep learning. Annual Review of Linguistics, 7, 195–212.

Liu, A., Wu, Z., Michael, J., Suhr, A., West, P., Koller, A., Swayamdipta, S., Smith, N. A., & Choi, Y. (2023). We're afraid language models aren't modeling ambiguity. In Proceedings of EMNLP 2023.

Liu, Y., Ott, M., Goyal, N., Du, J., Joshi, M., Chen, D., Levy, O., Lewis, M., Zettlemoyer, L., & Stoyanov, V. (2019). RoBERTa: A robustly optimized BERT pretraining approach. ArXiv: 1907.11692.

Mahowald, K., Ivanova, A. A., Blank, I. A., Kanwisher, N., Tenenbaum, J. B., & Fedorenko, E. (2023). Dissociating language and thought in large language models: A cognitive perspective. ArXiv: 2301.06627.

McCoy, R. T., Yao, S., Friedman, D., Hardy, M., & Griffiths, T. L. (2023). Embers of autoregression: Understanding large language models through the problem they are trained to solve. ArXiv: 2309.13638.

McShane, M. J. (2005). A theory of ellipsis. Oxford University Press.

Michaelov, J., & Bergen, B. (2022). The more human‐like the language model, the more surprisal is the best predictor of N400 amplitude. In NeurIPS 2022 Workshop on Information‐Theoretic Principles in Cognitive Systems.

Michalon, O., & Baggio, G. (2019). Meaning‐driven syntactic predictions in a parallel processing architecture: Theory and algorithmic modeling of ERP effects. Neuropsychologia, 131, 171–183.

Miletić, F., & im Walde, S. S. (2023). A systematic search for compound semantics in pretrained BERT architectures. In Proceedings of the 17th Conference of the European Chapter of the Association for Computational Linguistics (pp. 1499–1512).

Mollica, F., Siegelman, M., Diachek, E., Piantadosi, S. T., Mineroff, Z., Futrell, R., Kean, H., Qian, P., & Fedorenko, E. (2020). Composition is the core driver of the language‐selective network. Neurobiology of Language, 1(1), 104–134.

Nedumpozhimana, V., & Kelleher, J. (2021). Finding BERT's idiomatic key. In Proceedings of the 17th Workshop on Multiword Expressions (MWE 2021) (pp. 57–62).

Ormerod, M., Martínez del Rincón, J., & Devereux, B. (2024). How is a “kitchen chair” like a “farm horse”? Exploring the representation of noun‐noun compound semantics in transformer‐based language models. Computational Linguistics, 1–33.

Pedinotti, P., Rambelli, G., Chersoni, E., Santus, E., Lenci, A., & Blache, P. (2021). Did the cat drink the coffee? Challenging transformers with generalized event knowledge. In Proceedings of *SEM 2021 (pp. 1–11).

Pezzelle, S. (2023). Dealing with semantic underspecification in multimodal NLP. In Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) (pp. 12098–12112). Toronto, Canada: Association for Computational Linguistics.

Piantadosi, S. (2023). Modern language models refute Chomsky's approach to language. Lingbuzz, 7180.

Prange, J., Schneider, N., & Kong, L. (2022). Linguistic Frameworks Go Toe‐to‐Toe at Neuro‐Symbolic Language Modeling. In Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (pp. 4375–4391). Seattle, United States: Association for Computational Linguistics.

Pustejovsky, J. (1995). The generative lexicon. MIT Press.

Rambelli, G., Chersoni, E., Lenci, A., Blache, P., & Huang, C. R. (2020). Comparing probabilistic, distributional and transformer‐based models on logical metonymy interpretation. In Proceedings of AACL‐IJCNLP (pp. 224–234).

Rambelli, G., Chersoni, E., Senaldi, M. S. G., Blache, P., & Lenci, A. (2023). Are frequent phrases directly retrieved like idioms? An investigation with self‐paced reading and language models. In Proceedings of the 19th Workshop on Multiword Expressions (MWE 2023) (pp. 87–98).

Ruis, L. E., Khan, A., Biderman, S., Hooker, S., Rocktäschel, T., & Grefenstette, E. (2022). Large language models are not zero‐shot communicators.

Schlangen, D. (2022). Norm participation grounds language. In Proceedings of the 2022 CLASP Conference on (Dis)embodiment (pp. 62–69). Gothenburg, Sweden: Association for Computational Linguistics.

Tenney, I., Xia, P., Chen, B., Wang, A., Poliak, A., McCoy, R. T., Kim, N., Van Durme, B., Bowman, S. R., Das, D., & Pavlick, E. (2019). What do you learn from context? Probing for sentence structure in contextualized word representations. In Proceedings of ICLR 2019.

Testa, D., Chersoni, E., & Lenci, A. (2023). We Understand Elliptical Sentences, and Language Models should Too: A New Dataset for Studying Ellipsis and its Interaction with Thematic Fit. In Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) (pp. 3340–3353). Toronto, Canada: Association for Computational Linguistics.

Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A. N., Kaiser, L., & Polosukhin, I. (2017). Attention is all you need. In Advances in neural information processing systems.

Vulić, I., Ponti, E. M., Litschko, R., Glavaš, G., & Korhonen, A. (2020). Probing pretrained language models for lexical semantics. In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP) (pp. 7222–7240).

Warstadt, A., Parrish, A., Liu, H., Mohananey, A., Peng, W., Wang, S. F., & Bowman, S. R. (2020). BLiMP: The benchmark of linguistic minimal pairs for English. Transactions of the Association for Computational Linguistics, 8, 377–392.

Wei, J., Tay, Y., Bommasani, R., Raffel, C., Zoph, B., Borgeaud, S., Yogatama, D., Bosma, M., Zhou, D., Metzler, D., Chi, E. H., Hashimoto, T., Vinyals, O., Liang, P., Dean, J., & Fedus, W. (2022). Emergent abilities of large language models. ArXiv: 2206.07682.

Authors

Giulia Rambelli (G)

Emmanuele Chersoni (E)

Davide Testa (D)

Philippe Blache (P)

Alessandro Lenci (A)

MeSH classifications