Neural Generative Models and the Parallel Architecture of Language: A Critical Review and Outlook.
Enriched composition
GPT‐3 prompting
Neural large language models
Parallel architecture
Semantic composition
Statistical learning
Syntax‐semantics interface
Journal
Topics in cognitive science
ISSN: 1756-8765
Abbreviated title: Top Cogn Sci
Country: United States
NLM ID: 101506764
Publication information
Publication date:
18 Apr 2024
History:
revised: 15 Mar 2024
received: 31 Aug 2023
accepted: 21 Mar 2024
medline: 18 Apr 2024
pubmed: 18 Apr 2024
entrez: 18 Apr 2024
Status:
aheadofprint
Abstract
According to the parallel architecture, syntactic and semantic information processing are two separate streams that interact selectively during language comprehension. While considerable effort is put into psycho- and neurolinguistics to understand the interchange of processing mechanisms in human comprehension, the nature of this interaction in recent neural Large Language Models remains elusive. In this article, we revisit influential linguistic and behavioral experiments and evaluate the ability of a large language model, GPT-3, to perform these tasks. The model can solve semantic tasks autonomously from syntactic realization in a manner that resembles human behavior. However, the outcomes present a complex and variegated picture, leaving open the question of how Language Models could learn structured conceptual representations.
Publication types
Journal Article
Languages
eng
Citation subsets
IM
Grants
Agency: Research Grants Council, University Grants Committee
ID: PolyU 15612222x
Agency: PROCORE France/Hong Kong Joint Research Scheme
ID: F-PolyU501/21
Agency: European Commission
Copyright information
© 2024 The Authors. Topics in Cognitive Science published by Wiley Periodicals LLC on behalf of Cognitive Science Society.
References
Andreas, J. (2022). Language models as agent models. In Findings of the Association for Computational Linguistics: EMNLP 2022 (pp. 5769–5779). Abu Dhabi, United Arab Emirates: Association for Computational Linguistics.
Baggio, G. (2018). Meaning in the brain. MIT Press.
Baggio, G. (2021). Compositionality in a parallel architecture for language processing. Cognitive Science, 45(5), e12949.
Bommasani, R., Davis, K., & Cardie, C. (2020). Interpreting pretrained contextualized representations via reductions to static embeddings. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics (pp. 4758–4781).
Bommasani, R., Hudson, D. A., Adeli, E., Altman, R., Arora, S., von Arx, S., … & Liang, P. (2021). On the opportunities and risks of foundation models. ArXiv: 2108.07258.
Brown, T. B., Mann, B., Ryder, N., Subbiah, M., Kaplan, J. D., Dhariwal, P., Neelakantan, A., Shyam, P., Sastry, G., Askell, A., Agarwal, S., Herbert‐Voss, A., Krueger, G., Henighan, T., Child, R., Ramesh, A., Ziegler, D., Wu, J., Winter, C., Hesse, C., Chen, M., Sigler, E., Litwin, M., Gray, S., Chess, B., Clark, J., Berner, C., McCandlish, S., Radford, A., Sutskever, I., & Amodei, D. (2020). Language models are few‐shot learners. In Advances in neural information processing systems (Vol. 33, pp. 1877–1901).
Buijtelaar, L., & Pezzelle, S. (2023). A psycholinguistic analysis of BERT's representations of compounds. In Proceedings of the 17th Conference of the European Chapter of the Association for Computational Linguistics (pp. 2222–2233).
Bybee, J. (2010). Language, usage and cognition. Cambridge University Press.
Chang, T. A., & Bergen, B. K. (2024). Language model behavior: A comprehensive survey. Computational Linguistics, 50, 1–58.
Cong, Y., Chersoni, E., Hsu, Y., & Lenci, A. (2023). Are language models sensitive to semantic attraction? A study on surprisal. In Proceedings of the 12th Joint Conference on Lexical and Computational Semantics (pp. 141–148).
Culicover, P. W., & Jackendoff, R. (2005). Simpler syntax. Oxford University Press.
Culicover, P. W., & Jackendoff, R. (2006). The simpler syntax hypothesis. Trends in Cognitive Sciences, 10(9), 413–418.
Dankers, V., Lucas, C., & Titov, I. (2022). Can transformer be too compositional? Analysing idiom processing in neural machine translation. In Proceedings of ACL (pp. 3608–3626).
Elman, J. L., Bates, E. A., Johnson, M. H., Karmiloff‐Smith, A., Parisi, D., & Plunkett, K. (1996). Rethinking innateness. A connectionist perspective on development. MIT Press.
Goldberg, A. E. (2019). Explain me this. Creativity, competition, and the partial productivity of constructions. Princeton University Press.
Goldberg, Y. (2019). Assessing BERT's syntactic abilities. ArXiv: 1901.05287.
Goldstein, A., Zada, Z., Buchnik, E., Schain, M., Price, A., Aubrey, B., Nastase, S. A., Feder, A., Emanuel, D., Cohen, A., Jansen, A., Gazula, H., Choe, G., Rao, A., Kim, C., Casto, C., Fanda, L., Doyle, W., Friedman, D., Dugan, P., Melloni, L., Reichart, R., Devore, S., Flinker, A., Hasenfratz, L., Levy, O., Hassidim, A., Brenner, M., Matias, Y., Norman, K. A., Devinsky, O., & Hasson, U. (2022). Shared computational principles for language processing in humans and deep language models. Nature Neuroscience, 25(3), 369–380.
Glavaš, G., & Vulić, I. (2021). Is supervised syntactic parsing beneficial for language understanding tasks? An empirical investigation. In Proceedings of EACL (pp. 3090–3104).
Hale, J. (2001). A probabilistic Earley parser as a psycholinguistic model. In Proceedings of NAACL.
Hegel, G. W. F. (1979). Phenomenology of spirit (A. V. Miller, Trans.). Oxford University Press.
Hewitt, J., & Manning, C. D. (2019). A structural probe for finding syntax in word representations. In Proceedings of NAACL‐HLT (pp. 4129–4138).
Hu, J., Floyd, S., Jouravlev, O., Fedorenko, E., & Gibson, E. (2023). A fine‐grained comparison of pragmatic language understanding in humans and language models. In Proceedings of ACL (pp. 4194–4213).
Jackendoff, R. (2007). A parallel architecture perspective on language processing. Brain Research, 1146, 2–22.
Jackendoff, R. (1997). The architecture of the language faculty. MIT Press.
Kauf, C., Chersoni, E., Lenci, A., Fedorenko, E., & Ivanova, A. A. (2024). Comparing plausibility estimates in base and instruction‐tuned large language models. ArXiv: 2403.14859.
Kauf, C., Ivanova, A. A., Rambelli, G., Chersoni, E., She, J. S., Chowdhury, Z., Fedorenko, E., & Lenci, A. (2023). Event knowledge in large language models: The gap between the impossible and the unlikely. Cognitive Science, 47(11), e13386.
Kim, A., & Osterhout, L. (2005). The independence of combinatory semantic processing: Evidence from event‐related potentials. Journal of Memory and Language, 52(2), 205–225.
LeCun, Y., Bengio, Y., & Hinton, G. (2015). Deep learning. Nature, 521(7553), 436–444.
Lenci, A. (2023). Understanding natural language understanding systems. A critical analysis. ArXiv: 2303.04229.
Lenci, A., & Sahlgren, M. (2023). Distributional semantics. Cambridge University Press.
Levy, R. (2008). Expectation‐based syntactic comprehension. Cognition, 106(3), 1126–1177.
Li, B., Zhu, Z., Thomas, G., Rudzicz, F., & Xu, Y. (2022). Neural reality of argument structure constructions. In Proceedings of ACL (pp. 7410–7423).
Lin, Y., Yi, C. T., & Frank, R. (2019). Open Sesame: Getting inside BERT's linguistic knowledge. In Proceedings of the Second BlackboxNLP Workshop on Analyzing and Interpreting Neural Networks for NLP (pp. 241–253).
Linzen, T., & Baroni, M. (2021). Syntactic structure from deep learning. Annual Review of Linguistics, 7, 195–212.
Liu, A., Wu, Z., Michael, J., Suhr, A., West, P., Koller, A., Swayamdipta, S., Smith, N. A., & Choi, Y. (2023). We're afraid language models aren't modeling ambiguity. In Proceedings of EMNLP 2023.
Liu, Y., Ott, M., Goyal, N., Du, J., Joshi, M., Chen, D., Levy, O., Lewis, M., Zettlemoyer, L., & Stoyanov, V. (2019). RoBERTa: A robustly optimized BERT pretraining approach. ArXiv: 1907.11692.
Mahowald, K., Ivanova, A. A., Blank, I. A., Kanwisher, N., Tenenbaum, J. B., & Fedorenko, E. (2023). Dissociating language and thought in large language models: A cognitive perspective. ArXiv: 2301.06627.
McCoy, R. T., Yao, S., Friedman, D., Hardy, M., & Griffiths, T. L. (2023). Embers of autoregression: Understanding large language models through the problem they are trained to solve. ArXiv: 2309.13638.
McShane, M. J. (2005). A theory of ellipsis. Oxford University Press.
Michaelov, J., & Bergen, B. (2022). The more human‐like the language model, the more surprisal is the best predictor of N400 amplitude. In NeurIPS 2022 Workshop on Information‐Theoretic Principles in Cognitive Systems.
Michalon, O., & Baggio, G. (2019). Meaning‐driven syntactic predictions in a parallel processing architecture: Theory and algorithmic modeling of ERP effects. Neuropsychologia, 131, 171–183.
Miletić, F., & im Walde, S. S. (2023). A systematic search for compound semantics in pretrained BERT architectures. In Proceedings of the 17th Conference of the European Chapter of the Association for Computational Linguistics (pp. 1499–1512).
Mollica, F., Siegelman, M., Diachek, E., Piantadosi, S. T., Mineroff, Z., Futrell, R., Kean, H., Qian, P., & Fedorenko, E. (2020). Composition is the core driver of the language‐selective network. Neurobiology of Language, 1(1), 104–134.
Nedumpozhimana, V., & Kelleher, J. (2021). Finding BERT's idiomatic key. In Proceedings of the 17th Workshop on Multiword Expressions (MWE 2021) (pp. 57–62).
Ormerod, M., Martínez del Rincón, J., & Devereux, B. (2024). How is a “kitchen chair” like a “farm horse”? Exploring the representation of noun‐noun compound semantics in transformer‐based language models. Computational Linguistics, 1–33.
Pedinotti, P., Rambelli, G., Chersoni, E., Santus, E., Lenci, A., & Blache, P. (2021). Did the cat drink the coffee? Challenging transformers with generalized event knowledge. In Proceedings of *SEM 2021 (pp. 1–11).
Pezzelle, S. (2023). Dealing with semantic underspecification in multimodal NLP. In Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) (pp. 12098–12112). Toronto, Canada: Association for Computational Linguistics.
Piantadosi, S. (2023). Modern language models refute Chomsky's approach to language. Lingbuzz, 7180.
Prange, J., Schneider, N., & Kong, L. (2022). Linguistic frameworks go toe‐to‐toe at neuro‐symbolic language modeling. In Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (pp. 4375–4391). Seattle, United States: Association for Computational Linguistics.
Pustejovsky, J. (1995). The generative lexicon. MIT Press.
Rambelli, G., Chersoni, E., Lenci, A., Blache, P., & Huang, C. R. (2020). Comparing probabilistic, distributional and transformer‐based models on logical metonymy interpretation. In Proceedings of AACL‐IJCNLP (pp. 224–234).
Rambelli, G., Chersoni, E., Senaldi, M. S. G., Blache, P., & Lenci, A. (2023). Are frequent phrases directly retrieved like idioms? An investigation with self‐paced reading and language models. In Proceedings of the 19th Workshop on Multiword Expressions (MWE 2023) (pp. 87–98).
Ruis, L. E., Khan, A., Biderman, S., Hooker, S., Rocktäschel, T., & Grefenstette, E. (2022). Large language models are not zero‐shot communicators.
Schlangen, D. (2022). Norm participation grounds language. In Proceedings of the 2022 CLASP Conference on (Dis)embodiment (pp. 62–69). Gothenburg, Sweden: Association for Computational Linguistics.
Tenney, I., Xia, P., Chen, B., Wang, A., Poliak, A., McCoy, R. T., Kim, N., Van Durme, B., Bowman, S. R., Das, D., & Pavlick, E. (2019). What do you learn from context? Probing for sentence structure in contextualized word representations. In Proceedings of ICLR 2019.
Testa, D., Chersoni, E., & Lenci, A. (2023). We Understand Elliptical Sentences, and Language Models should Too: A New Dataset for Studying Ellipsis and its Interaction with Thematic Fit. In Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) (pp. 3340–3353). Toronto, Canada: Association for Computational Linguistics.
Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A. N., Kaiser, L., & Polosukhin, I. (2017). Attention is all you need. In Advances in neural information processing systems.
Vulić, I., Ponti, E. M., Litschko, R., Glavaš, G., & Korhonen, A. (2020). Probing pretrained language models for lexical semantics. In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP) (pp. 7222–7240).
Warstadt, A., Parrish, A., Liu, H., Mohananey, A., Peng, W., Wang, S. F., & Bowman, S. R. (2020). BLiMP: The benchmark of linguistic minimal pairs for English. Transactions of the Association for Computational Linguistics, 8, 377–392.
Wei, J., Tay, Y., Bommasani, R., Raffel, C., Zoph, B., Borgeaud, S., Yogatama, D., Bosma, M., Zhou, D., Metzler, D., Chi, E. H., Hashimoto, T., Vinyals, O., Liang, P., Dean, J., & Fedus, W. (2022). Emergent abilities of large language models. ArXiv: 2206.07682.