Large Language Models Demonstrate the Potential of Statistical Learning in Language.

Artificial intelligence; Grammar; Innateness; Language acquisition; Large language models; Linguistic experience; Statistical learning

Journal

Cognitive science
ISSN: 1551-6709
Abbreviated title: Cogn Sci
Country: United States
NLM ID: 7708195

Publication information

Publication date:
Mar 2023
History:
revised: 14 01 2023
received: 31 10 2022
accepted: 19 01 2023
entrez: 25 2 2023
pubmed: 26 2 2023
medline: 3 3 2023
Status: ppublish

Abstract

To what degree can language be acquired from linguistic input alone? This question has vexed scholars for millennia and is still a major focus of debate in the cognitive science of language. The complexity of human language has hampered progress because studies of language, especially those involving computational modeling, have only been able to deal with small fragments of our linguistic skills. We suggest that the most recent generation of Large Language Models (LLMs) might finally provide the computational tools to determine empirically how much of the human language ability can be acquired from linguistic experience. LLMs are sophisticated deep learning architectures trained on vast amounts of natural language data, enabling them to perform an impressive range of linguistic tasks. We argue that, despite their clear semantic and pragmatic limitations, LLMs have already demonstrated that human-like grammatical language can be acquired without the need for a built-in grammar. Thus, while there is still much to learn about how humans acquire and use language, LLMs provide full-fledged computational models for cognitive scientists to empirically evaluate just how far statistical learning might take us in explaining the full complexity of human language.

Identifiers

pubmed: 36840975
doi: 10.1111/cogs.13256

Publication types

Journal Article

Languages

eng

Citation subsets

IM

Pagination

e13256

Copyright information

© 2023 Cognitive Science Society LLC.

Authors

Pablo Contreras Kallens (P)

Department of Psychology, Cornell University.

Ross Deans Kristensen-McLachlan (RD)

Center for Humanities Computing, Aarhus University.
Interacting Minds Centre, Aarhus University.
School of Communication and Culture, Aarhus University.

Morten H Christiansen (MH)

Department of Psychology, Cornell University.
Interacting Minds Centre, Aarhus University.
School of Communication and Culture, Aarhus University.
Haskins Laboratories.
