Large Language Models Demonstrate the Potential of Statistical Learning in Language.

Humans Language Linguistics Language Development Semantics Cognition

Artificial intelligence Grammar Innateness Language acquisition Large language models Linguistic experience Statistical learning

Journal

Cognitive science

ISSN: 1551-6709

Abbreviated title: Cogn Sci

Country: United States

NLM ID: 7708195

Publication information

Publication date:
Mar 2023

History:

revised: 14 01 2023

received: 31 10 2022

accepted: 19 01 2023

entrez: 25 2 2023

pubmed: 26 2 2023

medline: 3 3 2023

Status: ppublish

Abstract

To what degree can language be acquired from linguistic input alone? This question has vexed scholars for millennia and is still a major focus of debate in the cognitive science of language. The complexity of human language has hampered progress because studies of language, especially those involving computational modeling, have only been able to deal with small fragments of our linguistic skills. We suggest that the most recent generation of Large Language Models (LLMs) might finally provide the computational tools to determine empirically how much of the human language ability can be acquired from linguistic experience. LLMs are sophisticated deep learning architectures trained on vast amounts of natural language data, enabling them to perform an impressive range of linguistic tasks. We argue that, despite their clear semantic and pragmatic limitations, LLMs have already demonstrated that human-like grammatical language can be acquired without the need for a built-in grammar. Thus, while there is still much to learn about how humans acquire and use language, LLMs provide full-fledged computational models for cognitive scientists to empirically evaluate just how far statistical learning might take us in explaining the full complexity of human language.

Identifiers

DOI: 10.1111/cogs.13256 PMID: 36840975

Publication types

Journal Article

Languages

eng

Pagination

e13256

References

Arehalli, S., Dillon, B., & Linzen, T. (2022). Syntactic surprisal from neural models predicts, but underestimates, human processing difficulty from syntactic ambiguities. https://doi.org/10.48550/arxiv.2210.12187

Bender, E. M., Gebru, T., McMillan-Major, A., & Shmitchell, S. (2021). On the dangers of stochastic parrots: Can language models be too big? Proceedings of FAccT 2021, Canada (pp. 610-623).

BigScience Workshop. (2022). BLOOM. Hugging Face. Available at: https://huggingface.co/bigscience/bloom

Brown, T., Mann, B., Ryder, N., Subbiah, M., Kaplan, J. D., Dhariwal, P., Neelakantan, A., Shyam, P., Sastry, G., Askell, A., Agarwal, S., Herbert-Voss, A., Krueger, G., Henighan, T., Child, R., Ramesh, A., Ziegler, D., Wu, J., Winter, C., & Amodei, D. (2020). Language models are few-shot learners. Advances in Neural Information Processing Systems, 33, 1877-1901.

Chomsky, N. (1959). Review of Verbal behavior by B. F. Skinner. Language, 35, 26-58. https://doi.org/10.2307/411334

Chomsky, N. (1980). Rules and representations. Cambridge, MA: MIT Press.

Chomsky, N. (1995). The minimalist program. Cambridge, MA: The MIT Press.

Chomsky, N. (2017). The language capacity: Architecture and evolution. Psychonomic Bulletin & Review, 24, 200-203. https://doi.org/10.3758/s13423-016-1078-6

Chowdhery, A., Narang, S., Devlin, J., Bosma, M., Mishra, G., Roberts, A., Barham, P., Chung, H. W., Sutton, C., Gehrmann, S., Schuh, P., Shi, K., Tsvyashchenko, S., Maynez, J., Rao, A., Barnes, P., Tay, Y., Shazeer, N., Prabhakaran, V., & Fiedel, N. (2022). PaLM: Scaling language modeling with pathways. https://arxiv.org/abs/2204.02311

Christiansen, M. H., & Chater, N. (2016). Creating language: Integrating evolution, acquisition, and processing. Cambridge, MA: MIT Press.

Christiansen, M. H., & Chater, N. (2022). The language game: How improvisation created language and changed the world. New York: Basic Books.

Contreras Kallens, P., & Christiansen, M. H. (2022). Models of language and multiword expressions. Frontiers in Artificial Intelligence, 5, 781962. https://doi.org/10.3389/frai.2022.781962

Dąbrowska, E. (2015). What exactly is Universal Grammar, and has anyone seen it? Frontiers in Psychology, 6, 852. https://doi.org/10.3389/fpsyg.2015.00852

Dettmers, T., Lewis, M., Belkada, Y., & Zettlemoyer, L. (2022). LLM.int8(): 8-bit matrix multiplication for transformers at scale. arXiv. https://arxiv.org/abs/2208.07339

Dou, Y., Forbes, M., Koncel-Kedziorski, R., Smith, N. A., & Choi, Y. (2022). Is GPT-3 text indistinguishable from human text? Scarecrow: A framework for scrutinizing machine text. Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics, Dublin, Ireland (Vol. 1, pp. 7250-7274).

Elman, J. L., Bates, E. A., Johnson, M. H., Karmiloff-Smith, A., Parisi, D., & Plunkett, K. (1996). Rethinking innateness: A connectionist perspective on development. Cambridge, MA: MIT Press.

Elman, J. L. (2005). Connectionist models of cognitive development: Where next? Trends in Cognitive Sciences, 9, 111-117.

Ettinger, A. (2020). What BERT is not: Lessons from a new suite of psycholinguistic diagnostics for language models. Transactions of the Association for Computational Linguistics, 8, 34-48. https://aclanthology.org/2020.tacl-1.3

Futrell, R., Wilcox, E., Morita, T., Qian, P., Ballesteros, M., & Levy, R. (2019). Neural language models as psycholinguistic subjects: Representations of syntactic state. Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Minneapolis, MN (Vol. 1, pp. 32-42).

Gilkerson, J., Richards, J. A., Warren, S. F., Montgomery, J. K., Greenwood, C. R., Kimbrough Oller, D., Hansen, J. H. L., & Paul, T. D. (2017). Mapping the early language environment using all-day recordings and automated analysis. American Journal of Speech-Language Pathology, 26, 248-265.

Goldberg, A. (2019). Explain me this. Princeton, NJ: Princeton University Press.

Goldstein, A., Zada, Z., Buchnik, E., Schain, M., Price, A., Aubrey, B., Nastase, S. A., Feder, A., Emanuel, D., Cohen, A., Jansen, A., Gazula, H., Choe, G., Rao, A., Kim, C., Casto, C., Fanda, L., Doyle, W., Friedman, D. … Hasson, U. (2022a). Shared computational principles for language processing in humans and deep language models. Nature Neuroscience, 25, 369-380. https://doi.org/10.1038/s41593-022-01026-4

Goldstein, A., Ham, E., Nastase, S. A., Zada, Z., Grinstein-Dabush, A., Aubrey, B., Schain, M., Gazula, H., Feder, A., Doyle, W., Devore, S., Dugan, P., Friedman, D., Brenner, M., Hassidim, A., Devinsky, O., Flinker, A., Levy, O., & Hasson, U. (2022b). Correspondence between the layered structure of deep language models and temporal structure of natural language processing in the human brain. BioRxiv. https://doi.org/10.1101/2022.07.11.499562

Hosseini, E. A., Schrimpf, M. A., Zhang, Y., Bowman, S., Zaslavsky, N., & Fedorenko, E. (2022). Artificial neural network language models align neurally and behaviorally with humans even after a developmentally realistic amount of training. BioRxiv. https://doi.org/10.1101/2022.07.11.499562

Jackendoff, R. (2011). What is the human language faculty? Two views. Language, 87, 586-624.

Jackendoff, R., & Audring, J. (2019). The Parallel Architecture. In A. Kertész, E. Moravcsik, & C. Rákosi (Eds.), Current approaches to syntax: A comparative handbook (pp. 215-240). Berlin: De Gruyter Mouton. https://doi.org/10.1515/9783110540253-008

Lieven, E. (2014). First language development: A usage-based perspective on past and current research. Journal of Child Language, 41, 48-63. https://doi.org/10.1017/S0305000914000282

Marcus, G. F. (2022a). Deep learning is hitting a wall. Nautilus. Available at: https://nautil.us/deep-learning-is-hitting-a-wall-238440/. Accessed October 26, 2022.

Marcus, G. F. (2022b). Noam Chomsky and GPT-3 [Blog Post]. The road to AI we can trust. Available at: https://garymarcus.substack.com/p/noam-chomsky-and-gpt-3. Accessed October 26, 2022.

Marcus, G. F., & Davis, E. (2020, August 22). GPT-3, Bloviator: OpenAI's language generator has no idea what it's talking about [Blog Post]. MIT Technology Review. Available at: https://www.technologyreview.com/2020/08/22/1007539/gpt3-openai-language-generator-artificial-intelligence-ai-opinion/. Accessed October 26, 2022.

McClelland, J. L., Hill, F., Rudolph, M., Baldridge, J., & Schütze, H. (2020). Placing language in an integrated understanding system: Next steps toward human-level performance in neural language models. Proceedings of the National Academy of Sciences, 117, 25966-25974.

Pandia, L., & Ettinger, A. (2021). Sorting through the noise: Testing robustness of information processing in pre-trained language models. Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, Punta Cana, Dominican Republic (pp. 1583-1596). https://aclanthology.org/2021.emnlp-main.119

Pinker, S. (1994). The language instinct: The new science of language and mind. William Morrow and Company.

Pinker, S. (2022). Pinker's initial salvo. Shtetl-Optimized: The Blog of Scott Aaronson. Available at: https://scottaaronson.blog/?p=6524. Accessed October 26, 2022.

Rae, J. W., Borgeaud, S., Cai, T., Millican, K., Hoffmann, J., Song, F., Aslanides, J., Henderson, S., Ring, R., Young, S., Rutherford, E., Hennigan, T., Menick, J., Cassirer, A., Powell, R., van den Driessche, G., Hendricks, L. A., Rauh, M., Huang, P.-S., … Irving, G. (2021). Scaling language models: Methods, analysis & insights from training Gopher. arXiv. https://doi.org/10.48550/arXiv.2112.11446

Rogers, A., Kovaleva, O., & Rumshisky, A. (2020). A Primer in BERTology: What we know about how BERT works. Transactions of the Association for Computational Linguistics, 8, 842-866. https://doi.org/10.1162/tacl_a_00349

Rumelhart, D. E., McClelland, J. L., & the PDP Research Group. (1986). Parallel distributed processing: Explorations in the microstructure of cognition. Cambridge, MA: MIT Press.

Skinner, B. F. (1957). Verbal behavior. Princeton, NJ: Prentice-Hall.

Tomasello, M. (2009). The usage-based theory of language acquisition. In E. L. Bavin (Ed.), The Cambridge handbook of child language (pp. 69-87). Cambridge, UK: Cambridge University Press.

Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A. N., Kaiser, L., & Polosukhin, I. (2017). Attention is all you need. Advances in Neural Information Processing Systems 30, Long Beach, CA.

Wilcox, E. G., Futrell, R., & Levy, R. (2022). Using computational models to test syntactic learnability. Linguistic Inquiry. https://doi.org/10.1162/ling_a_00491

Yang, C., Crain, S., Berwick, R. C., Chomsky, N., & Bolhuis, J. J. (2017). The growth of language: Universal Grammar, experience, and principles of computation. Neuroscience & Biobehavioral Reviews, 81, 103-119.

Zhang, S., Roller, S., Goyal, N., Artetxe, M., Chen, M., Chen, S., Chen, S., Dewan, C., Diab, M., Li, X., Lin, X. V., Mihaylov, T., Ott, M., Shleifer, S., Shuster, K., Simig, D., Koura, P. S., Sridhar, A., Wang, T., & Zettlemoyer, L. (2022). OPT: Open pre-trained transformer language models. arXiv. https://doi.org/10.48550/arXiv.2205.01068

Authors

Pablo Contreras Kallens (P)

Ross Deans Kristensen-McLachlan (RD)

Morten H Christiansen (MH)
