Role play with large language models.
Journal
Nature
ISSN: 1476-4687
Abbreviated title: Nature
Country: England
NLM ID: 0410462
Publication information
Publication date:
Nov 2023
History:
Received: 10 Jul 2023
Accepted: 14 Sep 2023
Medline: 17 Nov 2023
PubMed: 8 Nov 2023
Entrez: 8 Nov 2023
Status: ppublish
Abstract
As dialogue agents become increasingly human-like in their performance, we must develop effective ways to describe their behaviour in high-level terms without falling into the trap of anthropomorphism. Here we foreground the concept of role play. Casting dialogue-agent behaviour in terms of role play allows us to draw on familiar folk psychological terms, without ascribing human characteristics to language models that they in fact lack. Two important cases of dialogue-agent behaviour are addressed this way, namely, (apparent) deception and (apparent) self-awareness.
Identifiers
pubmed: 37938776
doi: 10.1038/s41586-023-06647-8
pii: 10.1038/s41586-023-06647-8
Publication types
Journal Article
Review
Languages
eng
Citation subsets
IM
Pagination
493–498
Copyright information
© 2023. Springer Nature Limited.
References
Shanahan, M. Talking about large language models. Preprint at https://arxiv.org/abs/2212.03551 (2023). This paper cautions against the use of anthropomorphic terms to describe the behaviour of large language models.
Andreas, J. Language models as agent models. In Findings of the Association for Computational Linguistics: EMNLP 2022 5769–5779 (Association for Computational Linguistics, 2022). This paper hypothesizes that LLMs can be understood as modelling the beliefs, desires and (communicative) intentions of an agent, and presents preliminary evidence for this in the case of GPT-3.
Park, J. S. et al. Generative agents: interactive simulacra of human behavior. Preprint at https://arxiv.org/abs/2304.03442 (2023).
Janus. Simulators. LessWrong Online Forum https://www.lesswrong.com/posts/vJFdjigzmcXMhNTsx/ (2022). This blog post introduced the idea that a large language model maintains a set of simulated characters in superposition.
Wei, J. et al. Emergent abilities of large language models. Trans. Mach. Learn. Res. https://openreview.net/forum?id=yzkSU5zdwD (2022).
Vaswani, A. et al. Attention is all you need. Adv. Neural Inf. Process. Syst. 30, 5998–6008 (2017).
Radford, A. et al. Language models are unsupervised multitask learners. Preprint at OpenAI https://cdn.openai.com/better-language-models/language_models_are_unsupervised_multitask_learners.pdf (2019).
Brown, T. et al. Language models are few-shot learners. Adv. Neural Inf. Process. Syst. 33, 1877–1901 (2020).
Rae, J. W. et al. Scaling language models: methods, analysis & insights from training Gopher. Preprint at https://arxiv.org/abs/2112.11446 (2021).
Chowdhery, A. et al. PaLM: scaling language modeling with pathways. Preprint at https://arxiv.org/abs/2204.02311 (2022).
Thoppilan, R. et al. LaMDA: language models for dialog applications. Preprint at https://arxiv.org/abs/2201.08239 (2022).
OpenAI. GPT-4 technical report. Preprint at https://arxiv.org/abs/2303.08774 (2023).
Touvron, H. et al. Llama 2: open foundation and fine-tuned chat models. Meta AI https://ai.meta.com/research/publications/llama-2-open-foundation-and-fine-tuned-chat-models/ (2023).
Roose, K. Bing’s A.I. chat: ‘I want to be alive’. New York Times (16 February 2023); https://www.nytimes.com/2023/02/16/technology/bing-chatbot-transcript.html.
Willison, S. Bing: “I will not harm you unless you harm me first”. Simon Willison’s Weblog https://simonwillison.net/2023/Feb/15/bing/ (2023).
Ruane, E., Birhane, A. & Ventresque, A. Conversational AI: social and ethical considerations. In Proc. 27th AIAI Irish Conference on Artificial Intelligence and Cognitive Science (eds Curry, E., Keane, M. T., Ojo, A. & Salwala, D.) 104–115 (2019).
Nardo, C. Want to predict/explain/control the output of GPT-4? Then learn about the world, not about transformers. LessWrong Online Forum https://www.lesswrong.com/posts/G3tuxF4X5R5BY7fut/want-to-predict-explain-control-the-output-of-gpt-4-then (2023).
Reynolds, L. & McDonell, K. Multiversal views on language models. In Joint Proc. ACM IUI 2021 Workshops (eds Glowacka, D. & Krishnamurthy, V. R.) https://ceur-ws.org/Vol-2903/IUI21WS-HAIGEN-11.pdf (2021).
Glaese, A. et al. Improving alignment of dialogue agents via targeted human judgements. Preprint at https://arxiv.org/abs/2209.14375 (2022).
Bai, Y. et al. Constitutional AI: harmlessness from AI feedback. Preprint at https://arxiv.org/abs/2212.08073 (2022).
Bender, E., Gebru, T., McMillan-Major, A. & Shmitchell, S. On the dangers of stochastic parrots: can language models be too big? In Proc. 2021 ACM Conference on Fairness, Accountability, and Transparency 610–623 (Association for Computing Machinery, 2021).
Perez, E. et al. Discovering language model behaviors with model-written evaluations. In Findings of the Association for Computational Linguistics: ACL 2023 13387–13434 (Association for Computational Linguistics, 2023).
Perry, J. Personal Identity 2nd edn (Univ. California Press, 2008).
Schick, T. et al. Toolformer: language models can teach themselves to use tools. Preprint at https://arxiv.org/abs/2302.04761 (2023).
Yao, S. et al. ReAct: synergizing reasoning and acting in language models. In International Conference on Learning Representations (2023).
Perkowitz, S. in Hollywood Science: Movies, Science, and the End of the World 142–164 (Columbia Univ. Press, 2007).
Russell, S. & Norvig, P. Artificial Intelligence: A Modern Approach 3rd edn (Prentice Hall, 2010).
Stiennon, N. et al. Learning to summarize from human feedback. Adv. Neural Inf. Process. Syst. 33, 3008–3021 (2020).
Ouyang, L. et al. Training language models to follow instructions with human feedback. Adv. Neural Inf. Process. Syst. 35, 27730–27744 (2022).
Casper, S. et al. Open problems and fundamental limitations of reinforcement learning from human feedback. Preprint at https://arxiv.org/abs/2307.15217 (2023).