Grandmaster level in StarCraft II using multi-agent reinforcement learning.

Artificial Intelligence; Humans; Learning; Reinforcement, Psychology; Video Games

Journal

Nature

ISSN: 1476-4687

Abbreviated title: Nature

Country: England

NLM ID: 0410462

Publication information

Publication date:
11 2019

History:

received: 30 08 2019

accepted: 10 10 2019

pubmed: 2 11 2019

medline: 9 4 2020

entrez: 1 11 2019

Status: ppublish

Abstract

Many real-world applications require artificial agents to compete and coordinate with other agents in complex environments. As a stepping stone to this goal, the domain of StarCraft has emerged as an important challenge for artificial intelligence research, owing to its iconic and enduring status among the most difficult professional esports and its relevance to the real world in terms of its raw complexity and multi-agent challenges. Over the course of a decade and numerous competitions

Identifiers

DOI: 10.1038/s41586-019-1724-z PMID: 31666705

pii: 10.1038/s41586-019-1724-z

Publication types

Journal Article

Languages

eng

Citation subsets

Pagination

350-354

References

AIIDE StarCraft AI Competition. https://www.cs.mun.ca/dchurchill/starcraftaicomp/

Student StarCraft AI Tournament and Ladder. https://sscaitournament.com/

Starcraft 2 AI ladder. https://sc2ai.net/

Churchill, D., Lin, Z. & Synnaeve, G. An analysis of model-based heuristic search techniques for StarCraft combat scenarios. in Artificial Intelligence and Interactive Digital Entertainment Conf. (AAAI, 2017).

Sutton, R. & Barto, A. Reinforcement Learning: An Introduction (MIT Press, 1998).

LeCun, Y., Bengio, Y. & Hinton, G. Deep learning. Nature 521, 436–444 (2015). doi: 10.1038/nature14539

Vinyals, O. et al. StarCraft II: a new challenge for reinforcement learning. Preprint at https://arxiv.org/abs/1708.04782 (2017).

Vaswani, A. et al. Attention is all you need. Adv. Neural Information Process. Syst. 30, 5998–6008 (2017).

Hochreiter, S. & Schmidhuber, J. Long short-term memory. Neural Comput. 9, 1735–1780 (1997). doi: 10.1162/neco.1997.9.8.1735

Mikolov, T., Karafiat, M., Burget, L., Cernocky, J. & Khudanpur, S. Recurrent neural network based language model. INTERSPEECH-2010 1045–1048 (2010).

Metz, L., Ibarz, J., Jaitly, N. & Davidson, J. Discrete sequential prediction of continuous actions for deep RL. Preprint at https://arxiv.org/abs/1705.05035v3 (2017).

Vinyals, O., Fortunato, M. & Jaitly, N. Pointer networks. Adv. Neural Information Process. Syst. 28, 2692–2700 (2015).

Mnih, V. et al. Asynchronous methods for deep reinforcement learning. Proc. Machine Learning Res. 48, 1928–1937 (2016).

Espeholt, L. et al. IMPALA: scalable distributed deep-RL with importance weighted actor-learner architectures. Proc. Machine Learning Res. 80, 1407–1416 (2018).

Wang, Z. et al. Sample efficient actor-critic with experience replay. Preprint at https://arxiv.org/abs/1611.01224v2 (2017).

Sutton, R. Learning to predict by the method of temporal differences. Mach. Learn. 3, 9–44 (1988).

Oh, J., Guo, Y., Singh, S. & Lee, H. Self-Imitation Learning. Proc. Machine Learning Res. 80, 3875–3884 (2018).

Silver, D. et al. A general reinforcement learning algorithm that masters chess, shogi, and Go through self-play. Science 362, 1140–1144 (2018). doi: 10.1126/science.aar6404

Balduzzi, D. et al. Open-ended learning in symmetric zero-sum games. Proc. Machine Learning Res. 97, 434–443 (2019).

Brown, G. W. Iterative solution of games by fictitious play. Act. Anal. Prod. Alloc. 13, 374–376 (1951).

Leslie, D. S. & Collins, E. J. Generalised weakened fictitious play. Games Econ. Behav. 56, 285–298 (2006). doi: 10.1016/j.geb.2005.08.005

Heinrich, J., Lanctot, M. & Silver, D. Fictitious self-play in extensive-form games. Proc. Intl Conf. Machine Learning 32, 805–813 (2015).

Jouppi, N. P. et al. In-datacenter performance analysis of a tensor processing unit. Preprint at https://arxiv.org/abs/1704.04760v1 (2017).

Elo, A. E. The Rating of Chessplayers, Past and Present (Arco, 2017).

Campbell, M., Hoane, A. & Hsu, F. Deep Blue. Artif. Intell. 134, 57–83 (2002). doi: 10.1016/S0004-3702(01)00129-1

Silver, D. et al. Mastering the game of Go with deep neural networks and tree search. Nature 529, 484–489 (2016). doi: 10.1038/nature16961

Mnih, V. et al. Human-level control through deep reinforcement learning. Nature 518, 529–533 (2015). doi: 10.1038/nature14236

Pathak, D., Agrawal, P., Efros, A. A. & Darrell, T. Curiosity-driven exploration by self-supervised prediction. Proc. IEEE Conf. Computer Vision Pattern Recognition Workshops 16–17 (IEEE, 2017).

Jaderberg, M. et al. Human-level performance in 3D multiplayer games with population-based reinforcement learning. Science 364, 859–865 (2019). doi: 10.1126/science.aau6249

OpenAI OpenAI Five. https://blog.openai.com/openai-five/ (2018).

Buro, M. Real-time strategy games: a new AI research challenge. Intl Joint Conf. Artificial Intelligence 1534–1535 (2003).

Samvelyan, M. et al. The StarCraft multi-agent challenge. Intl Conf. Autonomous Agents and MultiAgent Systems 2186–2188 (2019).

Zambaldi, V. et al. Relational deep reinforcement learning. Preprint at https://arxiv.org/abs/1806.01830v2 (2018).

Usunier, N., Synnaeve, G., Lin, Z. & Chintala, S. Episodic exploration for deep deterministic policies: an application to StarCraft micromanagement tasks. Preprint at https://arxiv.org/abs/1609.02993v3 (2017).

Weber, B. G. & Mateas, M. Case-based reasoning for build order in real-time strategy games. AIIDE ’09 Proc. 5th AAAI Conf. Artificial Intelligence and Interactive Digital Entertainment 106–111 (2009).

Buro, M. ORTS: a hack-free RTS game environment. Intl Conf. Computers and Games 280–291 (Springer, 2002).

Churchill, D. SparCraft: open source StarCraft combat simulation. https://code.google.com/archive/p/sparcraft/ (2013).

Weber, B. G. AIIDE 2010 StarCraft competition. Artificial Intelligence and Interactive Digital Entertainment Conf. (2010).

Uriarte, A. & Ontañón, S. Improving Monte Carlo tree search policies in StarCraft via probabilistic models learned from replay data. Artificial Intelligence and Interactive Digital Entertainment Conf. 101–106 (2016).

Hsieh, J.-L. & Sun, C.-T. Building a player strategy model by analyzing replays of real-time strategy games. IEEE Intl Joint Conf. Neural Networks 3106–3111 (2008).

Synnaeve, G. & Bessiere, P. A Bayesian model for plan recognition in RTS games applied to StarCraft. Artificial Intelligence and Interactive Digital Entertainment Conf. 79–84 (2011).

Shao, K., Zhu, Y. & Zhao, D. StarCraft micromanagement with reinforcement learning and curriculum transfer learning. IEEE Trans. Emerg. Top. Comput. Intell. 3, 73–84 (2019).

Facebook CherryPi. https://torchcraft.github.io/TorchCraftAI/

Berkeley Overmind. https://www.icsi.berkeley.edu/icsi/news/2010/10/klein-berkeley-overmind (2010).

Justesen, N. & Risi, S. Learning macromanagement in StarCraft from replays using deep learning. IEEE Conf. Computational Intelligence and Games (CIG) 162–169 (2017).

Synnaeve, G. et al. Forward modeling for partial observation strategy games—a StarCraft defogger. Adv. Neural Information Process. Syst. 31, 10738–10748 (2018).

Farooq, S. S., Oh, I.-S., Kim, M.-J. & Kim, K. J. StarCraft AI competition report. AI Mag. 37, 102–107 (2016). doi: 10.1609/aimag.v37i2.2657

Sun, P. et al. TStarBots: defeating the cheating level builtin AI in StarCraft II in the full game. Preprint at https://arxiv.org/abs/1809.07193v3 (2018).

Schulman, J., Wolski, F., Dhariwal, P., Radford, A. & Klimov, O. Proximal policy optimization algorithms. Preprint at https://arxiv.org/abs/1707.06347v2 (2017).

Ibarz, B. et al. Reward learning from human preferences and demonstrations in Atari. Adv. Neural Information Process. Syst. 31, 8011–8023 (2018).

Nair, A., McGrew, B., Andrychowicz, M., Zaremba, W. & Abbeel, P. Overcoming exploration in reinforcement learning with demonstrations. IEEE Intl Conf. Robotics and Automation 6292–6299 (2018).

Christiano, P. F. et al. Deep reinforcement learning from human preferences. Adv. Neural Information Process. Syst. 30, 4299–4307 (2017).

Lanctot, M. et al. A unified game-theoretic approach to multiagent reinforcement learning. Adv. Neural Information Process. Syst. 30, 4190–4203 (2017).

Perez, E., Strub, F., De Vries, H., Dumoulin, V. & Courville, A. FiLM: visual reasoning with a general conditioning layer. Preprint at https://arxiv.org/abs/1709.07871v2 (2018).

He, K., Zhang, X., Ren, S. & Sun, J. Deep residual learning for image recognition. Proc. IEEE Conf. Computer Vision and Pattern Recognition 770–778 (2016).

Hinton, G., Vinyals, O. & Dean, J. Distilling the knowledge in a neural network. Preprint at https://arxiv.org/abs/1503.02531v1 (2015).

Kingma, D. P. & Ba, J. Adam: a method for stochastic optimization. Preprint at https://arxiv.org/abs/1412.6980v9 (2014).

Bishop, C. M. Pattern Recognition and Machine Learning (Springer, 2006).

Rusu, A. A. et al. Policy distillation. Preprint at https://arxiv.org/abs/1511.06295 (2016).

Parisotto, E., Ba, J. & Salakhutdinov, R. Actor-mimic: deep multitask and transfer reinforcement learning. Preprint at https://arxiv.org/abs/1511.06342 (2016).

Precup, D., Sutton, R. S. & Singh, S. P. Eligibility traces for off-policy policy evaluation. ICML ’00 Proc. 17th Intl Conf. Machine Learning 759–766 (2000).

DeepMind Research on Ladder. https://starcraft2.com/en-us/news/22933138 (2019).

Vinyals, O. et al. AlphaStar: mastering the real-time strategy game StarCraft II. https://deepmind.com/blog/article/alphastar-mastering-real-time-strategy-game-starcraft-ii (DeepMind, 2019).