Optimizing agent behavior over long time scales by transporting value.


Journal

Nature communications
ISSN: 2041-1723
Abbreviated title: Nat Commun
Country: England
NLM ID: 101528555

Publication information

Publication date:
19 11 2019
History:
received: 28 12 2018
accepted: 10 10 2019
entrez: 21 11 2019
pubmed: 21 11 2019
medline: 10 3 2020
Status: epublish

Abstract

Humans prolifically engage in mental time travel. We dwell on past actions and experience satisfaction or regret. More than storytelling, these recollections change how we act in the future and endow us with a computationally important ability to link actions and consequences across spans of time, which helps address the problem of long-term credit assignment: the question of how to evaluate the utility of actions within a long-duration behavioral sequence. Existing approaches to credit assignment in AI cannot solve tasks with long delays between actions and consequences. Here, we introduce a paradigm where agents use recall of specific memories to credit past actions, allowing them to solve problems that are intractable for existing algorithms. This paradigm broadens the scope of problems that can be investigated in AI and offers a mechanistic account of behaviors that may inspire models in neuroscience, psychology, and behavioral economics.

Identifiers

pubmed: 31745075
doi: 10.1038/s41467-019-13073-w
pii: 10.1038/s41467-019-13073-w
pmc: PMC6864102

Publication types

Journal Article

Languages

eng

Citation subsets

IM

Pagination

5223

Authors

Chia-Chun Hung (CC)

DeepMind, 5 New Street Square, London, EC4A 3TW, UK.

Timothy Lillicrap (T)

DeepMind, 5 New Street Square, London, EC4A 3TW, UK.

Josh Abramson (J)

DeepMind, 5 New Street Square, London, EC4A 3TW, UK.

Yan Wu (Y)

DeepMind, 5 New Street Square, London, EC4A 3TW, UK.

Mehdi Mirza (M)

DeepMind, 5 New Street Square, London, EC4A 3TW, UK.

Federico Carnevale (F)

DeepMind, 5 New Street Square, London, EC4A 3TW, UK.

Arun Ahuja (A)

DeepMind, 5 New Street Square, London, EC4A 3TW, UK.

Greg Wayne (G)

DeepMind, 5 New Street Square, London, EC4A 3TW, UK. gregwayne@google.com.
