Optimizing agent behavior over long time scales by transporting value.

MeSH terms: Algorithms; Artificial Intelligence; Humans; Learning / physiology; Mental Processes / physiology; Models, Psychological; Problem Solving / physiology; Reinforcement, Psychology; Transfer, Psychology / physiology

Journal

Nature communications

ISSN: 2041-1723

Abbreviated title: Nat Commun

Country: England

NLM ID: 101528555

Publication information

Publication date:
19 Nov 2019

History:

Received: 28 Dec 2018

Accepted: 10 Oct 2019

Entrez: 21 Nov 2019

PubMed: 21 Nov 2019

MEDLINE: 10 Mar 2020

Status: epublish

Abstract

Humans prolifically engage in mental time travel. We dwell on past actions and experience satisfaction or regret. More than storytelling, these recollections change how we act in the future and endow us with a computationally important ability to link actions and consequences across spans of time, which helps address the problem of long-term credit assignment: the question of how to evaluate the utility of actions within a long-duration behavioral sequence. Existing approaches to credit assignment in AI cannot solve tasks with long delays between actions and consequences. Here, we introduce a paradigm where agents use recall of specific memories to credit past actions, allowing them to solve problems that are intractable for existing algorithms. This paradigm broadens the scope of problems that can be investigated in AI and offers a mechanistic account of behaviors that may inspire models in neuroscience, psychology, and behavioral economics.
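The abstract describes agents that recall specific memories in order to credit past actions. The following is a deliberately simplified, hypothetical sketch of that idea, not the authors' algorithm. It assumes the agent logs, for every time step, how strongly it read its memory of each earlier step (`read_weights`), and that a sufficiently strong read adds the value estimate at the reading step, scaled by a factor `alpha`, to the reward of the recalled step. The function name `transport_value` and the parameters `alpha` and `threshold` are illustrative assumptions.

```python
from typing import List, Sequence


def transport_value(
    rewards: Sequence[float],
    values: Sequence[float],
    read_weights: Sequence[Sequence[float]],
    alpha: float = 0.9,
    threshold: float = 0.5,
) -> List[float]:
    """Hypothetical sketch: credit recalled past steps with later value.

    rewards[t]         -- reward received at step t
    values[t]          -- the agent's value estimate at step t
    read_weights[t][k] -- how strongly step t recalled the memory of step k
    alpha, threshold   -- illustrative hyperparameters (assumptions)
    """
    augmented = list(rewards)
    for t, weights in enumerate(read_weights):
        for k, w in enumerate(weights):
            # Only strongly recalled *past* steps receive transported value.
            if k < t and w >= threshold:
                augmented[k] += alpha * w * values[t]
    return augmented


if __name__ == "__main__":
    rewards = [0.0, 0.0, 0.0, 1.0]
    values = [0.1, 0.2, 0.8, 0.0]
    # Step 2 strongly recalls step 0 and only weakly recalls step 1.
    read_weights = [
        [0.0, 0.0, 0.0, 0.0],
        [0.0, 0.0, 0.0, 0.0],
        [0.9, 0.05, 0.0, 0.0],
        [0.0, 0.0, 0.0, 0.0],
    ]
    print(transport_value(rewards, values, read_weights))
```

In this toy run, step 0's reward grows by roughly 0.9 × 0.9 × 0.8 ≈ 0.648 because step 2 recalled it strongly, while the weak read of step 1 (0.05, below the threshold) transports nothing. The point of the illustration is that credit reaches step 0 through a memory link rather than by propagating backward through every intermediate step.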

Identifiers

DOI: 10.1038/s41467-019-13073-w

PMID: 31745075

PMC: PMC6864102

Publication types

Journal Article

Language

English

Pagination

5223

References

Nat Rev Neurosci. 2007 Sep;8(9):657-61. PMID: 17700624

Psychol Rev. 1948 Jul;55(4):189-208. PMID: 18870876

Neuron. 2017 Jul 19;95(2):245-258. PMID: 28728020

Am Econ Rev. 96(5):1449-76. PMID: 29135208

Nature. 2016 Oct 27;538(7626):471-476. PMID: 27732574

Trends Cogn Sci. 2019 May;23(5):408-422. PMID: 31003893

Psychol Res. 2008 May;72(3):321-30. PMID: 17447083

Annu Rev Psychol. 2017 Jan 3;68:101-128. PMID: 27618944

Nature. 2015 Feb 26;518(7540):529-33. PMID: 25719670

J Neurosci. 2007 Dec 26;27(52):14365-74. PMID: 18160644

Neuron. 2010 Apr 15;66(1):138-48. PMID: 20399735

Authors

Chia-Chun Hung (CC)

Timothy Lillicrap (T)

Josh Abramson (J)

Yan Wu (Y)

Mehdi Mirza (M)

Federico Carnevale (F)

Arun Ahuja (A)

Greg Wayne (G)
