Reinforcement Learning Model With Dynamic State Space Tested on Target Search Tasks for Monkeys: Extension to Learning Task Events.

Keywords: dynamic state space; episode-dependent learning; history-in-episode architecture; reinforcement learning; target search task

Journal

Frontiers in computational neuroscience
ISSN: 1662-5188
Abbreviated title: Front Comput Neurosci
Country: Switzerland
NLM ID: 101477956

Publication information

Publication date:
2022
History:
received: 28 09 2021
accepted: 26 04 2022
entrez: 20 6 2022
pubmed: 21 6 2022
medline: 21 6 2022
Status: epublish

Abstract

Learning is a crucial basis for the adaptation of biological systems to their environments. Environments include various states or episodes, and episode-dependent learning is essential for adapting to such complex situations. Here, we developed a model for learning a two-target search task used in primate physiological experiments. In the task, the agent is required to gaze at one of four presented light spots. Two neighboring spots serve alternately as the correct target, and the correct target pair is switched after a certain number of consecutive successes. For the agent to obtain rewards with high probability, it must make decisions based on the actions and results of the previous two trials. Our previous work achieved this by using a dynamic state space. However, to learn a task that includes events such as fixation on the initial central spot, the model framework must be extended. For this purpose, we propose a "history-in-episode architecture." Specifically, we divide states into episodes and histories, and actions are selected based on the histories within each episode. When we compared the proposed model, including the dynamic state space, with the conventional SARSA method in the two-target search task, the former performed close to the theoretical optimum, while the latter never achieved a target-pair switch because it had to re-learn each correct target every time. The reinforcement learning model including the proposed history-in-episode architecture and dynamic state space enables episode-dependent learning and provides a basis for highly adaptable learning systems in complex environments.
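To make the task structure and the conventional baseline concrete, here is a minimal Python sketch under assumptions that are not taken from the paper. The four spots are treated as a ring, so neighboring pairs are (k, k+1 mod 4). The correct target alternates within the pair after every success. The pair moves to a randomly chosen different pair after a hypothetical `switch_after` consecutive successes, and rewards are 1 or 0. The agent is ordinary tabular SARSA whose state is the (action, reward) history of the last `history_len` trials. It is not the authors' dynamic-state-space or history-in-episode model. `TwoTargetSearchTask`, `sarsa_update`, and `run` are hypothetical names used only for illustration.

```python
import random
from collections import defaultdict

N_SPOTS = 4  # four light spots, assumed to lie on a ring: 0-1-2-3-0


class TwoTargetSearchTask:
    """Hypothetical, simplified two-target search task (assumed rules, not the paper's)."""

    def __init__(self, switch_after=5, seed=0):
        self.rng = random.Random(seed)
        self.switch_after = switch_after  # hypothetical success streak that triggers a pair switch
        self.pair_start = 0  # correct pair is (pair_start, pair_start + 1) mod N_SPOTS
        self.phase = 0       # 0: first spot of the pair is correct, 1: second spot
        self.streak = 0      # consecutive successes since the last pair switch or error

    def correct_target(self):
        return (self.pair_start + self.phase) % N_SPOTS

    def step(self, action):
        """Score one trial: reward 1 if the gazed spot is the correct target, else 0."""
        if action != self.correct_target():
            self.streak = 0
            return 0
        self.streak += 1
        self.phase = 1 - self.phase  # the correct target alternates within the pair
        if self.streak >= self.switch_after:
            # Switch to a different neighboring pair and restart the alternation.
            self.pair_start = self.rng.choice(
                [p for p in range(N_SPOTS) if p != self.pair_start])
            self.phase = 0
            self.streak = 0
        return 1


def sarsa_update(Q, s, a, r, s_next, a_next, alpha=0.1, gamma=0.9):
    """Tabular SARSA: Q(s,a) += alpha * (r + gamma * Q(s',a') - Q(s,a))."""
    td_error = r + gamma * Q[(s_next, a_next)] - Q[(s, a)]
    Q[(s, a)] += alpha * td_error
    return Q[(s, a)]


def run(trials=20000, history_len=2, epsilon=0.1, seed=0):
    """Return the fraction of rewarded trials for an epsilon-greedy SARSA agent.

    The state is the tuple of (action, reward) pairs from the last `history_len`
    trials; history_len=0 gives a memoryless agent with a single state.
    """
    rng = random.Random(seed)
    env = TwoTargetSearchTask(seed=seed)
    Q = defaultdict(float)

    def choose(state):
        # Epsilon-greedy action selection over the four spots.
        if rng.random() < epsilon:
            return rng.randrange(N_SPOTS)
        return max(range(N_SPOTS), key=lambda a: Q[(state, a)])

    state = ()
    action = choose(state)
    total_reward = 0
    for _ in range(trials):
        r = env.step(action)
        total_reward += r
        # Keep only the most recent `history_len` (action, reward) pairs.
        next_state = (state + ((action, r),))[-history_len:] if history_len else ()
        next_action = choose(next_state)
        sarsa_update(Q, state, action, r, next_state, next_action)
        state, action = next_state, next_action
    return total_reward / trials


if __name__ == "__main__":
    print("memoryless SARSA:", run(history_len=0))
    print("SARSA on 2-trial history:", run(history_len=2))
```

Running the module prints the success rate of a memoryless agent (`history_len=0`, a single state) next to that of an agent conditioned on the previous two trials. No particular values are claimed here, since they depend on the assumed task rules, the seed, and the hyperparameters. The special case for `history_len=0` is needed because `t[-0:]` returns the whole tuple in Python.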

Identifiers

pubmed: 35720772
doi: 10.3389/fncom.2022.784604
pmc: PMC9201426

Publication types

Journal Article

Languages

eng

Pagination

784604

Copyright information

Copyright © 2022 Sakamoto, Yamada, Kawaguchi, Furusawa, Saito and Mushiake.

Conflict of interest statement

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Authors

Kazuhiro Sakamoto (K)

Department of Neuroscience, Faculty of Medicine, Tohoku Medical and Pharmaceutical University, Sendai, Japan.
Department of Physiology, Tohoku University School of Medicine, Sendai, Japan.

Hinata Yamada (H)

Department of Neuroscience, Faculty of Medicine, Tohoku Medical and Pharmaceutical University, Sendai, Japan.

Norihiko Kawaguchi (N)

Department of Physiology, Tohoku University School of Medicine, Sendai, Japan.

Yoshito Furusawa (Y)

Department of Physiology, Tohoku University School of Medicine, Sendai, Japan.

Naohiro Saito (N)

Department of Physiology, Tohoku University School of Medicine, Sendai, Japan.

Hajime Mushiake (H)

Department of Physiology, Tohoku University School of Medicine, Sendai, Japan.

MeSH classifications