Reinforcement Learning Model With Dynamic State Space Tested on Target Search Tasks for Monkeys: Extension to Learning Task Events.
dynamic state space
episode-dependent learning
history-in-episode architecture
reinforcement learning
target search task
Journal
Frontiers in Computational Neuroscience
ISSN: 1662-5188
Abbreviated title: Front Comput Neurosci
Country: Switzerland
NLM ID: 101477956
Publication information
Publication date: 2022
History:
Received: 28 September 2021
Accepted: 26 April 2022
Entrez: 20 June 2022
PubMed: 21 June 2022
MEDLINE: 21 June 2022
Status: epublish
Abstract
Learning is a crucial basis for biological systems to adapt to environments. Environments include various states or episodes, and episode-dependent learning is essential for adaptation to such complex situations. Here, we developed a model for learning a two-target search task used in primate physiological experiments. In the task, the agent is required to gaze at one of four presented light spots. Two neighboring spots serve as the correct target alternately, and the correct target pair is switched after a certain number of consecutive successes. To obtain rewards with high probability, the agent must make decisions based on the actions and outcomes of the previous two trials. Our previous work achieved this by using a dynamic state space. However, to learn a task that includes events such as fixation on the initial central spot, the model framework should be extended. For this purpose, here we propose a "history-in-episode architecture." Specifically, we divide states into episodes and histories, and actions are selected based on the histories within each episode. When we compared the proposed model, including the dynamic state space, with the conventional SARSA method in the two-target search task, the former performed close to the theoretical optimum, while the latter never achieved a target-pair switch because it had to relearn each correct target every time. The reinforcement learning model including the proposed history-in-episode architecture and dynamic state space enables episode-dependent learning and provides a basis for learning systems that are highly adaptable to complex environments.
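The abstract's core claim, that near-optimal play requires conditioning decisions on the actions and outcomes of the previous two trials, can be illustrated with a minimal sketch. The code below is not the paper's model: it is a simplified stand-in for the two-target search task (four spots, alternating correct targets within a neighboring pair, pair switch after consecutive successes) and a plain tabular SARSA agent whose state is the two-trial (action, reward) history. All names, the switch threshold, and the learning parameters are assumptions chosen for illustration.

```python
import random
from collections import defaultdict

random.seed(0)

N_SPOTS = 4
PAIRS = [(0, 1), (1, 2), (2, 3), (3, 0)]   # neighboring-spot pairs
SWITCH_AFTER = 5                            # consecutive successes before the pair switches (assumed)
ALPHA, GAMMA, EPS = 0.1, 0.9, 0.1           # assumed learning parameters


class TwoTargetSearchTask:
    """Simplified stand-in for the two-target search task."""

    def __init__(self):
        self.pair = random.choice(PAIRS)
        self.idx = 0        # which member of the pair is currently correct
        self.streak = 0     # consecutive-success counter

    def step(self, action):
        reward = 1 if action == self.pair[self.idx] else 0
        if reward:
            self.idx = 1 - self.idx          # correct target alternates within the pair
            self.streak += 1
            if self.streak >= SWITCH_AFTER:  # switch to a different pair
                self.pair = random.choice([p for p in PAIRS if p != self.pair])
                self.idx = 0
                self.streak = 0
        else:
            self.streak = 0
        return reward


def run_sarsa(n_trials=30000):
    """Tabular SARSA whose state is the (action, reward) history of the
    previous two trials -- the history the abstract says is needed."""
    Q = defaultdict(float)
    env = TwoTargetSearchTask()
    state = ((-1, 0), (-1, 0))               # empty history at the start

    def choose(s):
        if random.random() < EPS:            # epsilon-greedy exploration
            return random.randrange(N_SPOTS)
        return max(range(N_SPOTS), key=lambda a: Q[(s, a)])

    action = choose(state)
    rewards = []
    for _ in range(n_trials):
        r = env.step(action)
        next_state = (state[1], (action, r))  # slide the two-trial history window
        next_action = choose(next_state)
        Q[(state, action)] += ALPHA * (
            r + GAMMA * Q[(next_state, next_action)] - Q[(state, action)]
        )
        state, action = next_state, next_action
        rewards.append(r)
    return rewards


rewards = run_sarsa()
late = sum(rewards[-5000:]) / 5000
print(f"reward rate over the last 5000 trials: {late:.2f}")
```

With the two-trial history in the state, the agent can learn to alternate within the current pair and re-search after a switch, so its late reward rate clearly exceeds the 0.25 chance level of random gazing. A history-free SARSA agent (state independent of past trials) cannot represent the alternation at all, which is the contrast the abstract draws.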
Identifiers
pubmed: 35720772
doi: 10.3389/fncom.2022.784604
pmc: PMC9201426
Publication types
Journal Article
Languages
eng
Pagination
784604
Copyright information
Copyright © 2022 Sakamoto, Yamada, Kawaguchi, Furusawa, Saito and Mushiake.
Conflict of interest statement
The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.