An immediate-return reinforcement learning for the atypical Markov decision processes.
Keywords: atypical Markov decision process; continuous action space; flight trajectory control; reinforcement learning; uncertain environments
Journal
Frontiers in Neurorobotics
ISSN: 1662-5218
Abbreviated title: Front Neurorobot
Country: Switzerland
NLM ID: 101477958
Publication information
Publication date: 2022
History:
received: 2022-08-05
accepted: 2022-11-23
entrez: 2022-12-30
pubmed: 2022-12-31
medline: 2022-12-31
Status: epublish
Abstract
Atypical Markov decision processes (MDPs) are decision-making problems in which the quantity to be maximized is the immediate return of a single state transition. Many complex dynamic problems can be regarded as atypical MDPs, e.g., football trajectory control, approximation of compound Poincaré maps, and parameter identification. However, existing deep reinforcement learning (RL) algorithms are designed to maximize long-term returns, which wastes computing resources when they are applied to atypical MDPs. These algorithms are also limited by the estimation error of the value function, which leads to poor policies. To overcome these limitations, this paper proposes an immediate-return algorithm for atypical MDPs with a continuous action space, built on an unbiased, low-variance target Q-value and a simplified network framework. Two examples of atypical MDPs under uncertainty are then presented to illustrate the performance of the proposed algorithm: passing the football to a moving player and chipping the football over the human wall. Compared with existing deep RL algorithms such as deep deterministic policy gradient and proximal policy optimization, the proposed algorithm shows significant advantages in learning efficiency, effective control rate, and computing resource usage.
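The record does not reproduce the algorithm itself, but the idea summarized in the abstract (a critic whose target Q-value is simply the observed immediate reward of the single transition, paired with a simplified actor-critic framework) can be sketched as follows. This is a minimal, assumption-laden illustration rather than the authors' implementation: the PyTorch framework, network sizes, learning rates, and the (s, a, r) batch interface are placeholders chosen for clarity.

import torch
import torch.nn as nn

class Actor(nn.Module):
    """Deterministic policy for a continuous action space."""
    def __init__(self, state_dim, action_dim):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, 64), nn.ReLU(),
            nn.Linear(64, action_dim), nn.Tanh())  # actions scaled to [-1, 1]

    def forward(self, s):
        return self.net(s)

class Critic(nn.Module):
    """Estimates the immediate return Q(s, a) of one state transition."""
    def __init__(self, state_dim, action_dim):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + action_dim, 64), nn.ReLU(),
            nn.Linear(64, 1))

    def forward(self, s, a):
        return self.net(torch.cat([s, a], dim=-1))

def train_step(actor, critic, actor_opt, critic_opt, s, a, r):
    """One update from a batch of single-transition samples.
    s: (B, state_dim), a: (B, action_dim), r: (B, 1) immediate rewards."""
    # The critic target is the observed immediate reward itself: since the
    # episode ends after one transition, no bootstrapped next-state value
    # is needed, which avoids the usual value-estimation bias.
    critic_loss = nn.functional.mse_loss(critic(s, a), r)
    critic_opt.zero_grad()
    critic_loss.backward()
    critic_opt.step()

    # The actor maximizes the critic's estimate of the immediate return.
    actor_loss = -critic(s, actor(s)).mean()
    actor_opt.zero_grad()
    actor_loss.backward()
    actor_opt.step()

# Illustrative setup (dimensions and learning rates are placeholders):
# actor, critic = Actor(8, 2), Critic(8, 2)
# actor_opt = torch.optim.Adam(actor.parameters(), lr=1e-3)
# critic_opt = torch.optim.Adam(critic.parameters(), lr=1e-3)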
Identifiers
pubmed: 36582302
doi: 10.3389/fnbot.2022.1012427
pmc: PMC9793950
Publication types
Journal Article
Languages
eng
Pagination
1012427
Copyright information
Copyright © 2022 Pan, Wen, Tan, Yin and Hu.
Conflict of interest statement
The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.