Implicit incremental natural actor critic algorithm.
Keywords
Implicit update
Incremental learning
Natural actor critic
Natural policy gradient
Reinforcement learning
Journal
Neural networks : the official journal of the International Neural Network Society
ISSN: 1879-2782
Abbreviated title: Neural Netw
Country: United States
NLM ID: 8805018
Publication information
Publication date:
Jan 2019
History:
received: 20 Dec 2017
revised: 22 Aug 2018
accepted: 9 Oct 2018
pubmed: 9 Nov 2018
medline: 10 Jan 2019
entrez: 9 Nov 2018
Status: ppublish
Abstract
Natural policy gradient (NPG) methods are promising approaches to finding locally optimal policy parameters. The NPG approach works well in optimizing complex policies with high-dimensional parameters, and the effectiveness of NPG methods has been demonstrated in many fields. However, incremental estimation of the NPG is computationally unstable owing to its high sensitivity to the step-size values, especially to the one used to update the NPG estimate. In this study, we propose a new incremental and stable algorithm for NPG estimation. We call the proposed algorithm the implicit incremental natural actor critic (I2NAC); it is based on the idea of the implicit update. A convergence analysis for I2NAC is provided. The theoretical results indicate the stability of I2NAC and the instability of conventional incremental NPG methods. Numerical experiments show that I2NAC is less sensitive to the values of its meta-parameters, including the step-size for the NPG update, than the existing incremental NPG method.
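The record gives only this summary, so the following is a minimal Python sketch of the implicit-update idea the abstract describes, not the paper's I2NAC algorithm itself. It runs an incremental actor critic on a toy two-armed bandit: the NPG estimate w is fit to the TD-like signal on compatible features (the gradient of the log-policy), and its update uses the closed-form implicit (proximal) SGD step for a linear model, whose effective step-size alpha / (1 + alpha * ||psi||^2) is bounded no matter how large alpha is chosen. The toy problem, all variable names, and the step-size values are illustrative assumptions.

import numpy as np

rng = np.random.default_rng(0)

def softmax(x):
    z = np.exp(x - x.max())
    return z / z.sum()

# Toy stand-in for an MDP: a two-armed bandit with fixed mean rewards.
true_means = np.array([0.2, 0.8])
n_actions = 2

theta = np.zeros(n_actions)   # policy parameters (softmax preferences)
w = np.zeros(n_actions)       # incremental estimate of the NPG
baseline = 0.0                # scalar critic / reward baseline

alpha_w = 0.5    # step-size for the NPG estimate (deliberately large)
alpha_b = 0.05   # baseline step-size
beta = 0.05      # policy (actor) step-size

for t in range(5000):
    pi = softmax(theta)
    a = rng.choice(n_actions, p=pi)
    r = rng.normal(true_means[a], 0.1)

    # Compatible features: gradient of log pi(a) for a softmax policy.
    psi = -pi.copy()
    psi[a] += 1.0

    delta = r - baseline          # advantage-like error signal
    baseline += alpha_b * delta   # explicit baseline update

    # Implicit update of the NPG estimate: solving
    #   w' = w + alpha_w * (delta - w'.psi) * psi
    # for a linear model yields this closed form. The shrunken
    # effective step-size keeps the update stable for any alpha_w,
    # which is the stability property the abstract attributes to
    # implicit updates (explicit SGD would diverge for large alpha_w).
    shrink = alpha_w / (1.0 + alpha_w * (psi @ psi))
    w += shrink * (delta - w @ psi) * psi

    # Actor step along the estimated natural gradient.
    theta += beta * w

print("final policy:", softmax(theta))  # should favor the better arm

Under compatible function approximation, the weight vector fit to the advantage signal coincides with the natural policy gradient, which is why the actor can step directly along w; the implicit step only changes how w is estimated, not what it estimates.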
Identifiers
pubmed: 30408692
pii: S0893-6080(18)30292-2
doi: 10.1016/j.neunet.2018.10.007
Publication types
Journal Article
Languages
eng
Citation subsets
IM
Pagination
103-112
Copyright information
Copyright © 2018 The Authors. Published by Elsevier Ltd. All rights reserved.