Inverse Reinforcement Q-Learning Through Expert Imitation for Discrete-Time Systems.


Journal

IEEE transactions on neural networks and learning systems
ISSN: 2162-2388
Abbreviated title: IEEE Trans Neural Netw Learn Syst
Country: United States
NLM ID: 101616214

Publication information

Publication date:
May 2023
History:
medline: September 15, 2021
pubmed: September 15, 2021
entrez: September 14, 2021
Status: ppublish

Abstract

In inverse reinforcement learning (RL), there are two agents. An expert target agent has a performance cost function and exhibits its control and state behaviors to a learner. The learner agent does not know the expert's performance cost function; it seeks to reconstruct that cost function by observing the expert's behaviors and to imitate those behaviors optimally with its own response. In this article, we formulate an imitation problem in which the optimal performance intent of a discrete-time (DT) expert target agent is unknown to a DT learner agent. Using only the observed expert behavior trajectory, the learner seeks a cost function that yields the same optimal feedback gain as the expert's and thereby imitates the expert's optimal response. We develop an inverse RL approach with a new scheme to solve this behavior imitation problem. The scheme consists of a cost-function update, based on an extension of RL policy iteration and on inverse optimal control, and a control-policy update, based on optimal control. Under this scheme, we then develop an inverse reinforcement Q-learning algorithm, an extension of RL Q-learning that requires no knowledge of the agent dynamics. Proofs of stability, convergence, and optimality are given, and a key property concerning the nonuniqueness of the solution is also established. Finally, simulation experiments demonstrate the effectiveness of the new approach.
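
The nonuniqueness property mentioned above, that distinct cost functions can yield the identical optimal feedback gain, so the learner can recover the expert's cost only up to such an equivalence, can be illustrated with a short discrete-time LQR check. The following is a minimal sketch, not the paper's inverse reinforcement Q-learning algorithm: the matrices A, B, Q, R are hypothetical example values, and whereas the paper's method is model-free, this check uses a known model and SciPy's Riccati solver purely for brevity.

    # Illustrative check of the nonunique-cost property (not the paper's algorithm).
    # The dynamics A, B and the weights Q, R below are hypothetical example values.
    import numpy as np
    from scipy.linalg import solve_discrete_are

    def dlqr_gain(A, B, Q, R):
        # Optimal state-feedback gain u_k = -K x_k for the cost sum_k (x_k'Q x_k + u_k'R u_k).
        P = solve_discrete_are(A, B, Q, R)
        return np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)

    A = np.array([[1.0, 0.1],
                  [0.0, 1.0]])
    B = np.array([[0.0],
                  [0.1]])
    Q = np.diag([5.0, 1.0])   # expert's state weight (unknown to the learner)
    R = np.array([[1.0]])     # expert's control weight (unknown to the learner)

    K_expert = dlqr_gain(A, B, Q, R)
    K_scaled = dlqr_gain(A, B, 3.0 * Q, 3.0 * R)   # a different cost function

    print(K_expert)
    print(np.allclose(K_expert, K_scaled))   # True: same gain, hence same optimal behavior

Because the scaled cost reproduces the expert's gain exactly, observing the expert's trajectory can pin down the optimal feedback gain, but not a unique cost function; this is consistent with the imitation goal stated in the abstract.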

Identifiers

pubmed: 34520364
doi: 10.1109/TNNLS.2021.3106635

Publication types

Journal Article

Languages

eng

Citation subsets

IM

Pagination

2386-2399
