Robust Inverse Q-Learning for Continuous-Time Linear Systems in Adversarial Environments.


Journal

IEEE transactions on cybernetics
ISSN: 2168-2275
Abbreviated title: IEEE Trans Cybern
Country: United States
NLM ID: 101609393

Publication information

Publication date:
Dec 2022
History:
pubmed: 18 8 2021
medline: 23 11 2022
entrez: 17 8 2021
Status: ppublish

Abstract

This article proposes robust inverse Q-learning algorithms that let a learner mimic an expert's states and control inputs in the imitation learning problem. The two agents are subject to different adversarial disturbances. To imitate the expert, the learner must reconstruct the expert's unknown cost function; it observes only the expert's control inputs and uses inverse Q-learning to perform this reconstruction. The algorithms are robust in that they do not depend on the system model and allow the cost function parameters and disturbances to differ between the two agents. We first propose an offline inverse Q-learning algorithm consisting of two iterative learning loops: 1) an inner Q-learning iteration loop and 2) an outer iteration loop based on inverse optimal control. Building on this offline algorithm, we then develop an online inverse Q-learning algorithm with which the learner mimics the expert's behavior online using real-time observations of the expert's control inputs. This online method uses four function approximators: a critic approximator, two actor approximators, and a state-reward neural network (NN). It simultaneously approximates the parameters of the Q-function and the learner's state reward online. Convergence and stability are rigorously proved to guarantee the algorithm's performance.
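
The abstract describes the inner loop only as a Q-learning iteration. As a rough illustration of what such a loop converges to, the sketch below runs model-based policy iteration on a scalar zero-sum linear-quadratic game. This is a stand-in technique, not the authors' model-free method, and it assumes the adversarial disturbance is handled as in a standard zero-sum game, which the abstract does not confirm. The function names, the dynamics dx/dt = a*x + b*u + d*w, and all numeric values are hypothetical; the outer inverse-optimal-control loop is not shown.

```python
def solve_scalar_game(a, b, d, q, r, gamma, iters=50, tol=1e-12):
    """Model-based policy iteration for a scalar zero-sum LQ game (illustrative only).

    Assumed dynamics: dx/dt = a*x + b*u + d*w
    Assumed cost integrand: q*x^2 + r*u^2 - gamma^2 * w^2
    Returns p such that the value function is V(x) = p*x^2.
    """
    p = 0.0  # zero gains to start; this needs a < 0 so the initial policy is stabilizing
    for _ in range(iters):
        k = b * p / r          # control gain, u = -k*x
        l = d * p / gamma**2   # disturbance gain, w = l*x
        a_cl = a - b * k + d * l  # closed-loop pole under both policies
        if a_cl >= 0:
            raise ValueError("closed loop is not stable; iteration cannot continue")
        # Policy evaluation: solve 2*a_cl*p + q + r*k^2 - gamma^2*l^2 = 0 for p.
        p_next = (q + r * k * k - gamma**2 * l * l) / (-2.0 * a_cl)
        if abs(p_next - p) < tol:
            return p_next
        p = p_next
    return p


def riccati_residual(p, a, b, d, q, r, gamma):
    """Residual of the scalar game Riccati equation; zero at the solution."""
    return 2 * a * p + q - (b * p) ** 2 / r + (d * p) ** 2 / gamma**2


if __name__ == "__main__":
    params = dict(a=-1.0, b=1.0, d=1.0, q=1.0, r=1.0, gamma=2.0)  # hypothetical values
    p_star = solve_scalar_game(**params)
    print("p =", p_star, "residual =", riccati_residual(p_star, **params))
```

In an inverse setting, a loop of this kind would be rerun each time the outer loop revises the state-reward weight q, so that the learner's gain k can be compared with the gain implied by the expert's observed control inputs.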

Identifiers

pubmed: 34403352
doi: 10.1109/TCYB.2021.3100749

Publication types

Journal Article

Languages

eng

Citation subsets

IM

Pagination

13083-13095
