Combining backpropagation with Equilibrium Propagation to improve an Actor-Critic reinforcement learning framework.

Keywords: Actor-Critic (AC); Equilibrium Propagation; backpropagation; biologically plausible; reinforcement learning

Journal

Frontiers in computational neuroscience
ISSN: 1662-5188
Abbreviated title: Front Comput Neurosci
Country: Switzerland
NLM ID: 101477956

Publication information

Publication date:
2022
History:
received: 28 06 2022
accepted: 05 08 2022
entrez: 9 9 2022
pubmed: 10 9 2022
medline: 10 9 2022
Status: epublish

Abstract

Backpropagation (BP) has been used to train neural networks for many years, allowing them to solve a wide variety of tasks such as image classification, speech recognition, and reinforcement learning. However, the biological plausibility of BP as a mechanism of neural learning has been questioned. Equilibrium Propagation (EP) has been proposed as a more biologically plausible alternative and achieves accuracy comparable to BP on the CIFAR-10 image classification task. This study proposes the first EP-based reinforcement learning architecture: an Actor-Critic architecture in which the actor network is trained by EP. We show that this model can solve the basic control tasks often used as benchmarks for BP-based models. Interestingly, our trained model demonstrates more consistent high-reward behavior than a comparable model trained exclusively by BP.

Identifiers

pubmed: 36082305
doi: 10.3389/fncom.2022.980613
pmc: PMC9446087

Publication types

Journal Article

Languages

eng

Pagination

980613

Copyright information

Copyright © 2022 Kubo, Chalmers and Luczak.

Conflict of interest statement

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Authors

Yoshimasa Kubo (Y)

Canadian Centre for Behavioural Neuroscience, University of Lethbridge, Lethbridge, AB, Canada.

Eric Chalmers (E)

Department of Mathematics and Computing, Mount Royal University, Calgary, AB, Canada.

Artur Luczak (A)

Canadian Centre for Behavioural Neuroscience, University of Lethbridge, Lethbridge, AB, Canada.
