Synergizing habits and goals with variational Bayes.
Journal
Nature Communications
ISSN: 2041-1723
Abbreviated title: Nat Commun
Country: England
NLM ID: 101528555
Publication information
Publication date: 25 May 2024
History:
received: 21 June 2023
accepted: 6 May 2024
medline: 26 May 2024
pubmed: 26 May 2024
entrez: 25 May 2024
Status: epublish
Abstract
Behaving efficiently and flexibly is crucial for biological and artificial embodied agents. Behavior is generally classified into two types: habitual (fast but inflexible) and goal-directed (flexible but slow). While these two types of behavior are typically considered to be managed by two distinct systems in the brain, recent studies have revealed a more sophisticated interplay between them. We introduce a theoretical framework using variational Bayesian theory, incorporating a Bayesian intention variable. Habitual behavior depends on the prior distribution of intention, computed from the sensory context without goal specification. In contrast, goal-directed behavior relies on the goal-conditioned posterior distribution of intention, inferred through variational free energy minimization. Assuming that an agent behaves using a synergized intention, our simulations in vision-based sensorimotor tasks explain the key properties of their interaction as observed in experiments. Our work suggests a fresh perspective on the neural mechanisms of habits and goals, shedding light on future research in decision making.
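As an informal illustration of the framework summarized above (a minimal sketch in hypothetical notation, not the paper's exact formulation): writing the intention as z, the sensory context as o, and the goal as g, the habitual pathway supplies a context-dependent prior p(z | o), while the goal-directed pathway infers an approximate posterior q(z | o, g) by minimizing a variational free energy of the form
\[
\mathcal{F}(q) \;=\; \mathbb{E}_{q(z \mid o, g)}\!\bigl[-\log p(g \mid z)\bigr] \;+\; D_{\mathrm{KL}}\!\bigl[\,q(z \mid o, g)\,\Vert\,p(z \mid o)\,\bigr].
\]
Here the first term scores how well an intention accounts for the goal, and the KL term keeps the posterior anchored to the habitual prior; acting from this regularized posterior is one way to read the "synergized intention" described in the abstract.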
Identifiers
pubmed: 38796491
doi: 10.1038/s41467-024-48577-7
pii: 10.1038/s41467-024-48577-7
Publication types
Journal Article
Languages
eng
Citation subsets
IM
Pagination
4461
Copyright information
© 2024. The Author(s).
References
Dickinson, A. & Balleine, B. Motivational control of goal-directed action. Anim. Learn. Behav. 22, 1–18 (1994).
doi: 10.3758/BF03199951
Dolan, R. J. & Dayan, P. Goals and habits in the brain. Neuron 80, 312–325 (2013).
pubmed: 24139036
pmcid: 3807793
doi: 10.1016/j.neuron.2013.09.007
Wood, W. & Rünger, D. Psychology of habit. Annu. Rev. Psychol. 67, 289–314 (2016).
pubmed: 26361052
doi: 10.1146/annurev-psych-122414-033417
Gläscher, J., Daw, N., Dayan, P. & O’Doherty, J. P. States versus rewards: dissociable neural prediction error signals underlying model-based and model-free reinforcement learning. Neuron 66, 585–595 (2010).
pubmed: 20510862
pmcid: 2895323
doi: 10.1016/j.neuron.2010.04.016
Lee, S. W., Shimojo, S. & O’Doherty, J. P. Neural computations underlying arbitration between model-based and model-free learning. Neuron 81, 687–699 (2014).
pubmed: 24507199
pmcid: 3968946
doi: 10.1016/j.neuron.2013.11.028
Daw, N. D., Niv, Y. & Dayan, P. Uncertainty-based competition between prefrontal and dorsolateral striatal systems for behavioral control. Nat. Neurosci. 8, 1704–1711 (2005).
pubmed: 16286932
doi: 10.1038/nn1560
Yin, H. H. & Knowlton, B. J. The role of the basal ganglia in habit formation. Nat. Rev. Neurosci. 7, 464–476 (2006).
pubmed: 16715055
doi: 10.1038/nrn1919
Bellman, R. A Markovian decision process. J. Math. Mech. 6, 679–684 (1957).
Sutton, R. S. & Barto, A. G. Reinforcement Learning: An Introduction, vol. 1 (MIT press Cambridge, 1998).
Botvinick, M., Wang, J. X., Dabney, W., Miller, K. J. & Kurth-Nelson, Z. Deep reinforcement learning and its neuroscientific implications. Neuron 107, 603–616 (2020).
pubmed: 32663439
doi: 10.1016/j.neuron.2020.06.014
Friston, K. J., Daunizeau, J., Kilner, J. & Kiebel, S. J. Action and behavior: a free-energy formulation. Biol. Cybern. 102, 227–260 (2010).
pubmed: 20148260
doi: 10.1007/s00422-010-0364-z
Fountas, Z., Sajid, N., Mediano, P. A. & Friston, K. Deep active inference agents using Monte-Carlo methods. Adv. Neural Inf. Process. Syst. 33, 11662–11675 (2020).
Friston, K., Kilner, J. & Harrison, L. A free energy principle for the brain. J. Physiol. Paris 100, 70–87 (2006).
pubmed: 17097864
doi: 10.1016/j.jphysparis.2006.10.001
Ahmadi, A. & Tani, J. A novel predictive-coding-inspired variational RNN model for online prediction and recognition. Neural Comput. 31, 2025–2074 (2019).
pubmed: 31525309
doi: 10.1162/neco_a_01228
Kim, D., Park, G. Y., O’Doherty, J. P. & Lee, S. W. Task complexity interacts with state-space uncertainty in the arbitration between model-based and model-free learning. Nat. Commun. 10, 5738 (2019).
pubmed: 31844060
pmcid: 6915739
doi: 10.1038/s41467-019-13632-1
Liu, M., Zhu, M. & Zhang, W. Goal-conditioned reinforcement learning: Problems and solutions. Preprint at https://arxiv.org/abs/2201.08299 (2022).
Chebotar, Y. et al. Combining model-based and model-free updates for trajectory-centric reinforcement learning. In Proceedings of the International Conference on Machine Learning, 703–711 (PMLR, 2017).
Mendonca, R., Rybkin, O., Daniilidis, K., Hafner, D. & Pathak, D. Discovering and achieving goals via world models. Adv. Neural Inf. Process. Syst. 34, 24379–24391 (2021).
Redgrave, P. et al. Goal-directed and habitual control in the basal ganglia: implications for Parkinson’s disease. Nat. Rev. Neurosci. 11, 760–772 (2010).
pubmed: 20944662
pmcid: 3124757
doi: 10.1038/nrn2915
Friston, K. et al. Active inference and learning. Neurosci. Biobehav. Rev. 68, 862–879 (2016).
pubmed: 27375276
pmcid: 5167251
doi: 10.1016/j.neubiorev.2016.06.022
Schwöbel, S., Marković, D., Smolka, M. N. & Kiebel, S. J. Balancing control: a bayesian interpretation of habitual and goal-directed behavior. J. Math. Psychol. 100, 102472 (2021).
doi: 10.1016/j.jmp.2020.102472
Feher da Silva, C., Lombardi, G., Edelson, M. & Hare, T. A. Rethinking model-based and model-free influences on mental effort and striatal prediction errors. Nat. Hum. Behav. 7, 1–14 (2023).
doi: 10.1038/s41562-023-01573-1
Matsumoto, T. & Tani, J. Goal-directed planning for habituated agents by active inference using a variational recurrent neural network. Entropy 22, 564 (2020).
pubmed: 33286336
pmcid: 7517093
doi: 10.3390/e22050564
Siciliano, B. & Slotine, J.-J. E. A general framework for managing multiple tasks in highly redundant robotic systems. In Proceedings of the 5th International Conference on Advanced Robotics, 2, 1211–1216 (IEEE, 1991).
Buss, S. R. Introduction to inverse kinematics with Jacobian transpose, pseudoinverse and damped least squares methods. IEEE J. Robot. Autom. 17, 16 (2004).
Fox, C. W. & Roberts, S. J. A tutorial on variational Bayesian inference. Artif. Intell. Rev. 38, 85–95 (2012).
doi: 10.1007/s10462-011-9236-8
Basten, U., Biele, G., Heekeren, H. R. & Fiebach, C. J. How the brain integrates costs and benefits during decision making. Proc. Natl Acad. Sci. 107, 21767–21772 (2010).
pubmed: 21118983
pmcid: 3003102
doi: 10.1073/pnas.0908104107
Friston, K. J., Daunizeau, J. & Kiebel, S. J. Reinforcement learning or active inference? PLoS One 4, e6421 (2009).
pubmed: 19641614
pmcid: 2713351
doi: 10.1371/journal.pone.0006421
Liu, Y., Mattar, M. G., Behrens, T. E., Daw, N. D. & Dolan, R. J. Experience replay is associated with efficient nonlocal learning. Science 372, eabf1357 (2021).
pubmed: 34016753
pmcid: 7610948
doi: 10.1126/science.abf1357
Rao, R. P. & Ballard, D. H. Predictive coding in the visual cortex: a functional interpretation of some extra-classical receptive-field effects. Nat. Neurosci. 2, 79 (1999).
pubmed: 10195184
doi: 10.1038/4580
Huang, Y. & Rao, R. P. Predictive coding. Wiley Interdiscip. Rev. Cogn. Sci. 2, 580–593 (2011).
pubmed: 26302308
doi: 10.1002/wcs.142
Kullback, S. & Leibler, R. A. On information and sufficiency. Ann. Math. Stat. 22, 79–86 (1951).
doi: 10.1214/aoms/1177729694
Tishby, N. & Zaslavsky, N. Deep learning and the information bottleneck principle. In 2015 IEEE Information Theory Workshop (ITW), 1–5 (IEEE, 2015).
Alemi, A. A., Fischer, I., Dillon, J. V. & Murphy, K. Deep variational information bottleneck. In Proceedings of the International Conference on Learning Representations (ICLR, 2017).
Dezfouli, A. & Balleine, B. W. Habits, action sequences and reinforcement learning. Eur. J. Neurosci. 35, 1036–1051 (2012).
pubmed: 22487034
pmcid: 3325518
doi: 10.1111/j.1460-9568.2012.08050.x
Kingma, D. P. & Welling, M. Auto-encoding variational Bayes. In Proceedings of the International Conference on Learning Representations (ICLR, 2014).
Hwang, J., Kim, J., Ahmadi, A., Choi, M. & Tani, J. Predictive coding-based deep dynamic neural network for visuomotor learning. In IEEE International Conference on Development and Learning and Epigenetic Robotics (ICDL-EpiRob) (IEEE, 2017).
Spratling, M. W. Predictive coding as a model of response properties in cortical area V1. J. Neurosci. 30, 3531–3543 (2010).
pubmed: 20203213
pmcid: 6634102
doi: 10.1523/JNEUROSCI.4911-09.2010
Friston, K. J., Rosch, R., Parr, T., Price, C. & Bowman, H. Deep temporal models and active inference. Neurosci. Biobehav. Rev. 90, 486–501 (2018).
pubmed: 29747865
pmcid: 5998386
doi: 10.1016/j.neubiorev.2018.04.004
Haarnoja, T., Zhou, A., Abbeel, P. & Levine, S. Soft actor-critic: Off-policy maximum entropy deep reinforcement learning with a stochastic actor. In Proceedings of the International Conference on Machine Learning, 1856–1865 (PMLR, 2018).
Beyer, H.-G. & Schwefel, H.-P. Evolution strategies–a comprehensive introduction. Nat. Comput. 1, 3–52 (2002).
doi: 10.1023/A:1015059928466
Hafner, D. et al. Learning latent dynamics for planning from pixels. In Proceedings of the International Conference on Machine Learning, 2555–2565 (PMLR, 2019).
Matsumoto, T., Ohata, W., Benureau, F. C. & Tani, J. Goal-directed planning and goal understanding by extended active inference: Evaluation through simulated and physical robot experiments. Entropy 24, 469 (2022).
pubmed: 35455132
pmcid: 9026632
doi: 10.3390/e24040469
Wang, W. W., Han, D., Luo, X., Shen, Y., Ling, C., Wang, B., & Li, D. Toward open-ended embodied tasks solving. In NeurIPS 2023 Workshop on Agent Learning in Open-Endedness (NeurIPS, 2023).
Dickinson, A. Actions and habits: the development of behavioural autonomy. Philos. Trans. R. Soc. Lond. B, Biol. Sci. 308, 67–78 (1985).
doi: 10.1098/rstb.1985.0010
O’Keefe, J. & Dostrovsky, J. The hippocampus as a spatial map: preliminary evidence from unit activity in the freely-moving rat. Brain Res. 34, 171–175 (1971).
pubmed: 5124915
doi: 10.1016/0006-8993(71)90358-1
Olton, D. S. Mazes, maps, and memory. Am. Psychol. 34, 583 (1979).
doi: 10.1037/0003-066X.34.7.583
Triandis, H. C. Cross-cultural social and personality psychology. Personal. Soc. Psychol. Bull. 3, 143–158 (1977).
doi: 10.1177/014616727700300202
Fermin, A. S. et al. Model-based action planning involves cortico-cerebellar and basal ganglia networks. Sci. Rep. 6, 31378 (2016).
pubmed: 27539554
pmcid: 4990901
doi: 10.1038/srep31378
Keramati, M., Dezfouli, A. & Piray, P. Speed/accuracy trade-off between the habitual and the goal-directed processes. PLoS Comput. Biol. 7, e1002055 (2011).
pubmed: 21637741
pmcid: 3102758
doi: 10.1371/journal.pcbi.1002055
Hardwick, R. M., Forrence, A. D., Krakauer, J. W. & Haith, A. M. Time-dependent competition between goal-directed and habitual response preparation. Nat. Hum. Behav. 3, 1252–1262 (2019).
pubmed: 31570762
doi: 10.1038/s41562-019-0725-0
Valentin, V. V., Dickinson, A. & O’Doherty, J. P. Determining the neural substrates of goal-directed learning in the human brain. J. Neurosci. 27, 4019–4026 (2007).
pubmed: 17428979
pmcid: 6672546
doi: 10.1523/JNEUROSCI.0564-07.2007
Yang, R. et al. What is essential for unseen goal generalization of offline goal-conditioned RL? In Proceedings of the International Conference on Machine Learning, 39543–39571 (PMLR, 2023).
Deisenroth, M. & Rasmussen, C. E. PILCO: A model-based and data-efficient approach to policy search. In Proceedings of the International Conference on Machine Learning, 465–472 (PMLR, 2011).
Parr, R. & Russell, S. Reinforcement learning with hierarchies of machines. In Advances in Neural Information Processing Systems, vol. 10 (NeurIPS, 1997).
Konidaris, G. & Barto, A. Skill discovery in continuous reinforcement learning domains using skill chaining. In Advances in Neural Information Processing Systems 22 (NeurIPS, 2009).
Marblestone, A. H., Wayne, G. & Kording, K. P. Toward an integration of deep learning and neuroscience. Front. Comput. Neurosci. 10, 94 (2016).
pubmed: 27683554
pmcid: 5021692
doi: 10.3389/fncom.2016.00094
Andrychowicz, M. et al. Hindsight experience replay. In Advances in Neural Information Processing Systems, Vol. 30 (NeurIPS, 2017).
Van Boxtel, J. J. & Lu, H. A predictive coding perspective on autism spectrum disorders. Front. Psychol. 4, 19 (2013).
pubmed: 23372559
pmcid: 3556598
Mattar, M. G. & Lengyel, M. Planning in the brain. Neuron 110, 914–934 (2022).
pubmed: 35041804
doi: 10.1016/j.neuron.2021.12.018
LeCun, Y. A path towards autonomous machine intelligence. Preprint at https://openreview.net/pdf?id=BZ5a1r-kVsf (2022).
Shipp, S. Neural elements for predictive coding. Front. Psychol. 7, 1792 (2016).
pubmed: 27917138
pmcid: 5114244
doi: 10.3389/fpsyg.2016.01792
Radford, A. et al. Learning transferable visual models from natural language supervision. In Proceedings of the International Conference on Machine Learning, 8748–8763 (PMLR, 2021).
Chung, J. et al. A recurrent latent variable model for sequential data. In Advances in Neural Information Processing Systems, 2980–2988 (NeurIPS, 2015).
Cho, K. et al. Learning phrase representations using RNN encoder–decoder for statistical machine translation. In Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing, 1724–1734 (ACL, 2014).
Mnih, V. et al. Human-level control through deep reinforcement learning. Nature 518, 529 (2015).
pubmed: 25719670
doi: 10.1038/nature14236
Pinto, L., Andrychowicz, M., Welinder, P., Zaremba, W. & Abbeel, P. Asymmetric actor critic for image-based robot learning. In Proceedings of the 14th Robotics: Science and Systems (RSS, 2018).
Eberhard, O., Hollenstein, J., Pinneri, C. & Martius, G. Pink noise is all you need: Colored noise exploration in deep reinforcement learning. In Proceedings of the International Conference on Learning Representations (ICLR, 2023).
Kapturowski, S., Ostrovski, G., Dabney, W., Quan, J. & Munos, R. Recurrent experience replay in distributed reinforcement learning. In Proceedings of the International Conference on Learning Representations (ICLR, 2018).
Han, D. et al. Variational oracle guiding for reinforcement learning. In Proceedings of the International Conference on Learning Representations (ICLR, 2022).