Approximate information for efficient exploration-exploitation strategies.


Journal

Physical review. E
ISSN: 2470-0053
Abbreviated title: Phys Rev E
Country: United States
NLM ID: 101676019

Publication information

Publication date:
May 2024
History:
received: 04 07 2023
accepted: 29 01 2024
medline: 22 6 2024
pubmed: 22 6 2024
entrez: 22 6 2024
Status: ppublish

Abstract

This paper addresses the exploration-exploitation dilemma inherent in decision-making, focusing on multiarmed bandit problems. In these problems, an agent must decide whether to exploit current knowledge for immediate gains or explore new avenues for potential long-term rewards. We introduce a class of algorithms, approximate information maximization (AIM), which employs a carefully chosen analytical approximation to the gradient of the entropy to choose which arm to pull at each point in time. AIM matches the performance of Thompson sampling, which is known to be asymptotically optimal, as well as that of Infomax, from which it derives. AIM thus retains the advantages of Infomax while also offering enhanced computational speed, tractability, and ease of implementation. In particular, we demonstrate how to apply it to a 50-armed bandit game. Its expression is tunable, which allows for specific optimization in various settings, making it possible to surpass the performance of Thompson sampling at short and intermediate times.
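
For readers unfamiliar with the baseline named in the abstract, the following is a minimal sketch of Thompson sampling on a Bernoulli bandit. It is not the AIM algorithm, whose entropy-gradient approximation is not given in this record. The function name `thompson_sampling`, the uniform Beta(1, 1) priors, the Bernoulli reward model, and the example arm means are all assumptions made for illustration.

```python
import random


def thompson_sampling(true_means, horizon, seed=0):
    """Illustrative Thompson sampling on a Bernoulli bandit.

    true_means: hypothetical success probability of each arm
    horizon:    number of pulls
    Returns (pulls per arm, total reward).
    """
    rng = random.Random(seed)
    n_arms = len(true_means)
    successes = [0] * n_arms  # observed rewards of 1 per arm
    failures = [0] * n_arms   # observed rewards of 0 per arm
    total_reward = 0

    for _ in range(horizon):
        # Draw one plausible mean per arm from its Beta posterior
        # (assumed uniform Beta(1, 1) prior on every arm).
        samples = [
            rng.betavariate(successes[i] + 1, failures[i] + 1)
            for i in range(n_arms)
        ]
        # Pull the arm whose sampled mean is largest.
        arm = max(range(n_arms), key=samples.__getitem__)
        reward = 1 if rng.random() < true_means[arm] else 0

        # Update the posterior counts of the pulled arm.
        successes[arm] += reward
        failures[arm] += 1 - reward
        total_reward += reward

    pulls = [s + f for s, f in zip(successes, failures)]
    return pulls, total_reward


if __name__ == "__main__":
    pulls, total = thompson_sampling([0.1, 0.9], horizon=2000)
    print("pulls per arm:", pulls, "total reward:", total)
```

Sampling from the posterior makes exploration automatic: arms with few observations have wide posteriors and are occasionally drawn high, while well-estimated poor arms are pulled less and less often.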

Identifiers

pubmed: 38907409
doi: 10.1103/PhysRevE.109.L052105

Publication types

Journal Article

Languages

eng

Citation subsets

IM

Pagination

L052105

Authors

Alex Barbier-Chebbah (A)

Institut Pasteur, Université Paris Cité, CNRS UMR 3571, Decision and Bayesian Computation, 75015 Paris, France.
Épimethée, Inria, 75012 Paris, France.

Christian L Vestergaard (CL)

Institut Pasteur, Université Paris Cité, CNRS UMR 3571, Decision and Bayesian Computation, 75015 Paris, France.
Épimethée, Inria, 75012 Paris, France.

Jean-Baptiste Masson (JB)

Institut Pasteur, Université Paris Cité, CNRS UMR 3571, Decision and Bayesian Computation, 75015 Paris, France.
Épimethée, Inria, 75012 Paris, France.

MeSH classifications