Preprint Article Version 1 Preserved in Portico This version is not peer-reviewed

UCB with An Optimal Inequality

Version 1 : Received: 23 April 2020 / Approved: 24 April 2020 / Online: 24 April 2020 (04:24:31 CEST)

How to cite: Burgess, M. UCB with An Optimal Inequality. Preprints 2020, 2020040426. https://doi.org/10.20944/preprints202004.0426.v1

Abstract

Upper confidence bound (UCB) multi-armed bandit algorithms typically rely on concentration inequalities (such as Hoeffding's inequality) to construct the upper confidence bound. Intuitively, the tighter the bound, the more likely it is that each arm is judged appropriately for selection. Hence we derive and utilise an optimal inequality. Usually the sample mean (and sometimes the sample variance) of previous rewards is the information used in the bounds that drive the algorithm, but intuitively, the more information that is taken from the previous rewards, the tighter the bound can be. Hence our inequality explicitly incorporates the value of each and every past reward into the upper bound expression that drives the method. We show how this UCB method fits into the broader scope of other information-theoretic UCB algorithms but, unlike them, is free from assumptions about the distribution of the data. We conclude by reporting some already established regret information and give some numerical simulations to demonstrate the method's effectiveness.
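
For orientation, the Hoeffding-based baseline that the abstract contrasts against can be sketched as follows. This is the standard UCB1-style index (sample mean plus a Hoeffding-type bonus), not the optimal inequality derived in the paper. The function names, the Bernoulli reward model, the arm means, and the exploration constant 2 are all illustrative assumptions rather than details taken from the article.

```python
import math
import random


def hoeffding_ucb_index(rewards, t):
    """UCB1-style index for one arm at round t (rewards assumed in [0, 1]).

    Uses only the sample mean and the pull count, which is exactly the
    limited information the abstract says typical bounds rely on.
    """
    n = len(rewards)
    if n == 0:
        return math.inf  # force every arm to be tried at least once
    mean = sum(rewards) / n
    # Hoeffding-type exploration bonus; the constant 2 is the usual UCB1 choice.
    return mean + math.sqrt(2.0 * math.log(t) / n)


def run_ucb1(arm_means, horizon, seed=0):
    """Simulate UCB1 on hypothetical Bernoulli arms; return pull counts per arm."""
    rng = random.Random(seed)
    history = [[] for _ in arm_means]
    for t in range(1, horizon + 1):
        indices = [hoeffding_ucb_index(h, t) for h in history]
        arm = max(range(len(arm_means)), key=lambda a: indices[a])
        reward = 1.0 if rng.random() < arm_means[arm] else 0.0
        history[arm].append(reward)
    return [len(h) for h in history]


if __name__ == "__main__":
    print(run_ucb1([0.2, 0.8], horizon=2000))
```

A method of the kind the abstract describes would replace `hoeffding_ucb_index` with an index computed from the full list of past rewards rather than from their mean alone; the selection loop would stay the same.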

Keywords

Bandit Algorithm; Upper Confidence Bounds; Kullback-Leibler divergence

Subject

Computer Science and Mathematics, Probability and Statistics
