MULTIARMED BANDITS IN THE WORST CASE
Nicolò Cesa-Bianchi
Dept. of Information Technologies, University of Milan, Italy cesa-bianchi@dti.unimi.it
We present a survey of results on a recently formulated variant of the classical (stochastic) multiarmed bandit problem in which no assumption is made on the mechanism generating the rewards. We describe randomized allocation policies for this variant and prove bounds on their regret as a function of the time horizon and the number of arms. These bounds hold for any assignment of rewards to the arms and are tight to within logarithmic factors.
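The abstract does not name a specific policy, but the canonical randomized allocation policy for this adversarial setting is the exponential-weights algorithm Exp3, whose expected regret over a horizon of T rounds with K arms is O(sqrt(T K log K)) for any assignment of rewards. The sketch below is an illustrative implementation under assumed conventions: rewards lie in [0, 1], `reward_fn` is a hypothetical callback supplied by the caller, and the exploration rate `gamma` is a tunable parameter, not a value prescribed by the survey.

```python
import math
import random

def exp3(num_arms, horizon, reward_fn, gamma=0.1):
    """Exp3 sketch for the adversarial multiarmed bandit.

    reward_fn(t, arm) must return the (possibly adversarial) reward
    in [0, 1] of the pulled arm at round t; rewards of the other arms
    are never observed, which is the bandit feedback constraint.
    """
    weights = [1.0] * num_arms
    total_reward = 0.0
    for t in range(horizon):
        wsum = sum(weights)
        # Mix the exponential weights with uniform exploration so that
        # every arm is pulled with probability at least gamma / num_arms.
        probs = [(1 - gamma) * w / wsum + gamma / num_arms
                 for w in weights]
        arm = random.choices(range(num_arms), weights=probs)[0]
        reward = reward_fn(t, arm)
        total_reward += reward
        # Importance-weighted estimate: dividing by the pull probability
        # keeps the estimate unbiased although only one arm is observed.
        estimate = reward / probs[arm]
        weights[arm] *= math.exp(gamma * estimate / num_arms)
        # Rescale to avoid floating-point overflow over long horizons;
        # a common rescaling leaves the probabilities unchanged.
        wmax = max(weights)
        weights = [w / wmax for w in weights]
    return total_reward

# Usage: against a fixed assignment where arm 0 always pays 1, the
# policy should concentrate its pulls on arm 0 over time.
random.seed(0)
total = exp3(3, 2000, lambda t, a: 1.0 if a == 0 else 0.0)
```

The uniform-exploration term is what makes the importance-weighted estimates bounded; choosing gamma of order sqrt(K log K / T) balances exploration against exploitation and yields the regret bound stated above.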
Session slot T-Mo-A02: A learning approach to identification and control / Area code 3a: Modelling, Identification and Signal Processing
