15th Triennial World Congress of the International Federation of Automatic Control
  Barcelona, 21–26 July 2002 
MULTIARMED BANDITS IN THE WORST CASE
Nicolò Cesa-Bianchi
Dept. of Information Technologies
University of Milan, Italy
cesa-bianchi@dti.unimi.it

We present a survey of results on a recently formulated variant of the classical (stochastic) multiarmed bandit problem in which no assumption is made on the mechanism generating the rewards. We describe randomized allocation policies for this variant and prove bounds on their regret as a function of the time horizon and the number of arms. These bounds hold for any assignment of rewards to the arms and are tight to within logarithmic factors.
Session slot T-Mo-A02: A learning approach to identification and control/Area code 3a : Modelling, Identification and Signal Processing