15th Triennial World Congress of the International Federation of Automatic Control
Barcelona, 21–26 July 2002
ON THE ALMOST SURE RATE OF CONVERGENCE OF TEMPORAL-DIFFERENCE LEARNING ALGORITHMS
Vladislav B. Tadic*
* Department of Electrical and Electronic Engineering
The University of Melbourne, Parkville, Victoria 3010, Australia
e-mail: v.tadic@ee.mu.oz.au

In this paper, the almost sure rate of convergence of temporal-difference learning algorithms is analyzed. The analysis is carried out for the case of a discounted cost function associated with a Markov chain with a finite-dimensional state space. Under mild conditions, it is shown that the almost sure rate of convergence of these algorithms coincides with the rate of convergence appearing in the law of iterated logarithm. The obtained results can therefore be regarded as a law of iterated logarithm for temporal-difference learning algorithms, and for the same reason the obtained convergence rate is probably the least conservative result of this kind.
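As an illustration of the claimed rate (a sketch only; the symbols $\theta_n$ and $\theta^\ast$, denoting the algorithm's parameter iterates and their almost sure limit, are assumed notation not fixed in the abstract, and step sizes of order $1/n$ are presumed), a law-of-iterated-logarithm statement of this kind takes the form
$$
\limsup_{n \to \infty} \sqrt{\frac{n}{2 \log \log n}} \, \bigl\| \theta_n - \theta^\ast \bigr\| < \infty \quad \text{almost surely},
$$
that is, $\| \theta_n - \theta^\ast \| = O\bigl( \sqrt{\log \log n / n} \bigr)$ almost surely, matching the rate in the classical law of iterated logarithm for sums of i.i.d. random variables.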
Keywords: Machine learning, temporal-difference learning, reinforcement learning, almost sure rate of convergence, law of iterated logarithm, Markov chains, uniform ergodicity, discounted cost function, linear approximation