15th Triennial World Congress of the International Federation of Automatic Control
  Barcelona, 21–26 July 2002 
ADAPTIVE VECTOR QUANTIZATION FOR REINFORCEMENT LEARNING
H. Y. K. Lau, K. L. Mak and I. S. K. Lee
Department of Industrial and Manufacturing Systems Engineering
The University of Hong Kong, Pokfulam Road, Hong Kong.

Dynamic programming methods are capable of solving reinforcement learning problems, in which an agent must improve its behavior through trial-and-error interactions with a dynamic environment. However, these computational algorithms suffer from the curse of dimensionality (Bellman, 1957): the number of computational operations increases exponentially with the cardinality of the state space. In practice, this usually results in very long training times, and applications in continuous domains are far from trivial. To ease this problem, we propose the use of vector quantization to adaptively partition the state space based on the current estimate of the action-value function. In particular, this state-space partitioning is performed incrementally to reflect the experience accumulated by the agent as it explores the underlying environment.
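The scheme described above can be sketched as follows. This is a minimal illustration, not the paper's actual algorithm: it assumes a nearest-prototype (Voronoi) codebook, a one-step Q-learning update on the quantized states, and a simple competitive-learning rule that nudges the winning prototype toward each visited state so the partition incrementally tracks the agent's experience. All class and parameter names are hypothetical.

```python
import numpy as np

class AdaptiveVQAgent:
    """Sketch: quantize a continuous state space with a codebook of
    prototype vectors and learn Q-values per codeword. Hypothetical
    design; the abstract does not specify the exact update rules."""

    def __init__(self, n_codewords, state_dim, n_actions,
                 lr_vq=0.05, lr_q=0.1, gamma=0.95, seed=0):
        rng = np.random.default_rng(seed)
        # Prototype vectors define the current state-space partition.
        self.codebook = rng.uniform(0.0, 1.0, (n_codewords, state_dim))
        # One row of action values per codeword (quantized state).
        self.q = np.zeros((n_codewords, n_actions))
        self.lr_vq, self.lr_q, self.gamma = lr_vq, lr_q, gamma

    def quantize(self, state):
        # Index of the nearest prototype: the Voronoi cell of `state`.
        d = np.linalg.norm(self.codebook - np.asarray(state), axis=1)
        return int(np.argmin(d))

    def update(self, state, action, reward, next_state):
        i = self.quantize(state)
        j = self.quantize(next_state)
        # One-step Q-learning on the quantized transition.
        td = reward + self.gamma * self.q[j].max() - self.q[i, action]
        self.q[i, action] += self.lr_q * td
        # Incremental partition update: move the winning prototype
        # toward the visited state so frequently explored regions
        # receive finer quantization over time.
        self.codebook[i] += self.lr_vq * (np.asarray(state) - self.codebook[i])
        return i
```

In this sketch, exploration data reshapes the partition and the value estimates simultaneously, which is one plausible reading of "adaptively partition the state space based on the recent estimate of the action-value function".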
Keywords: Learning algorithm, intelligent control, vector quantization, robot navigation, automated guided vehicles

E-mail: hyklau@hku.hk
Session slot T-Fr-A21: Posters of Learning, Stochastic, Fuzzy and Neural Systems/Area code 3b: Adaptive Control and Tuning