Empirical Results on Convergence and Exploration in Approximate Policy Iteration
Authors: Niket Kaisare (Georgia Institute of Technology, United States); Jong Min Lee (Georgia Institute of Technology, United States); Jay H. Lee (Georgia Institute of Technology, United States)
Topic: 2.3 Non-Linear Control Systems
Session: Optimal Control in Nonlinear Systems I
Keywords: Optimal Control, Approximate Dynamic Programming, Cost-to-go Function, Policy Iteration, Value Iteration
Abstract
In this paper, we empirically investigate the convergence properties of policy iteration applied to the optimal control of systems with continuous state and action spaces. We demonstrate that policy iteration converges in fewer iterations than value iteration, but requires more function evaluations to generate cost-to-go approximations in the policy evaluation step. Two alternatives to exact policy evaluation are presented, one based on iteration over simulated states and the other on simulation of improved policies. We then demonstrate that the lambda-policy iteration method (0 <= lambda <= 1) provides a tradeoff between value iteration and policy iteration. Finally, we consider the issue of exploration to expand the coverage of the state space during offline iteration.
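The sketch below is not the paper's implementation; it only illustrates the tradeoff the abstract describes, on a small randomly generated MDP standing in for a discretized continuous-state problem. The model, discount factor, tolerances, and all function names are illustrative assumptions, and the paper's simulation-based policy evaluation schemes are replaced here by exact linear solves.

```python
# Minimal sketch (assumed setup, not the paper's benchmark): value iteration,
# policy iteration, and lambda-policy iteration on a random finite MDP.
import numpy as np

np.random.seed(0)
n_states, n_actions, gamma = 50, 5, 0.95  # illustrative sizes and discount

# Random transition kernel P[a, s, s'] and stage costs g[s, a] stand in for
# a discretized continuous state/action optimal control problem.
P = np.random.dirichlet(np.ones(n_states), size=(n_actions, n_states))
g = np.random.rand(n_states, n_actions)

def greedy_q(J):
    # Q(s, a) = g(s, a) + gamma * E[J(s') | s, a]
    return g + gamma * np.einsum("ast,t->sa", P, J)

def value_iteration(tol=1e-8, max_iter=10_000):
    J = np.zeros(n_states)
    for k in range(1, max_iter):
        J_new = greedy_q(J).min(axis=1)  # one Bellman backup per sweep
        if np.max(np.abs(J_new - J)) < tol:
            return J_new, k
        J = J_new

def policy_iteration(max_iter=1_000):
    policy = np.zeros(n_states, dtype=int)
    for k in range(1, max_iter):
        # Exact policy evaluation: solve (I - gamma * P_mu) J = g_mu.
        # This is the expensive step the paper approximates via simulation.
        P_mu = P[policy, np.arange(n_states), :]
        g_mu = g[np.arange(n_states), policy]
        J = np.linalg.solve(np.eye(n_states) - gamma * P_mu, g_mu)
        new_policy = greedy_q(J).argmin(axis=1)  # policy improvement
        if np.array_equal(new_policy, policy):
            return J, k
        policy = new_policy

def lambda_policy_iteration(lam, tol=1e-8, max_iter=10_000):
    # lambda-PI update: J_{k+1} solves J = (1-lam)*T_mu(J_k) + lam*T_mu(J),
    # i.e. (I - gamma*lam*P_mu) J_{k+1} = g_mu + gamma*(1-lam)*P_mu J_k,
    # with mu greedy w.r.t. J_k. lam = 0 reduces to value iteration and
    # lam = 1 to policy iteration with exact evaluation.
    J = np.zeros(n_states)
    for k in range(1, max_iter):
        policy = greedy_q(J).argmin(axis=1)
        P_mu = P[policy, np.arange(n_states), :]
        g_mu = g[np.arange(n_states), policy]
        J_new = np.linalg.solve(np.eye(n_states) - gamma * lam * P_mu,
                                g_mu + gamma * (1 - lam) * (P_mu @ J))
        if np.max(np.abs(J_new - J)) < tol:
            return J_new, k
        J = J_new

_, k_vi = value_iteration()
_, k_pi = policy_iteration()
print(f"value iteration:  {k_vi} sweeps")
for lam in (0.0, 0.5, 1.0):
    _, k = lambda_policy_iteration(lam)
    print(f"lambda-PI ({lam}):  {k} sweeps")
print(f"policy iteration: {k_pi} improvement steps")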