Model Predictive Path Integral Control using Covariance Variable
Importance Sampling

Grady Williams¹, Andrew Aldrich¹, and Evangelos A. Theodorou¹
Abstract— In this paper we develop a Model Predictive Path
Integral (MPPI) control algorithm based on a generalized
importance sampling scheme and perform parallel optimization
via sampling using a Graphics Processing Unit (GPU). The
proposed generalized importance sampling scheme allows for
changes in the drift and diffusion terms of stochastic diffusion
processes and plays a significant role in the performance of the
model predictive control algorithm. We compare the proposed
algorithm in simulation with a model predictive control version
of differential dynamic programming.
I. INTRODUCTION
The path integral optimal control framework [7], [15],
[16] provides a mathematically sound methodology for de-
veloping optimal control algorithms based on stochastic
sampling of trajectories. The key idea in this framework is
that the value function for the optimal control problem is
transformed using the Feynman-Kac lemma [2], [8] into an
expectation over all possible trajectories, which is known
as a path integral. This transformation allows stochastic
optimal control problems to be solved with a Monte-Carlo
approximation using forward sampling of stochastic diffusion
processes.
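As a rough illustration of this forward-sampling idea (the double-integrator model, cost, and constants below are illustrative assumptions, not the paper's algorithm), one can sample K noise-driven trajectories, weight each by its exponentiated path cost, and average the sampled noise to estimate a control:

```python
import numpy as np

# Sketch of a Monte-Carlo path integral step for a 1-D double integrator.
# Sample K noisy rollouts, weight each trajectory by exp(-cost / lambda)
# (the Feynman-Kac exponential weighting), and average the first noise term.
rng = np.random.default_rng(0)
K, T, dt = 1000, 50, 0.02        # number of samples, horizon steps, time step
lam, sigma = 1.0, 1.0            # temperature, noise standard deviation

x = np.zeros(K)                  # K copies of the position
v = np.zeros(K)                  # K copies of the velocity
eps = rng.normal(0.0, sigma, size=(K, T))   # sampled control noise
cost = np.zeros(K)

for t in range(T):
    v += eps[:, t] * dt          # noise acts as the control input
    x += v * dt
    cost += (x - 1.0) ** 2 * dt  # running cost: drive the position to x = 1

w = np.exp(-(cost - cost.min()) / lam)      # exponentiated path costs
w /= w.sum()                                # normalized trajectory weights
u0 = w @ eps[:, 0]               # cost-weighted average -> first control value
```

The subtraction of the minimum cost before exponentiating is a standard numerical-stability device and does not change the normalized weights.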
There have been a variety of algorithms developed in the
path integral control setting. The most straightforward application of path integral control is to implement the iterative feedback control law suggested in [15] in its open-loop formulation. This requires that sampling takes place
only from the initial state of the optimal control problem.
A more effective approach is to use the path integral control
framework to find the parameters of a feedback control
policy. This can be done by sampling in policy parameter space; these methods are known as Policy Improvement with Path Integrals [14]. Another approach to finding the
parameters of a policy is to attempt to directly sample from
the optimal distribution defined by the value function [3].
Other methods along similar threads of research include [10],
[17].
Another way that the path integral control framework
can be applied is in a model predictive control setting.
In this setting an open-loop control sequence is constantly
optimized in the background while the machine is simulta-
neously executing the “best guess” that the controller has.
An issue with this approach is that many trajectories must
be sampled in real-time, which is difficult when the system
has complex dynamics. One way around this problem is to drastically simplify the system under consideration by using a hierarchical scheme [4]: path integral control generates trajectories for a point mass, which a low-level controller then follows. Even though this approach may be successful for certain applications, it is limited in the kinds of behaviors it can generate, since it does not consider the full nonlinearity of the dynamics. A more efficient approach is to take advantage of the parallel nature of sampling and use a graphics processing unit (GPU) [19] to sample thousands of trajectories from the nonlinear dynamics.

¹This research has been supported by NSF Grant No. NRI-1426945. The authors are with the Autonomous Control and Decision Systems Laboratory at the Georgia Institute of Technology, Atlanta, GA, USA. Email: gradyrw@gatech.edu
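The parallel sampling strategy is a natural fit for GPUs because every rollout is independent of the others. A minimal sketch of that structure (the pendulum model and batch sizes are illustrative assumptions, not the paper's simulation setup; NumPy vectorization here plays the role that one-thread-per-trajectory parallelism plays on a GPU):

```python
import numpy as np

# Batched sampling of K independent rollouts of a nonlinear system
# (a simple pendulum). Every state update below touches all K
# trajectories at once; there is no per-sample loop, which is exactly
# the structure a GPU kernel exploits with one thread per rollout.
rng = np.random.default_rng(1)
K, T, dt = 4096, 100, 0.01       # thousands of rollouts, horizon steps, step size
g, l = 9.81, 1.0                 # gravity, pendulum length

theta = np.full(K, np.pi)        # all rollouts start hanging down
omega = np.zeros(K)              # angular velocities

for t in range(T):
    u = rng.normal(0.0, 2.0, size=K)                  # perturbed controls
    omega += (-(g / l) * np.sin(theta) + u) * dt      # full nonlinear dynamics
    theta += omega * dt
```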
A major issue in the path integral control framework is
that the expectation is taken with respect to the uncontrolled
dynamics of the system. This is problematic since the proba-
bility of sampling a low cost trajectory using the uncontrolled
dynamics is typically very low. This problem becomes more
drastic when the underlying dynamics are nonlinear and
sampled trajectories can become trapped in undesirable parts
of the state space. It has previously been demonstrated how to change the mean of the sampling distribution using Girsanov’s theorem [15], [16], which can then be used to develop an iterative algorithm. However, the variance of
the sampling distribution has always remained unchanged.
Although in some simple simulated scenarios changing the
variance is not necessary, in many cases the natural variance
of a system will be too low to produce useful deviations from
the current trajectory. Previous methods have dealt with this problem either by artificially adding noise into the system and then optimizing the noisy system [10], [14], or by simply ignoring the problem entirely and sampling from whatever distribution worked best [12], [19]. Although these
approaches can be successful, both are problematic in that
the optimization either takes place with respect to the wrong
system or the resulting algorithm ignores the theoretical basis
of path integral control.
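The mechanism that makes a principled fix possible can be illustrated with a toy importance-sampling example (entirely illustrative, not the paper's derivation): an expectation under the system's natural noise distribution can still be estimated from samples drawn with a different mean and a larger variance, provided each sample is weighted by the likelihood ratio between the two distributions.

```python
import numpy as np

# Toy importance sampling with a changed mean AND variance.
# Target: E_p[x^2] with p = N(0, 1) (the "natural" noise); true value is 1.
# We instead draw samples from q = N(1, 2^2) and correct each sample by the
# likelihood ratio p(x)/q(x), leaving the estimate unbiased.
rng = np.random.default_rng(0)
n = 500_000
x = rng.normal(1.0, 2.0, size=n)     # samples from the shifted, wider q

def log_pdf(x, mu, sigma):
    """Log density of N(mu, sigma^2) evaluated at x."""
    return -0.5 * ((x - mu) / sigma) ** 2 - np.log(sigma * np.sqrt(2 * np.pi))

w = np.exp(log_pdf(x, 0.0, 1.0) - log_pdf(x, 1.0, 2.0))   # ratio p/q
estimate = np.mean(w * x ** 2)       # unbiased estimate of E_p[x^2] = 1
```

Note that the correction requires the sampling distribution q to dominate p (here q has the larger variance); this is the sense in which the variance change must respect the assumptions of the derivation.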
The approach we take here generalizes these prior methods: it allows both the mean and the variance of the sampling distribution to be changed by the control designer, without violating the underlying assumptions made in the path integral derivation. This enables the algorithm to converge fast
enough that it can be applied in a model predictive control
setting. After deriving the model predictive path integral
control (MPPI) algorithm, we compare it with an existing
model predictive control formulation based on differential
dynamic programming (DDP) [6], [13], [18]. DDP is one of the most powerful techniques for trajectory optimization: it relies on a first or second order approximation of the dynamics and a quadratic approximation of the cost along a nominal trajectory, and it then computes a second order approximation of
arXiv:1509.01149v3 [cs.SY] 28 Oct 2015