Model Predictive Path Integral Control using Covariance Variable
Importance Sampling

Grady Williams¹, Andrew Aldrich¹, and Evangelos A. Theodorou¹
Abstract— In this paper we develop a Model Predictive Path
Integral (MPPI) control algorithm based on a generalized
importance sampling scheme and perform parallel optimization
via sampling using a Graphics Processing Unit (GPU). The
proposed generalized importance sampling scheme allows for
changes in the drift and diffusion terms of stochastic diffusion
processes and plays a significant role in the performance of the
model predictive control algorithm. We compare the proposed
algorithm in simulation with a model predictive control version
of differential dynamic programming.
I. INTRODUCTION
The path integral optimal control framework [7], [15],
[16] provides a mathematically sound methodology for de-
veloping optimal control algorithms based on stochastic
sampling of trajectories. The key idea in this framework is
that the value function for the optimal control problem is
transformed using the Feynman-Kac lemma [2], [8] into an
expectation over all possible trajectories, which is known
as a path integral. This transformation allows stochastic
optimal control problems to be solved with a Monte-Carlo
approximation using forward sampling of stochastic diffusion
processes.
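As a rough illustration of this forward-sampling idea (the double-integrator model, cost, and constants below are illustrative assumptions, not the paper's algorithm), one can sample K noise-driven trajectories, weight each by its exponentiated path cost, and average the sampled noise to estimate a control:

```python
import numpy as np

# Sketch of a Monte-Carlo path integral step for a 1-D double integrator.
# Sample K noisy rollouts, weight each trajectory by exp(-cost / lambda)
# (the Feynman-Kac exponential weighting), and average the first noise term.
rng = np.random.default_rng(0)
K, T, dt = 1000, 50, 0.02        # number of samples, horizon steps, time step
lam, sigma = 1.0, 1.0            # temperature, noise standard deviation

x = np.zeros(K)                  # K copies of the position
v = np.zeros(K)                  # K copies of the velocity
eps = rng.normal(0.0, sigma, size=(K, T))   # sampled control noise
cost = np.zeros(K)

for t in range(T):
    v += eps[:, t] * dt          # noise acts as the control input
    x += v * dt
    cost += (x - 1.0) ** 2 * dt  # running cost: drive the position to x = 1

w = np.exp(-(cost - cost.min()) / lam)      # exponentiated path costs
w /= w.sum()                                # normalized trajectory weights
u0 = w @ eps[:, 0]               # cost-weighted average -> first control value
```

The subtraction of the minimum cost before exponentiating is a standard numerical-stability device and does not change the normalized weights.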
There have been a variety of algorithms developed in the
path integral control setting. The most straightforward application of path integral control is to implement the iterative feedback control law suggested in [15] in its open-loop formulation. This requires that sampling takes place
only from the initial state of the optimal control problem.
A more effective approach is to use the path integral control
framework to find the parameters of a feedback control
policy. This can be done by sampling in policy parameter space; these methods are known as Policy Improvement with Path Integrals [14]. Another approach to finding the
parameters of a policy is to attempt to directly sample from
the optimal distribution defined by the value function [3].
Other methods along similar threads of research include [10],
[17].
Another way that the path integral control framework
can be applied is in a model predictive control setting.
In this setting an open-loop control sequence is constantly
optimized in the background while the machine is simulta-
neously executing the “best guess” that the controller has.
An issue with this approach is that many trajectories must
be sampled in real-time, which is difficult when the system
has complex dynamics. One way around this problem is to drastically simplify the system under consideration by using a hierarchical scheme [4]: path integral control generates trajectories for a point mass, which a low-level controller then follows. Even though this approach may be successful for certain applications, it is limited in the kinds of behaviors it can generate, since it does not consider the full nonlinearity of the dynamics. A more efficient approach is to take advantage of the parallel nature of sampling and use a graphics processing unit (GPU) [19] to sample thousands of trajectories from the nonlinear dynamics.

¹This research has been supported by NSF Grant No. NRI-1426945. The authors are with the Autonomous Control and Decision Systems Laboratory at the Georgia Institute of Technology, Atlanta, GA, USA. Email: gradyrw@gatech.edu
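The parallel sampling strategy is a natural fit for GPUs because every rollout is independent of the others. A minimal sketch of that structure (the pendulum model and batch sizes are illustrative assumptions, not the paper's simulation setup; NumPy vectorization here plays the role that one-thread-per-trajectory parallelism plays on a GPU):

```python
import numpy as np

# Batched sampling of K independent rollouts of a nonlinear system
# (a simple pendulum). Every state update below touches all K
# trajectories at once; there is no per-sample loop, which is exactly
# the structure a GPU kernel exploits with one thread per rollout.
rng = np.random.default_rng(1)
K, T, dt = 4096, 100, 0.01       # thousands of rollouts, horizon steps, step size
g, l = 9.81, 1.0                 # gravity, pendulum length

theta = np.full(K, np.pi)        # all rollouts start hanging down
omega = np.zeros(K)              # angular velocities

for t in range(T):
    u = rng.normal(0.0, 2.0, size=K)                  # perturbed controls
    omega += (-(g / l) * np.sin(theta) + u) * dt      # full nonlinear dynamics
    theta += omega * dt
```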
A major issue in the path integral control framework is
that the expectation is taken with respect to the uncontrolled
dynamics of the system. This is problematic since the proba-
bility of sampling a low cost trajectory using the uncontrolled
dynamics is typically very low. This problem becomes more
drastic when the underlying dynamics are nonlinear and
sampled trajectories can become trapped in undesirable parts
of the state space. It has previously been demonstrated how to change the mean of the sampling distribution using Girsanov’s theorem [15], [16], which can then be used to develop an iterative algorithm. However, the variance of
the sampling distribution has always remained unchanged.
Although in some simple simulated scenarios changing the
variance is not necessary, in many cases the natural variance
of a system will be too low to produce useful deviations from
the current trajectory. Previous methods have dealt with this problem either by artificially adding noise into the system and then optimizing the noisy system [10], [14], or by simply ignoring the problem entirely and sampling from whatever distribution worked best [12], [19]. Although these
approaches can be successful, both are problematic in that
the optimization either takes place with respect to the wrong
system or the resulting algorithm ignores the theoretical basis
of path integral control.
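The mechanism that makes a principled fix possible can be illustrated with a toy importance-sampling example (entirely illustrative, not the paper's derivation): an expectation under the system's natural noise distribution can still be estimated from samples drawn with a different mean and a larger variance, provided each sample is weighted by the likelihood ratio between the two distributions.

```python
import numpy as np

# Toy importance sampling with a changed mean AND variance.
# Target: E_p[x^2] with p = N(0, 1) (the "natural" noise); true value is 1.
# We instead draw samples from q = N(1, 2^2) and correct each sample by the
# likelihood ratio p(x)/q(x), leaving the estimate unbiased.
rng = np.random.default_rng(0)
n = 500_000
x = rng.normal(1.0, 2.0, size=n)     # samples from the shifted, wider q

def log_pdf(x, mu, sigma):
    """Log density of N(mu, sigma^2) evaluated at x."""
    return -0.5 * ((x - mu) / sigma) ** 2 - np.log(sigma * np.sqrt(2 * np.pi))

w = np.exp(log_pdf(x, 0.0, 1.0) - log_pdf(x, 1.0, 2.0))   # ratio p/q
estimate = np.mean(w * x ** 2)       # unbiased estimate of E_p[x^2] = 1
```

Note that the correction requires the sampling distribution q to dominate p (here q has the larger variance); this is the sense in which the variance change must respect the assumptions of the derivation.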
The approach we take here generalizes these prior methods: it allows both the mean and the variance of the sampling distribution to be changed by the control designer, without violating the underlying assumptions made in the path integral derivation. This enables the algorithm to converge fast
enough that it can be applied in a model predictive control
setting. After deriving the model predictive path integral
control (MPPI) algorithm, we compare it with an existing
model predictive control formulation based on differential
dynamic programming (DDP) [6], [13], [18]. DDP is one of the most powerful techniques for trajectory optimization: it relies on a first or second order approximation of the dynamics and a quadratic approximation of the cost along a nominal trajectory, and it then computes a second order approximation of
arXiv:1509.01149v3 [cs.SY] 28 Oct 2015