Online Robust Adaptive Dynamic Programming: Two-Player Zero-Sum Games for Continuous-Time Linear Systems

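This research paper, "Robust Adaptive Dynamic Programming of Two-Player Zero-Sum Games for Continuous-Time Linear Systems", studies an online robust adaptive dynamic programming (ADP) algorithm for two-player zero-sum (ZS) games of continuous-time linear systems with unknown dynamics. The systems carry matched uncertainties that are functions of the system outputs and of the states of a completely unknown exosystem. The authors develop the online algorithm within the policy iteration (PI) framework using only one iteration loop, and they propose a new analytical method to prove convergence of the PI scheme. Sufficient conditions are given that guarantee global asymptotic stability and suboptimality of the closed-loop system, so the closed loop remains stable and performs close to the optimum despite the uncertainties. Keywords: game algebraic Riccati equation (GARE), policy iteration, robust adaptive dynamic programming (ADP), two-player zero-sum (ZS) games.

1. Introduction
Two-player zero-sum game theory is widely used in control and optimization, especially where uncertainty and competition are present. The paper addresses such games for continuous-time linear systems with unknown dynamics, where the uncertainty is a function of the system outputs and of the states of a completely unknown exosystem. Conventional control methods may not handle this setting well, which motivates a new algorithm.

2. Algorithm design
The proposed online robust ADP algorithm builds on policy iteration, repeatedly updating the control policies to improve performance. Policy iteration schemes for zero-sum games often nest an inner loop inside an outer loop; the proposed method uses a single iteration loop, which reduces the computational burden.

3. Convergence analysis
The authors propose a new analytical method to prove that the policy iteration converges. This guarantees that the iteration neither diverges nor cycles without settling.

4. Performance guarantees
The sufficient conditions given in the paper show that the closed-loop system is globally asymptotically stable and retains suboptimal performance in the presence of the uncertainties.

5. Simulation
Simulation studies illustrate the effectiveness of the method: for two-player zero-sum games of continuous-time linear systems, the algorithm copes with the uncertainties while keeping the system stable and close to optimal.

In short, the paper offers a robust ADP solution for two-player zero-sum games of continuous-time linear systems that combines online learning with robustness, which makes it relevant to practical engineering problems.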

3314 IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, VOL. 26, NO. 12, DECEMBER 2015

Robust Adaptive Dynamic Programming of Two-Player Zero-Sum Games for Continuous-Time Linear Systems

Yue Fu, Jun Fu, Senior Member, IEEE, and Tianyou Chai, Fellow, IEEE

Abstract: In this brief, an online robust adaptive dynamic programming algorithm is proposed for two-player zero-sum games of continuous-time unknown linear systems with matched uncertainties, which are functions of system outputs and states of a completely unknown exosystem. The online algorithm is developed using the policy iteration (PI) scheme with only one iteration loop. A new analytical method is proposed for convergence proof of the PI scheme. Sufficient conditions are given to guarantee globally asymptotic stability and suboptimal property of the closed-loop system. Simulation studies are conducted to illustrate the effectiveness of the proposed method.

Index Terms: Game algebraic Riccati equation (GARE), policy iterations (PIs), robust adaptive dynamic programming (ADP), two-player zero-sum (ZS) games.

I. INTRODUCTION

Two-player zero-sum (ZS) games capture two players' behaviors, in which the success of a player in selecting strategies depends strictly on the choices of the other player [1]. For continuous-time linear systems, the solutions of two-player ZS games rely on solving the generalized game algebraic Riccati equation (GARE). By extending Kleinman's algorithm in [2] to continuous-time linear two-player ZS games, the GARE can approximately be solved offline. Vamvoudakis and Lewis [1] and Feng et al. [3] used an inner loop with iterations on the first control (i.e., the action of the first player). Van der Schaft [4] and Abu-Khalaf et al. [5], [6] designed an inner loop with iterations on the second control (i.e., the action of the second player). Recently, Wu and Luo [7], [8] and Luo et al. [9] presented only one iteration loop with iterations on the two controls simultaneously. Although these iteration algorithms can approximate the solutions of the GARE, in general, it is still difficult to solve for the cost function at each iterative step. Moreover, they require all or partial knowledge of the system dynamics.
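As a point of reference for the single-loop, model-based schemes discussed above, the following is a minimal offline sketch in which both players' gains are updated simultaneously from the same cost matrix. It is not the algorithm of this brief, nor a reproduction of [7]–[9]. It assumes the GARE takes the common form $A^{T}P + PA + Q - PBR^{-1}B^{T}P + \gamma^{-2}PDD^{T}P = 0$, which this section does not state, and it needs full knowledge of $A$, $B$, and $D$. The weights `Q` and `R`, the level `gamma`, the numerical matrices, and all function names are hypothetical.

```python
import numpy as np


def solve_lyapunov(Ac, M):
    """Solve Ac' P + P Ac + M = 0 using vec(.) and the Kronecker product."""
    n = Ac.shape[0]
    I = np.eye(n)
    lhs = np.kron(I, Ac.T) + np.kron(Ac.T, I)
    p = np.linalg.solve(lhs, -M.reshape(-1, order="F"))
    return p.reshape((n, n), order="F")


def gare_residual(P, A, B, D, Q, R, gamma):
    """Residual of the assumed GARE (zero at an exact solution)."""
    return (A.T @ P + P @ A + Q
            - P @ B @ np.linalg.solve(R, B.T) @ P
            + P @ D @ D.T @ P / gamma**2)


def simultaneous_policy_iteration(A, B, D, Q, R, gamma, K0, L0, iters=30):
    """Single loop: evaluate the current pair of policies, then update both."""
    K, L = K0, L0
    for _ in range(iters):
        Ac = A - B @ K + D @ L                      # closed loop, u1 = -K x, u2 = L x
        M = Q + K.T @ R @ K - gamma**2 * L.T @ L    # running cost under (K, L)
        P = solve_lyapunov(Ac, M)                   # policy evaluation
        K = np.linalg.solve(R, B.T @ P)             # update for the minimizing player
        L = D.T @ P / gamma**2                      # update for the maximizing player
    return P, K, L


# Hypothetical second-order example; A is Hurwitz, so zero gains are admissible.
A = np.array([[0.0, 1.0], [-1.0, -2.0]])
B = np.array([[0.0], [1.0]])
D = np.array([[0.0], [1.0]])
Q, R, gamma = np.eye(2), np.eye(1), 5.0

P, K, L = simultaneous_policy_iteration(
    A, B, D, Q, R, gamma, np.zeros((1, 2)), np.zeros((1, 2)))
print("GARE residual norm:", np.linalg.norm(gare_residual(P, A, B, D, Q, R, gamma)))
```

Each pass solves one Lyapunov equation and then refreshes both gains, so there is no inner loop. The sketch starts from zero gains only because the chosen `A` is already stable; in general a stabilizing initial pair would be needed.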

Adaptive dynamic programming (ADP), widely used for solving the optimal control problems, as in [10]–[14], is an effective approach for solving two-player ZS problems of linear or nonlinear systems with unknown dynamics. In [15], using the ADP, an online solution of linear two-player ZS games without using a priori knowledge of any system dynamics has been proposed. In [16], the corresponding algorithm in [15] was extended to nonlinear two-player ZS games. The solutions proposed in [15] and [16] are applicable under the assumption that the full knowledge of the states is available for feedback.

Manuscript received April 18, 2014; revised June 23, 2015; accepted July 18, 2015. Date of publication October 29, 2015; date of current version November 16, 2015. This work was supported in part by the Natural Science Foundation of China under Grant 61573090 and Grant 61473063 and in part by the Research Funds for the Central Universities under Grant N130408003 and Grant N130108001. (Corresponding author: Jun Fu.)

The authors are with the State Key Laboratory of Synthetical Automation for Process Industries, Northeastern University, Shenyang 110819, China (e-mail: fuyue@mail.neu.edu.cn; fujuncontrol@126.com; tychai@mail.neu.edu.cn).

Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org.

Digital Object Identifier 10.1109/TNNLS.2015.2461452

In this brief, an online robust algorithm is proposed to investigate the robustness of two-player ZS games for continuous-time linear systems. In particular, an online robust ADP algorithm is proposed to investigate the robustness properties of the saddle-point solution for two-player ZS games of linear systems with matched uncertainties, which are functions of both the system outputs and the signals generated by a completely unknown exosystem. Sufficient conditions are provided to guarantee global asymptotic stability (GAS) of the closed-loop system while achieving suboptimality. The results most relevant to this brief are given in [17]–[20]. Compared with them, this brief differs as follows.

1) This brief considers a ZS game problem with two inputs acting as two strictly competitive players and further investigates under what conditions the GAS and suboptimality properties of the ZS game can be achieved in the presence of dynamic uncertainties. In [17]–[20], only one input is considered, and the stability and suboptimality addressed there are typical control problems.

2) To investigate the robustness of the solution of the two-player ZS game problem, for its nominal system, we reveal the relationship between the input matrices and the two players' multipliers [see (16)], which has no counterpart in [17]–[20].

3) A new analytical method is proposed for the convergence proof of the policy iteration (PI) algorithm for the nominal system, whereas in [17]–[20] the convergence of the PI algorithm is guaranteed by the known technique in [2].

Throughout this brief, $\mathbb{R}_{+}$ and $\mathbb{Z}_{+}$ are used to denote the sets of non-negative real numbers and non-negative integers. Vertical bars $|\cdot|$ are used to represent the Euclidean norm for vectors, or the induced matrix norm for matrices. For any piecewise continuous function $u$, $\|u\|$ denotes $\sup\{|u(t)|,\ t \ge 0\}$. $\otimes$ is used to indicate the Kronecker product, and $\mathrm{vec}(A)$ is defined to be $\mathrm{vec}(A) = [a_1^{T}\ a_2^{T}\ \cdots\ a_m^{T}]^{T}$, where $a_i \in \mathbb{R}^{n}$ are the columns of $A \in \mathbb{R}^{n \times m}$. $I_n$ stands for the $n \times n$ identity matrix.
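The column-stacking operator and the Kronecker product are typically used together through the standard identity $\mathrm{vec}(AXB) = (B^{T} \otimes A)\,\mathrm{vec}(X)$, which turns a linear matrix equation into an ordinary linear system. The sketch below only checks that identity numerically; the helper names `vec` and `unvec` and the test matrices are hypothetical, and the section above does not say how the brief itself applies the identity.

```python
import numpy as np


def vec(A):
    """Stack the columns of A into a single vector (column-major order)."""
    return np.asarray(A).reshape(-1, order="F")


def unvec(v, n, m):
    """Inverse of vec: rebuild an n-by-m matrix from its stacked columns."""
    return np.asarray(v).reshape((n, m), order="F")


# Hypothetical matrices used only to check the identity.
A = np.array([[1.0, 2.0], [3.0, 4.0]])
X = np.array([[0.0, 1.0], [-1.0, 2.0]])
B = np.array([[2.0, 0.0], [1.0, 1.0]])

lhs = vec(A @ X @ B)                 # vec of the matrix product
rhs = np.kron(B.T, A) @ vec(X)       # Kronecker form acting on vec(X)
print(np.allclose(lhs, rhs))
```

Note that NumPy stores arrays row by row by default, so `order="F"` is needed to match the column-wise definition of $\mathrm{vec}(\cdot)$ given above.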

II. PROBLEM DESCRIPTIONS

Consider a continuous-time linear time-invariant system interconnected with a nonlinear exosystem, described by

$$\dot{x} = Ax + B\bigl(u_1 + \Delta_1(\omega, y)\bigr) + D\bigl(u_2 + \Delta_2(\omega, y)\bigr) \tag{1}$$

$$\dot{\omega} = g(\omega, y), \quad y = Cx \tag{2}$$

where $x \in \mathbb{R}^{n}$ is the measured component of the state, $u_1 \in \mathbb{R}^{m}$ and $u_2 \in \mathbb{R}^{q}$ are the two control inputs, $y$ is the system output, $A \in \mathbb{R}^{n \times n}$, $B \in \mathbb{R}^{n \times m}$, and $D \in \mathbb{R}^{n \times q}$ are the unknown constant matrices with $(A, B)$ stabilizable and $(A, C)$ observable, $\omega \in \mathbb{R}^{r}$ is the unmeasurable part of the state with unknown order $r$, $\Delta_1(\omega, y) = E\Delta(\omega, y)$ and $\Delta_2(\omega, y) = F\Delta(\omega, y)$ are the two measurable outputs of the dynamic uncertainties with $E$ and $F \in \mathbb{R}^{m \times s}$ being two unknown constant matrices, the unknown function $\Delta(\omega, y) \in \mathbb{R}^{s}$ is locally Lipschitz and satisfies $\Delta(0, 0) = 0$, and the unknown function $g$, which takes values in $\mathbb{R}^{r}$, is locally Lipschitz and satisfies $g(0, 0) = 0$.
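To make the structure of (1) and (2) concrete, the following sketch evaluates and integrates the interconnection for one hypothetical choice of data. Every matrix, the uncertainty signal `Delta`, and the exosystem field `g` are invented for illustration (with all input and uncertainty dimensions equal to one); in the brief these quantities are unknown, and the forward-Euler integrator is only a convenience.

```python
import numpy as np

# Hypothetical data, chosen only for illustration.
A = np.array([[0.0, 1.0], [-1.0, -2.0]])
B = np.array([[0.0], [1.0]])
D = np.array([[0.0], [1.0]])
C = np.array([[1.0, 0.0]])
E = np.array([[0.5]])
F = np.array([[0.2]])


def Delta(w, y):
    """Hypothetical locally Lipschitz uncertainty signal with Delta(0, 0) = 0."""
    return np.array([w[0] * y[0] + np.sin(w[0])])


def g(w, y):
    """Hypothetical exosystem vector field with g(0, 0) = 0."""
    return np.array([-w[0] + y[0] ** 2])


def dynamics(x, w, u1, u2):
    """Right-hand sides of (1) and (2) for the hypothetical data above."""
    y = C @ x
    d = Delta(w, y)
    x_dot = A @ x + B @ (u1 + E @ d) + D @ (u2 + F @ d)
    w_dot = g(w, y)
    return x_dot, w_dot


def simulate(x0, w0, policy1, policy2, dt=1e-3, steps=5000):
    """Forward-Euler integration under state-feedback policies u_i = policy_i(x)."""
    x = np.array(x0, dtype=float)
    w = np.array(w0, dtype=float)
    for _ in range(steps):
        x_dot, w_dot = dynamics(x, w, policy1(x), policy2(x))
        x, w = x + dt * x_dot, w + dt * w_dot
    return x, w
```

Because the uncertainties are matched, they enter through the same channels as the inputs: choosing `u1 = -E @ d` and `u2 = -F @ d` (not implementable in practice, since `w` is unmeasured) would cancel them and leave `x_dot = A @ x`. The conditions $\Delta(0,0) = 0$ and $g(0,0) = 0$ make the origin an equilibrium under zero inputs.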

