3314 IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, VOL. 26, NO. 12, DECEMBER 2015
Robust Adaptive Dynamic Programming of Two-Player Zero-Sum Games for
Continuous-Time Linear Systems
Yue Fu, Jun Fu, Senior Member, IEEE, and Tianyou Chai, Fellow, IEEE
Abstract— In this brief, an online robust adaptive dynamic
programming algorithm is proposed for two-player zero-sum games of
continuous-time unknown linear systems with matched uncertainties,
which are functions of system outputs and states of a completely
unknown exosystem. The online algorithm is developed using the policy
iteration (PI) scheme with only one iteration loop. A new analytical
method is proposed for the convergence proof of the PI scheme. Sufficient
conditions are given to guarantee global asymptotic stability and
suboptimality of the closed-loop system. Simulation studies are
conducted to illustrate the effectiveness of the proposed method.
Index Terms—Game algebraic Riccati equation (GARE),
policy iterations (PIs), robust adaptive dynamic programming (ADP),
two-player zero-sum (ZS) games.
I. INTRODUCTION
Two-player zero-sum (ZS) games capture two players’
behaviors, in which the success of a player in selecting strategies
depends strictly on the choices of the other player [1]. For
continuous-time linear systems, the solutions of two-player ZS
games rely on solving the generalized game algebraic Riccati
equation (GARE). By extending Kleinman’s algorithm in [2]
to continuous-time linear two-player ZS games, the GARE can
approximately be solved offline. Vamvoudakis and Lewis [1] and
Feng et al. [3] used an inner loop with iterations on the first
control (i.e., the action of the first player). Van der Schaft [4] and
Abu-Khalaf et al. [5], [6] designed an inner loop with iterations on
the second control (i.e., the action of the second player). Recently,
Wu and Luo [7], [8] and Luo et al. [9] presented only one iteration
loop with iterations on the two controls simultaneously. Although
these iteration algorithms can approximate the solutions of the
GARE, in general, it is still difficult to solve for the cost function at
each iterative step. Moreover, they require all or partial knowledge
of the system dynamics.
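
As a rough illustration of the single-loop, model-based iterations discussed above, the following sketch updates both controls simultaneously. It assumes the commonly used GARE form A^T P + P A + Q - P B R^{-1} B^T P + γ^{-2} P D D^T P = 0, which is not stated in this section, and it uses hypothetical matrices chosen so that A is Hurwitz and the zero initial gains are admissible. It is not the algorithm of any cited reference, and convergence in general depends on conditions not checked here.

```python
import numpy as np

def solve_lyapunov(Ac, M):
    """Solve Ac^T P + P Ac = -M for P using the Kronecker/vec identity."""
    n = Ac.shape[0]
    I = np.eye(n)
    # With column-stacking vec: vec(Ac^T P) = (I kron Ac^T) vec(P),
    # and vec(P Ac) = (Ac^T kron I) vec(P).
    lhs = np.kron(I, Ac.T) + np.kron(Ac.T, I)
    p = np.linalg.solve(lhs, -M.flatten(order="F"))
    return p.reshape((n, n), order="F")

def gare_residual(P, A, B, D, Q, R, gamma):
    """Left-hand side of the assumed GARE; zero at an exact solution."""
    return (A.T @ P + P @ A + Q
            - P @ B @ np.linalg.solve(R, B.T) @ P
            + P @ D @ D.T @ P / gamma**2)

def simultaneous_pi(A, B, D, Q, R, gamma, iters=30):
    """One iteration loop that updates both players' gains at every step."""
    n, m, p = A.shape[0], B.shape[1], D.shape[1]
    K = np.zeros((m, n))  # first player:  u1 = -K x
    L = np.zeros((p, n))  # second player: u2 =  L x
    for _ in range(iters):
        Ac = A - B @ K + D @ L                      # closed-loop matrix
        M = Q + K.T @ R @ K - gamma**2 * L.T @ L    # running cost weight
        P = solve_lyapunov(Ac, M)                   # policy evaluation
        K = np.linalg.solve(R, B.T @ P)             # policy improvement, player 1
        L = D.T @ P / gamma**2                      # policy improvement, player 2
    return P, K, L

# Hypothetical example data (not taken from the paper).
A = np.array([[0.0, 1.0], [-1.0, -2.0]])
B = np.array([[0.0], [1.0]])
D = np.array([[0.0], [0.5]])
Q = np.eye(2)
R = np.eye(1)
gamma = 2.0

P, K, L = simultaneous_pi(A, B, D, Q, R, gamma)
residual_norm = np.linalg.norm(gare_residual(P, A, B, D, Q, R, gamma))
```

Every step of this sketch needs A, B, and D explicitly, which is the model dependence noted above.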
Adaptive dynamic programming (ADP), widely used for solving
optimal control problems, as in [10]–[14], is an effective approach
for solving two-player ZS problems of linear or nonlinear systems
with unknown dynamics. In [15], an online ADP-based solution
of linear two-player ZS games that uses no a priori knowledge of
the system dynamics was proposed. In [16], the
algorithm in [15] was extended to nonlinear two-player ZS games.
The solutions proposed in [15] and [16] are applicable under the
assumption that full knowledge of the states is available for
feedback.
Manuscript received April 18, 2014; revised June 23, 2015; accepted
July 18, 2015. Date of publication October 29, 2015; date of current version
November 16, 2015. This work was supported in part by the Natural Science
Foundation of China under Grant 61573090 and Grant 61473063 and in part
by the Research Funds for the Central Universities under Grant N130408003
and Grant N130108001. (Corresponding author: Jun Fu.)
The authors are with the State Key Laboratory of Synthetical Automation for
Process Industries, Northeastern University, Shenyang 110819, China (e-mail:
fuyue@mail.neu.edu.cn; fujuncontrol@126.com; tychai@mail.neu.edu.cn).
Color versions of one or more of the figures in this paper are available
online at http://ieeexplore.ieee.org.
Digital Object Identifier 10.1109/TNNLS.2015.2461452
In this brief, an online robust ADP algorithm is proposed to investigate
the robustness of the saddle point solution of two-player ZS games for
continuous-time linear systems with matched uncertainties,
which are functions of both the system outputs and the signals
generated by a completely unknown exosystem. Sufficient conditions
are provided to guarantee the global asymptotic stability (GAS)
of the closed-loop system; meanwhile, the suboptimal property is
achieved. The results most relevant to this brief are given
in [17]–[20]. Compared with them, the differences of this brief are
listed as follows.
1) This brief considers a ZS game problem with two inputs acting
as two strictly competitive players and further investigates
under what conditions the GAS and suboptimal properties of
the ZS game can be achieved in the presence of dynamic
uncertainties, whereas in [17]–[20], only one input is considered
and the stability and suboptimality addressed there are
typical control problems.
2) To investigate the robustness of the solution of the two-player
ZS game problem, for its nominal system, we reveal the
relationship between the input matrices and the two players’
multipliers [see (16)], which is not available in [17]–[20].
3) A new analytical method is proposed for the convergence
proof of the policy iteration (PI) algorithm for the nominal
system, whereas in [17]–[20], the convergence of the PI algorithm
is guaranteed by the known technique in [2].
Throughout this brief, $\mathbb{R}_+$ and $\mathbb{Z}_+$ denote the sets of
non-negative real numbers and non-negative integers, respectively. Vertical bars $|\cdot|$
represent the Euclidean norm for vectors, or the induced
matrix norm for matrices. For any piecewise continuous function $u$,
$\|u\|$ denotes $\sup\{|u(t)|,\ t \ge 0\}$. $\otimes$ indicates the Kronecker
product, and $\mathrm{vec}(A)$ is defined to be $\mathrm{vec}(A) = [a_1^T\ a_2^T\ \cdots\ a_m^T]$, where
$a_i \in \mathbb{R}^n$ are the columns of $A \in \mathbb{R}^{n \times m}$. $I_n$ stands for the $n \times n$ identity
matrix.
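
The notation above can be checked numerically with a short sketch. It assumes NumPy, uses a hypothetical helper `vec` that stacks columns into a one-dimensional array, and verifies the standard identity vec(AXB) = (B^T ⊗ A) vec(X). That identity is not stated in the text but is the usual reason vec and ⊗ appear together. The matrices are random placeholders.

```python
import numpy as np

def vec(M):
    """Stack the columns a_1, ..., a_m of M into one 1-D array."""
    return M.flatten(order="F")  # column-major order keeps columns contiguous

rng = np.random.default_rng(0)
A = rng.standard_normal((3, 2))
X = rng.standard_normal((2, 4))
B = rng.standard_normal((4, 5))

# Standard identity linking vec and the Kronecker product.
lhs = vec(A @ X @ B)
rhs = np.kron(B.T, A) @ vec(X)
identity_holds = np.allclose(lhs, rhs)

# |.| for a vector (Euclidean norm) and for a matrix (induced 2-norm).
u = np.array([3.0, -4.0])
euclid = np.linalg.norm(u)
induced = np.linalg.norm(A, 2)  # equals the largest singular value of A
```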
II. PROBLEM DESCRIPTIONS
Consider a continuous-time linear time-invariant system
interconnected with a nonlinear exosystem, described by
$$\dot{x} = Ax + B(u_1 + \Delta_1(\omega, y)) + D(u_2 + \Delta_2(\omega, y)) \tag{1}$$
$$\dot{\omega} = g(\omega, y), \quad y = Cx \tag{2}$$
where $x \in \mathbb{R}^n$ is the measured component of the state, $u_1 \in \mathbb{R}^m$ and
$u_2 \in \mathbb{R}^p$ are the two control inputs, $y \in \mathbb{R}^q$ is the system output,
$A \in \mathbb{R}^{n \times n}$, $B \in \mathbb{R}^{n \times m}$, and $D \in \mathbb{R}^{n \times p}$ are unknown constant
matrices with $(A, B)$ stabilizable and $(A, C)$ observable,
$\omega \in \mathbb{R}^r$ is the unmeasurable part of the state with unknown order $r$,
$\Delta_1(\omega, y) = E\Delta(\omega, y)$ and $\Delta_2(\omega, y) = F\Delta(\omega, y)$ are the two
measurable outputs of the dynamic uncertainties with $E \in \mathbb{R}^{m \times s}$ and $F \in \mathbb{R}^{p \times s}$
being two unknown constant matrices, the unknown function
$\Delta(\omega, y) \in \mathbb{R}^s$ is locally Lipschitz and satisfies $\Delta(0, 0) = 0$,
and the unknown function $g : \mathbb{R}^r \times \mathbb{R}^q \to \mathbb{R}^r$ is locally Lipschitz
and satisfies $g(0, 0) = 0$.
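
To make the structure of (1) and (2) concrete, the sketch below codes the right-hand sides with hypothetical matrices and hypothetical uncertainty and exosystem functions. None of these values come from the paper. The names `delta`, `g`, `dynamics`, and `simulate` are illustrative, and the forward-Euler integrator is used only for simplicity. The sketch shows that the uncertainties enter through the same channels B and D as the two controls (the matched structure), and that the origin is an equilibrium when both inputs are zero.

```python
import numpy as np

# Hypothetical data with n = 2, m = p = q = r = 1, s = 2.
A = np.array([[0.0, 1.0], [-1.0, -2.0]])
B = np.array([[0.0], [1.0]])   # input matrix of player 1 (n x m)
D = np.array([[0.0], [0.5]])   # input matrix of player 2 (n x p)
C = np.array([[1.0, 0.0]])     # output matrix (q x n)
E = np.array([[0.2, 0.0]])     # m x s
F = np.array([[0.0, 0.1]])     # p x s

def delta(w, y):
    """Hypothetical locally Lipschitz uncertainty with delta(0, 0) = 0."""
    return np.array([w[0] * y[0], np.sin(w[0])])

def g(w, y):
    """Hypothetical exosystem vector field with g(0, 0) = 0."""
    return -w + y

def dynamics(x, w, u1, u2):
    """Right-hand sides of (1) and (2)."""
    y = C @ x
    d = delta(w, y)
    # Matched uncertainties: E d and F d are added to u1 and u2 directly.
    x_dot = A @ x + B @ (u1 + E @ d) + D @ (u2 + F @ d)
    w_dot = g(w, y)
    return x_dot, w_dot

def simulate(x0, w0, dt=1e-3, steps=2000):
    """Forward-Euler simulation with both controls held at zero."""
    x, w = x0.astype(float), w0.astype(float)
    for _ in range(steps):
        x_dot, w_dot = dynamics(x, w, np.zeros(1), np.zeros(1))
        x, w = x + dt * x_dot, w + dt * w_dot
    return x, w

x_dot0, w_dot0 = dynamics(np.zeros(2), np.zeros(1), np.zeros(1), np.zeros(1))
x_end, w_end = simulate(np.zeros(2), np.zeros(1))
```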
2162-237X © 2015 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission.
See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.