$$J(\eta) = \begin{pmatrix} J_1(\eta) & 0_{3\times 3} \\ 0_{3\times 3} & J_2(\eta) \end{pmatrix} \qquad (2)$$
$$J_1(\eta) = \begin{pmatrix} \cos\psi\cos\theta & a_{11} & a_{12} \\ \sin\psi\cos\theta & a_{21} & a_{22} \\ -\sin\theta & \cos\theta\sin\phi & \cos\theta\cos\phi \end{pmatrix} \qquad (3)$$
$$J_2(\eta) = \begin{pmatrix} 1 & \sin\phi\tan\theta & \cos\phi\tan\theta \\ 0 & \cos\phi & -\sin\phi \\ 0 & \sin\phi\sec\theta & \cos\phi\sec\theta \end{pmatrix} \qquad (4)$$
where $a_{11} = \cos\psi\sin\theta\sin\phi - \sin\psi\cos\phi$; $a_{12} = \cos\psi\sin\theta\cos\phi + \sin\psi\sin\phi$; $a_{21} = \sin\psi\sin\theta\sin\phi + \cos\psi\cos\phi$; $a_{22} = \sin\psi\sin\theta\cos\phi - \cos\psi\sin\phi$.
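For concreteness, the following Python sketch assembles the kinematic transformation of Equations (2)-(4) from the Euler angles in $\eta$. The function names and the assumed ordering of $\eta$ (position first, then $\phi, \theta, \psi$) are illustrative conventions, not taken from the paper.

```python
import numpy as np

def J1(phi, theta, psi):
    """Rotation matrix of Equation (3), with the a_ij entries defined above."""
    cphi, sphi = np.cos(phi), np.sin(phi)
    cth, sth = np.cos(theta), np.sin(theta)
    cpsi, spsi = np.cos(psi), np.sin(psi)
    return np.array([
        [cpsi * cth, cpsi * sth * sphi - spsi * cphi, cpsi * sth * cphi + spsi * sphi],
        [spsi * cth, spsi * sth * sphi + cpsi * cphi, spsi * sth * cphi - cpsi * sphi],
        [-sth,       cth * sphi,                      cth * cphi],
    ])

def J2(phi, theta):
    """Angular-velocity transformation of Equation (4); singular at theta = +/-pi/2."""
    cphi, sphi = np.cos(phi), np.sin(phi)
    tth, sec = np.tan(theta), 1.0 / np.cos(theta)
    return np.array([
        [1.0, sphi * tth, cphi * tth],
        [0.0, cphi,       -sphi],
        [0.0, sphi * sec, cphi * sec],
    ])

def J(eta):
    """Block-diagonal J(eta) of Equation (2); eta[3:6] = (phi, theta, psi) assumed."""
    phi, theta, psi = eta[3:6]
    out = np.zeros((6, 6))
    out[:3, :3] = J1(phi, theta, psi)
    out[3:, 3:] = J2(phi, theta)
    return out
```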
The dynamic model is established via Newton's laws:

$$[M]\dot{\xi} + [C(\xi)]\xi + [D(\xi)]\xi + g(\eta) = \tau \qquad (5)$$
where $[M] \in \mathbb{R}^{6\times 6}$ is the inertia matrix, whose inverse is $[M]^{-1}$; $[C(\xi)] \in \mathbb{R}^{6\times 6}$ is the Coriolis and centripetal matrix; $[D(\xi)] \in \mathbb{R}^{6\times 6}$ is the damping matrix; $g(\eta) \in \mathbb{R}^{6}$ is the gravity and buoyancy forces vector; and $\tau \in \mathbb{R}^{6}$ is the generalized thrust force vector.
Equation (5) can be transformed as follows:

$$\dot{\xi} = [M]^{-1}\left(-[C(\xi)]\xi - [D(\xi)]\xi - g(\eta) + \tau\right) \qquad (6)$$
2.2. Problem formulation
We suppose that the sampling time is very short. According to Equation (6), the discrete-time dynamic system is described as follows:
$$\xi(k+1) = f(\xi(k)) + \iota(\xi(k))u(\xi(k)) \qquad (7)$$
where $u(\xi(k))$ is the system control input. For the optimal tracking control problem, the control objective is to find an optimal control $u^{*}(\xi(k))$ that makes the state of Equation (7) track the desired trajectory $\xi_d(k)$. For simplicity, $u(\xi(k))$ is written as $u(k)$.
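As a minimal sketch of this discretization, the step below applies forward Euler with sampling time $T$ to Equation (6); `M_inv`, `C`, `D`, and `g_eta` are placeholder arrays and callables standing in for the model matrices, and the resulting map has exactly the form of Equation (7) with $\iota(\xi) = T[M]^{-1}$.

```python
import numpy as np

def discrete_step(xi, u, T, M_inv, C, D, g_eta):
    """One forward-Euler step of Equation (6), yielding the form of Equation (7):
    f(xi)    = xi + T * M_inv @ (-C(xi) @ xi - D(xi) @ xi - g_eta)
    iota(xi) = T * M_inv   (state-independent here; kept general in the text).
    """
    xi_dot = M_inv @ (-C(xi) @ xi - D(xi) @ xi - g_eta + u)
    return xi + T * xi_dot
```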
The tracking error is defined as:

$$e(k) = \xi(k) - \xi_d(k) \qquad (8)$$
The control input error is described as:

$$\nu(k) = u(k) - u_d(k) \qquad (9)$$
$$u_d(k) = \iota^{-1}(\xi_d(k))\left(\xi_d(k+1) - f(\xi_d(k)) - [M]^{-1}g\right) \qquad (10)$$
where $u_d(k)$ is the desired control input, introduced for analytical purposes. Substituting Equations (8), (9), and (10) into Equation (7), the tracking-error system is obtained as follows:
$$\begin{aligned} e(k+1) ={}& f(e(k) + \xi_d(k)) + \iota(e(k) + \xi_d(k))\,\iota^{-1}(\xi_d(k))\left(\xi_d(k+1) - f(\xi_d(k))\right) \\ &- \xi_d(k+1) + \iota(e(k) + \xi_d(k))\,\nu(k) \end{aligned} \qquad (11)$$
where $u_d(k)$ and $u_d(k+1)$ are the desired control vectors.
Equation (11) can be represented as:

$$e(k+1) = F(e(k), \nu(k)) \qquad (12)$$

where $e(k)$ is the state vector and $\nu(k)$ is the control vector.
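A sketch of how this error system can be stepped numerically, assuming $f$ and $\iota$ are callables for the discrete model of Equation (7) and that $\iota(\xi_d(k))$ is invertible. Rather than expanding Equation (11), it applies the equivalent substitution $u(k) = \nu(k) + u_d(k)$ directly, with $u_d(k)$ computed per Equation (10) (including its $[M]^{-1}g$ term).

```python
import numpy as np

def desired_control(xi_d_k, xi_d_next, f, iota, M_inv, g_eta):
    """Desired control u_d(k) of Equation (10)."""
    return np.linalg.solve(iota(xi_d_k), xi_d_next - f(xi_d_k) - M_inv @ g_eta)

def error_step(e_k, nu_k, xi_d_k, xi_d_next, f, iota, M_inv, g_eta):
    """Error dynamics e(k+1) = F(e(k), nu(k)) of Equations (11)-(12),
    obtained as xi(k+1) - xi_d(k+1) with u(k) = nu(k) + u_d(k)."""
    u_k = nu_k + desired_control(xi_d_k, xi_d_next, f, iota, M_inv, g_eta)
    xi_k = e_k + xi_d_k
    return f(xi_k) + iota(xi_k) @ u_k - xi_d_next
```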
According to Bellman's optimality principle, the optimal performance index function is defined as:

$$J^{*}(e(k)) = \min_{\nu(k)} \sum_{i=k}^{\infty} \gamma^{\,i-k}\, U(e(i), \nu(i)) \qquad (13)$$
where $U(e(k), \nu(k)) = e^{T}(k)Qe(k) + \nu^{T}(k)R\nu(k)$ is the utility function, $\gamma$ is the discount factor with $0 < \gamma \le 1$, and $Q$ and $R$ are symmetric positive-definite matrices.
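A small sketch of evaluating this index numerically: the infinite sum in Equation (13) is truncated at a hypothetical horizon `N` (not from the paper), which is a reasonable approximation when $\gamma < 1$.

```python
import numpy as np

def utility(e, nu, Q, R):
    """Quadratic utility U(e, nu) = e^T Q e + nu^T R nu."""
    return e @ Q @ e + nu @ R @ nu

def discounted_cost(e0, policy, F, Q, R, gamma, N=500):
    """Truncated evaluation of Equation (13) under a fixed policy nu = policy(e);
    F(e, nu) is the error dynamics of Equation (12)."""
    J, e = 0.0, e0
    for i in range(N):
        nu = policy(e)
        J += (gamma ** i) * utility(e, nu, Q, R)
        e = F(e, nu)
    return J
```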
In other words, $J^{*}(e(k))$ satisfies the discrete-time HJB equation. Therefore, the optimal control law can be expressed as:

$$\nu^{*}(k) = \arg\min_{\nu(k)} \left\{ U(e(k), \nu(k)) + \gamma J^{*}(e(k+1)) \right\} \qquad (14)$$
Bellman's principle yields a backwards-in-time procedure for solving the optimal control problem, because the optimal policy at time $k+1$ must already be known in Equation (14) in order to determine the optimal policy at time $k$. Consequently, it is often computationally untenable to run true dynamic programming, owing to the backward numerical process required for its solution. To overcome this difficulty, approximating the performance index function is proposed.
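This difficulty is what the policy-iteration scheme derived in the next section addresses: instead of a backward sweep, it alternates forward-in-time policy evaluation with greedy improvement on Equation (14). A schematic sketch, with `evaluate` and `improve` as placeholder routines (not from the paper):

```python
def policy_iteration(J_hat, nu_hat, evaluate, improve, n_iter=50):
    """Alternate policy evaluation and policy improvement.

    evaluate(nu, J): fit J(e) to U(e, nu(e)) + gamma * J(F(e, nu(e)))   (Bellman eq.)
    improve(J):      return nu(e) = argmin_nu {U(e, nu) + gamma * J(F(e, nu))}  (Eq. 14)
    """
    for _ in range(n_iter):
        J_hat = evaluate(nu_hat, J_hat)
        nu_hat = improve(J_hat)
    return J_hat, nu_hat
```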
3. Trajectory-tracking control based on ADP algorithm
3.1. Derivation of the policy iteration ADP algorithm
In this section, the policy iteration ADP algorithm is presented, in which the value function and the control law are updated iteratively. First, we start with an initial admissible control law $\hat{\nu}_0(k)$ and let $\hat{J}_0(e(k+1))$ satisfy the HJB equation: