Lagrange equations for the DIPC system can be written in a more compact matrix form:
$$D(\theta)\,\ddot\theta + C(\theta,\dot\theta)\,\dot\theta + G(\theta) = Hu \tag{2}$$
where
$$D(\theta) = \begin{pmatrix}
d_1 & d_2\cos\theta_1 & d_3\cos\theta_2 \\
d_2\cos\theta_1 & d_4 & d_5\cos(\theta_1-\theta_2) \\
d_3\cos\theta_2 & d_5\cos(\theta_1-\theta_2) & d_6
\end{pmatrix} \tag{3}$$
$$C(\theta,\dot\theta) = \begin{pmatrix}
0 & -d_2\sin(\theta_1)\,\dot\theta_1 & -d_3\sin(\theta_2)\,\dot\theta_2 \\
0 & 0 & d_5\sin(\theta_1-\theta_2)\,\dot\theta_2 \\
0 & -d_5\sin(\theta_1-\theta_2)\,\dot\theta_1 & 0
\end{pmatrix} \tag{4}$$
$$G(\theta) = \begin{pmatrix}
0 \\
-f_1\sin\theta_1 \\
-f_2\sin\theta_2
\end{pmatrix} \tag{5}$$

$$H = (1 \;\; 0 \;\; 0)^T$$
Assuming that the centers of mass of the pendulums are at the geometrical centers of the links, which are solid rods, we have $l_i = L_i/2$, $I_i = m_i L_i^2/12$. Then for the elements of the matrices $D(\theta)$, $C(\theta,\dot\theta)$, and $G(\theta)$ we get:
$$\begin{aligned}
d_1 &= m_0 + m_1 + m_2 \\
d_2 &= m_1 l_1 + m_2 L_1 = \left(\tfrac{1}{2}m_1 + m_2\right) L_1 \\
d_3 &= m_2 l_2 = \tfrac{1}{2} m_2 L_2 \\
d_4 &= m_1 l_1^2 + m_2 L_1^2 + I_1 = \left(\tfrac{1}{3}m_1 + m_2\right) L_1^2 \\
d_5 &= m_2 L_1 l_2 = \tfrac{1}{2} m_2 L_1 L_2 \\
d_6 &= m_2 l_2^2 + I_2 = \tfrac{1}{3} m_2 L_2^2 \\
f_1 &= (m_1 l_1 + m_2 L_1)\,g = \left(\tfrac{1}{2}m_1 + m_2\right) L_1 g \\
f_2 &= m_2 l_2 g = \tfrac{1}{2} m_2 L_2 g
\end{aligned}$$
Note that matrix D(θ) is symmetric and nonsingular.
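As a quick numerical sanity check, the element formulas above can be evaluated and the symmetry and nonsingularity of $D(\theta)$ verified. The sketch below is illustrative only: the parameter values for $m_0$, $m_1$, $m_2$, $L_1$, $L_2$ are hypothetical and not taken from this report.

```python
import numpy as np

# Hypothetical physical parameters (illustration only, not from the report)
m0, m1, m2 = 1.5, 0.5, 0.75   # cart and pendulum masses [kg]
L1, L2 = 0.5, 0.75            # link lengths [m]
g = 9.81                      # gravitational acceleration [m/s^2]

# Elements of D, C, G from the expressions above
d1 = m0 + m1 + m2
d2 = (0.5 * m1 + m2) * L1
d3 = 0.5 * m2 * L2
d4 = (m1 / 3.0 + m2) * L1**2
d5 = 0.5 * m2 * L1 * L2
d6 = m2 * L2**2 / 3.0
f1 = (0.5 * m1 + m2) * L1 * g
f2 = 0.5 * m2 * L2 * g

def D(theta1, theta2):
    """Inertia matrix D(theta) of equation (3)."""
    return np.array([
        [d1, d2 * np.cos(theta1), d3 * np.cos(theta2)],
        [d2 * np.cos(theta1), d4, d5 * np.cos(theta1 - theta2)],
        [d3 * np.cos(theta2), d5 * np.cos(theta1 - theta2), d6],
    ])

# D(theta) is symmetric and nonsingular at an arbitrary configuration
Dm = D(0.3, -0.2)
assert np.allclose(Dm, Dm.T)
assert abs(np.linalg.det(Dm)) > 1e-9
```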
4 Control
To design a control law, Lagrange equations of motion (2) are reformulated into a 6th-order system of ordinary differential equations. To do this, a state vector $x \in \mathbb{R}^6$ is introduced:
$$x = (\theta \;\; \dot\theta)^T$$
Then, dropping the dependencies of the system matrices on the generalized coordinates and their derivatives, the system dynamic equations appear as:
$$\dot x = \begin{pmatrix} 0 & I \\ 0 & -D^{-1}C \end{pmatrix} x
+ \begin{pmatrix} 0 \\ -D^{-1}G \end{pmatrix}
+ \begin{pmatrix} 0 \\ D^{-1}H \end{pmatrix} u \tag{6}$$
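Equation (6) translates directly into a right-hand-side function for numerical simulation: at each step, $D\ddot\theta = Hu - C\dot\theta - G$ is solved for $\ddot\theta$ rather than forming $D^{-1}$ explicitly. A self-contained sketch follows; the default parameter values are hypothetical, not taken from the report.

```python
import numpy as np

def dipc_dynamics(x, u, m0=1.5, m1=0.5, m2=0.75, L1=0.5, L2=0.75, g=9.81):
    """Right-hand side of (6); x = (theta0, theta1, theta2, dtheta0, dtheta1, dtheta2).

    Parameter defaults are illustrative placeholders only.
    """
    t0, t1, t2, dt0, dt1, dt2 = x
    # Matrix elements from the definitions of d_1..d_6, f_1, f_2
    d1 = m0 + m1 + m2
    d2 = (0.5 * m1 + m2) * L1
    d3 = 0.5 * m2 * L2
    d4 = (m1 / 3.0 + m2) * L1**2
    d5 = 0.5 * m2 * L1 * L2
    d6 = m2 * L2**2 / 3.0
    f1 = (0.5 * m1 + m2) * L1 * g
    f2 = 0.5 * m2 * L2 * g
    D = np.array([[d1, d2 * np.cos(t1), d3 * np.cos(t2)],
                  [d2 * np.cos(t1), d4, d5 * np.cos(t1 - t2)],
                  [d3 * np.cos(t2), d5 * np.cos(t1 - t2), d6]])
    C = np.array([[0.0, -d2 * np.sin(t1) * dt1, -d3 * np.sin(t2) * dt2],
                  [0.0, 0.0, d5 * np.sin(t1 - t2) * dt2],
                  [0.0, -d5 * np.sin(t1 - t2) * dt1, 0.0]])
    G = np.array([0.0, -f1 * np.sin(t1), -f2 * np.sin(t2)])
    H = np.array([1.0, 0.0, 0.0])
    dtheta = np.array([dt0, dt1, dt2])
    # Solve D * ddtheta = H*u - C*dtheta - G for the accelerations
    ddtheta = np.linalg.solve(D, H * u - C @ dtheta - G)
    return np.concatenate([dtheta, ddtheta])
```

The upright equilibrium $x = 0$, $u = 0$ is a fixed point of this vector field, since $G(0) = 0$.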
In this report, optimal nonlinear stabilization control design is addressed: stabilize the DIPC while minimizing an accumulated cost functional quadratic in states and controls. The general problem of designing an optimal control law involves minimizing a cost function
$$J_t = \sum_{k=t}^{t_{\mathrm{final}}} L_k(x_k, u_k), \tag{7}$$
which represents an accumulated cost of the sequence of states $x_k$ and controls $u_k$ from the current discrete time $t$ to the final time $t_{\mathrm{final}}$. For regulation problems, $t_{\mathrm{final}} = \infty$. Optimization is done with respect to the control sequence subject to the constraints of the system dynamics (6). In our case,
$$L_k(x_k, u_k) = x_k^T Q x_k + u_k^T R u_k \tag{8}$$
corresponds to the standard linear quadratic cost. For linear systems, this leads to linear state-feedback control, the LQR, designed in the next subsection. For nonlinear systems, the optimal control problem generally requires a numerical solution, which can be computationally prohibitive. An analytical approximation to the nonlinear optimal control solution is utilized in the subsection on SDRE control, which represents a nonlinear extension of the LQR and yields superior results. Neural network (NN) capabilities for function approximation are employed to approximate the nonlinear control solution in the subsection on NN control, and combinations of the NN with the LQR and SDRE are investigated in the subsection following the NN control.
4.1 Linear Quadratic Regulator
The linear quadratic regulator yields an optimal solution to the control problem (7)–(8) when the system dynamics are linear. Since the DIPC, as described by (6), is nonlinear, it can be linearized to derive an approximate linear solution to the optimal control problem. Linearization of (6) around $x = 0$ yields:
$$\dot x = Ax + Bu \tag{9}$$
where
$$A = \begin{pmatrix} 0 & I \\ -D(0)^{-1}\dfrac{\partial G(0)}{\partial\theta} & 0 \end{pmatrix} \tag{10}$$

$$B = \begin{pmatrix} 0 \\ D(0)^{-1}H \end{pmatrix} \tag{11}$$
and the continuous LQR solution is then obtained by:
$$u = -R^{-1}B^T P_c x \equiv -K_c x \tag{12}$$
where $P_c$ is a steady-state solution of the differential Riccati equation. To implement computerized digital control, the dynamic equations (9) are approximately discretized as $\Phi \approx e^{A\Delta t}$, $\Gamma \approx B\Delta t$, and digital LQR control is then given by
$$u_k = -R^{-1}\Gamma^T P x_k \equiv -K x_k \tag{13}$$
where $P$ is the steady-state solution of the difference Riccati equation, obtained by solving the discrete-time algebraic Riccati equation
$$\Phi^T\!\left[P - P\Gamma\left(R + \Gamma^T P \Gamma\right)^{-1}\Gamma^T P\right]\Phi - P + Q = 0 \tag{14}$$
where $Q \in \mathbb{R}^{6\times 6}$ and $R \in \mathbb{R}$ are positive definite state and control cost matrices. Since the linearization (9)–(11) accurately represents the DIPC system (6) at the equilibrium, the LQR control (12) or (13) will be a locally near-optimal stabilizing control.
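The design steps (9)–(14) can be sketched numerically: linearize around the upright equilibrium (where $\partial G(0)/\partial\theta = \mathrm{diag}(0, -f_1, -f_2)$), discretize with $\Phi \approx e^{A\Delta t}$, $\Gamma \approx B\Delta t$, and solve the discrete-time algebraic Riccati equation with SciPy. The physical parameters, $\Delta t$, and the cost weights $Q$, $R$ below are hypothetical choices, and the gain is computed in the standard discrete-time form $K = (R + \Gamma^T P\Gamma)^{-1}\Gamma^T P\Phi$.

```python
import numpy as np
from scipy.linalg import expm, solve_discrete_are

# Hypothetical physical parameters (illustration only)
m0, m1, m2, L1, L2, g = 1.5, 0.5, 0.75, 0.5, 0.75, 9.81
d1 = m0 + m1 + m2
d2 = (0.5 * m1 + m2) * L1
d3 = 0.5 * m2 * L2
d4 = (m1 / 3.0 + m2) * L1**2
d5 = 0.5 * m2 * L1 * L2
d6 = m2 * L2**2 / 3.0
f1 = (0.5 * m1 + m2) * L1 * g
f2 = 0.5 * m2 * L2 * g

D0 = np.array([[d1, d2, d3],       # D(0): all cosines equal 1
               [d2, d4, d5],
               [d3, d5, d6]])
dG = np.diag([0.0, -f1, -f2])      # dG/dtheta evaluated at theta = 0
H = np.array([[1.0], [0.0], [0.0]])
D0inv = np.linalg.inv(D0)

# Linearization (10)-(11)
A = np.block([[np.zeros((3, 3)), np.eye(3)],
              [-D0inv @ dG, np.zeros((3, 3))]])
B = np.vstack([np.zeros((3, 1)), D0inv @ H])

# Approximate discretization: Phi ~ e^{A dt}, Gamma ~ B dt
dt = 0.01
Phi = expm(A * dt)
Gam = B * dt

# Hypothetical cost weights; P solves the discrete-time ARE (14)
Q = np.eye(6)
R = np.array([[1.0]])
P = solve_discrete_are(Phi, Gam, Q, R)
# Standard discrete-time LQR gain
K = np.linalg.solve(R + Gam.T @ P @ Gam, Gam.T @ P @ Phi)
```

A quick stability check is that all eigenvalues of the closed-loop matrix $\Phi - \Gamma K$ lie inside the unit circle, confirming local stabilization of the discretized model.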