Neuro-Optimal Control of Unknown
Nonaffine Nonlinear Systems with
Saturating Actuators
Xiong Yang, Derong Liu, Qinglai Wei
The State Key Laboratory of Management and Control for Complex
Systems Institute of Automation, Chinese Academy of Sciences,
Beijing 100190, China
(Email: xiong.yang@ia.ac.cn; derong.liu@ia.ac.cn; qinglai.wei@ia.ac.cn)
Abstract: This paper develops an adaptive optimal control scheme for the infinite-horizon cost of
unknown nonaffine nonlinear continuous-time systems with control constraints. A recurrent
neural network (NN) is constructed to identify the unknown system dynamics, with a proof of
stability. Then, two feedforward NNs are used as the actor and the critic to approximate the
optimal control and the optimal value function, respectively. With this architecture, the action NN
and the critic NN are tuned simultaneously, without requiring knowledge of the system
dynamics. In addition, the weights of the action NN and the critic NN are guaranteed to be
uniformly ultimately bounded via Lyapunov’s direct method. A simulation example is
provided to verify the effectiveness of the developed theoretical results.
Keywords: Actuator saturation, Adaptive control, Neural networks, Nonaffine systems,
Optimal control, Unknown nonlinear systems
1. INTRODUCTION
In practical engineering, saturation, backlash, and dead zones
are common features of various actuators. Over
the past several years, the control of nonlinear systems with
saturating actuators has drawn intensive attention, and several
methods have been successfully proposed to derive optimal
control laws that account for the saturation phenomenon (Abu-
Khalaf and Lewis, 2005; Zhang et al., 2009; Lin and
Cheng, 2012; Modares et al., 2012). However, most of
these approaches do not consider optimal control laws
for unknown nonaffine nonlinear continuous-time (CT)
systems. In this paper, we investigate this problem within
the framework of the Hamilton-Jacobi-Bellman (HJB)
equation from optimal control theory.
It is well known that the HJB equation is generally
intractable to solve by analytical approaches due to its
inherent nonlinearity.
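One tractable special case illustrates the contrast: for a linear system with quadratic cost, the HJB equation collapses to an algebraic Riccati equation with a closed-form solution, whereas no such reduction exists for general nonlinear dynamics. The following minimal numerical check of this special case is an illustration added here (not from the paper); the scalar values a, b, q, r are arbitrary:

```python
import math

# Scalar LQR: dx/dt = a*x + b*u, cost J = ∫ (q*x^2 + r*u^2) dt.
# With V(x) = p*x^2, the HJB equation min_u [q x^2 + r u^2 + V'(x)(a x + b u)] = 0
# reduces to the algebraic Riccati equation  b^2 p^2 / r - 2 a p - q = 0.
a, b, q, r = -1.0, 1.0, 1.0, 1.0

# Positive root of the Riccati equation.
p = r * (a + math.sqrt(a**2 + b**2 * q / r)) / b**2
gain = b * p / r                       # optimal control u* = -(b p / r) x

# Verify that the HJB equation holds at sample states: residual should vanish.
for x in (-2.0, 0.5, 3.0):
    u = -gain * x
    dVdx = 2.0 * p * x
    residual = q * x**2 + r * u**2 + dVdx * (a * x + b * u)
    assert abs(residual) < 1e-9, residual

print(p)  # ≈ 0.4142 for these values
```

For general nonlinear dynamics the minimizing control cannot be eliminated in closed form in this way, which is what motivates the approximation schemes discussed next.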
To address this difficulty, adaptive dynamic
programming (ADP) algorithms were developed by
Werbos (1977, 1992). Nevertheless, most ADP algorithms
require a priori knowledge of the system dynamics for
their implementation. Reinforcement learning (RL)
methods have therefore been introduced to address this issue. RL is a class
of approaches employed in machine learning to method-
ically revise the actions of an agent based on responses
from its environment (Sutton and Barto, 1998). Compared
with traditional ADP approaches, RL schemes require no
prescribed behavior or training model.
This work was supported in part by the National Natural Sci-
ence Foundation of China under Grants 61034002, 61233001, and
61273140.
Recently, many researchers have become interested in applying
RL methods to nonlinear optimal control problems
(Abu-Khalaf and Lewis, 2005; Lewis and Vrabie, 2009;
Lin, 2011; Vamvoudakis and Lewis, 2010; Vieu et al.,
2011). Abu-Khalaf and Lewis (2005) presented an offline
algorithm based on RL to solve the HJB equation arising in
the optimal control of nonlinear CT systems with saturating
actuators. Using this algorithm, the actor and the critic
were tuned sequentially, and the solution of the HJB equation
was successively approximated. After that, Vamvoudakis
and Lewis (2010) investigated the CT nonlinear optimal
control problem with an online algorithm, which in-
volved synchronous adaptation of the critic and the actor.
However, a priori knowledge of the nonlinear CT system dynamics
is still required in both Abu-Khalaf and Lewis (2005)
and Vamvoudakis and Lewis (2010). Later, Bhasin et al.
(2013) presented a projection algorithm to derive the op-
timal control of uncertain affine nonlinear CT systems.
With this algorithm, the actor, the critic, and the
identifier were all tuned simultaneously. However, the
projection algorithm demands the selection of a
predefined convex set within which the target NN weights
must remain, which is difficult to guarantee in practice.
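To make the synchronous-tuning idea concrete, the sketch below is an illustration under simplifying assumptions, not the tuning laws of this paper (which are Lyapunov-based) nor those of the cited works. For a scalar linear plant, a one-parameter critic V̂(x) = w_c x² descends the squared Bellman (HJB) residual while, simultaneously, a one-parameter actor û(x) = -w_a x is driven toward the critic-implied policy; both weights and the learning rates alpha and beta are illustrative choices:

```python
import math

# Scalar plant dx/dt = a*x + b*u with cost integrand q*x^2 + r*u^2.
a, b, q, r = -1.0, 1.0, 1.0, 1.0

w_c, w_a = 0.0, 0.0          # critic weight (V_hat = w_c x^2), actor weight (u_hat = -w_a x)
alpha, beta = 0.05, 0.5      # critic / actor learning rates (illustrative)

for _ in range(2000):
    # Bellman (HJB) residual for the current critic and actor;
    # the common x^2 factor is dropped.
    e = q + r * w_a**2 + 2.0 * w_c * (a - b * w_a)
    # Critic: gradient descent on (1/2) e^2 with respect to w_c.
    w_c -= alpha * e * 2.0 * (a - b * w_a)
    # Actor: move toward the policy the critic implies, u = -(b/r) w_c x.
    w_a += beta * (b * w_c / r - w_a)

# Both weights converge to the known LQR (Riccati) solution.
p_star = r * (a + math.sqrt(a**2 + b**2 * q / r)) / b**2
assert abs(w_c - p_star) < 1e-3
assert abs(w_a - b * p_star / r) < 1e-3
print(round(w_c, 4), round(w_a, 4))  # → 0.4142 0.4142
```

In this toy setting the critic and actor are tuned in the same loop iteration rather than in alternating phases, which is the essential feature of the synchronous schemes; the paper's contribution is to achieve this without a model of the dynamics, using an RNN identifier instead.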
The main objective of this paper is to develop an adaptive
optimal control scheme for the infinite-horizon cost of unknown
nonaffine nonlinear CT systems with control constraints.
A recurrent NN (RNN) is constructed to identify the
unknown system dynamics, with a proof of stability. Then, two
feedforward NNs are used as the actor and the critic to
approximate the optimal control and the optimal value function,
respectively. With this architecture, the action NN
and the critic NN are tuned simultaneously, without
requiring knowledge of the system dynamics. In
September 2-4, 2013. Chengdu, China