978-1-5386-1107-4/17/$31.00 ©2017 IEEE
The 2017 4th International Conference on Systems and Informatics (ICSAI 2017)
A Data-driven Online ADP of Exponential
Convergence Based on k-nearest-neighbor Averager,
Stable Term and Persistence Excitation
Zhijian Huang, Shengtang Wang, Huan Zheng, Cheng Zhang, Guichen Zhang, Qili Wu, Qinmin Tan, Zhiyuan Yang
Lab of Intelligent Control and Computation
Shanghai Maritime University
Shanghai, China
Abstract—With the development of marine science, aeronautics
and astronautics, energy, chemical industry, biomedicine and
management science, many complex systems face the problem of
optimization and control. Approximate dynamic programming
addresses the curse of dimensionality of dynamic programming and
is an approximate optimization approach that has emerged in
recent years. Based on an analysis of the optimization system, this
paper proposes a nonlinear multi-input multi-output, online
learning, and data-driven approximate dynamic programming
structure and its learning algorithm. The method comprises
the following three aspects: 1) the critic function of the
multi-dimensional-input critic module of the approximate dynamic
programming is approximated with a data-driven
k-nearest-neighbor method; 2) the multi-output policy iteration of the
approximate dynamic programming actor module is calculated
with exponential convergence performance; 3) the critic and
actor modules are learned synchronously to achieve online
optimal control. The optimal control of the
longitudinal motion of an underwater thermal glider is used to
demonstrate the effectiveness of the proposed method. This work can lay a
foundation for the theory and application of a nonlinear
data-driven multi-input multi-output approximate dynamic
programming method. Such a method also meets a common need for
optimization control and artificial intelligence in many scientific
and engineering fields, such as energy conservation, emission
reduction, decision support, and operational management.
Keywords-approximate dynamic programming; exponential
convergence; k-nearest-neighbor; persistence excitation
I. INTRODUCTION
With the development of marine science, aeronautics and
astronautics, energy, chemical industry, biomedicine and
management science, many complex systems face the problem
of optimization and control. Approximate dynamic
programming (ADP) addresses the curse of dimensionality of
dynamic programming and is an approximate optimization
approach that has emerged in recent years [1].
Neural network (NN) approximation is now a popular
method, so ADP is also known as neuro-dynamic
programming (NDP). The benefits of the NN over other
methods are nonlinear adaptability, generalization ability,
automatic identification and classification, and learning
ability. The NN is inherently a nonlinear system. Thus,
NN-based ADP has a flexible structure that can meet the
control requirements of a variety of practical systems
while simplifying design. However, NN-based
ADP also has some inherent shortcomings. Hence, many other
methods have been used to realize ADP, such as value iteration,
policy iteration, linear programming, and Lee’s k-nearest-neighbor
average algorithm [2]. Each has its own advantages and
disadvantages.
A closer examination of the current literature shows that
ADP has been successfully applied to many control areas [3-15].
In 2003, Enns and Si systematically applied the ADP method to a
complex continuous-state nonlinear multi-input multi-output
(MIMO) system with uncertainty. They adopted a
cascaded NN scheme [4]. This scheme takes advantage of the
functional relationship of multi-input variables. Lee and
coworkers controlled an MIMO methyl methacrylate
polymerization reactor with ADP. They used the k-nearest
neighbor (kNN) average algorithm to improve the control
performance [2]. Padhi and coworkers solved a
multi-critic-output control problem, such as a real-life
micro-electro-mechanical system, and got around the
numerical problem in training with a sub-network structure [16].
Other scholars have also adopted the ADP method in nonlinear
MIMO control systems; see, for example, the work of Liu [5],
Murray [11] and Lin [17]. However, only a few researchers
have focused on data-driven ADP methods for nonlinear MIMO
systems, and a truly data-driven online ADP method remains
difficult to achieve. Thus, it is necessary to research this problem.
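To make the kNN averaging idea referred to above concrete, the following is a minimal Python sketch, not the implementation of [2] or of this paper. It assumes that stored data are (state, value) pairs, that distance is Euclidean, and that the estimate is the unweighted mean of the k nearest stored values. The function name knn_average and the sample data are hypothetical.

```python
import math


def knn_average(samples, query, k=3):
    """Estimate the value at `query` as the mean value of its k nearest samples.

    samples: list of (state, value) pairs; each state is a tuple of floats.
    query:   tuple of floats with the same dimension as the stored states.
    Assumption: Euclidean distance and an unweighted average.
    """
    if not samples:
        raise ValueError("samples must not be empty")
    k = min(k, len(samples))
    # Sort stored samples by distance to the query and keep the k closest.
    nearest = sorted(samples, key=lambda sv: math.dist(sv[0], query))[:k]
    return sum(value for _, value in nearest) / k


# Hypothetical stored data: 2-D states paired with cost-to-go values.
data = [
    ((0.0, 0.0), 1.0),
    ((1.0, 0.0), 2.0),
    ((0.0, 1.0), 3.0),
    ((5.0, 5.0), 100.0),
]
estimate = knn_average(data, (0.1, 0.1), k=3)
```

Because the estimate is built directly from stored samples, no parametric model has to be trained, which is the sense in which such an approximator is data-driven.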
The stability and convergence of ADP are also very critical.
In 2007, Al-Tamimi et al. developed several ADP methods for
zero-sum games, and their convergence was proved by showing
that these methods were equivalent to the iterative solution of an
underlying algebraic Riccati equation, which is known to be
convergent [18]. In 2014, Heydari et al. presented a tracking
control ADP method for the fixed final time of an input-affine
nonlinear system. The convergence was proved by
showing that the fixed-point iteration is a contraction mapping
[19]. In 2015, Song et al. proposed a nearly finite-horizon
optimal ADP control for a class of nonaffine time-delay
nonlinear systems. The stability and convergence were proved
by showing that the performance index function is a
monotonically non-increasing sequence [20]. Also in 2015,
Heydari et al. introduced an optimal ADP which can switch
from the controlled subsystems to the free mode sequence, and
the convergence is provided by proving that the iteration
This work was supported by the NSFC Projects of China under Grant