Wang et al. / J Zhejiang Univ-Sci C (Comput & Electron) 2014 15(1):43-50
Journal of Zhejiang University-SCIENCE C (Computers & Electronics)
ISSN 1869-1951 (Print); ISSN 1869-196X (Online)
www.zju.edu.cn/jzus; www.springerlink.com
E-mail: jzus@zju.edu.cn
Adaptive dynamic programming for linear impulse systems*
Xiao-hua WANG†1,2, Juan-juan YU1, Yao HUANG1, Hua WANG1,2, Zhong-hua MIAO†‡1,2
(1School of Mechatronics Engineering and Automation, Shanghai University, Shanghai 200072, China)
(2Shanghai Key Laboratory of Power Station Automation Technology, Shanghai University, Shanghai 200072, China)
†E-mail: {x.wang, zhhmiao}@shu.edu.cn
Received May 27, 2013; Revision accepted Nov. 26, 2013; Crosschecked Dec. 19, 2013
Abstract: We investigate the optimization of linear impulse systems using the reinforcement learning based adaptive dynamic programming (ADP) method. For linear impulse systems, the optimal objective function is shown to be a quadratic form of the pre-impulse states. The ADP method provides solutions that iteratively converge to the optimal objective function. If an initial guess of the pre-impulse objective function is selected as a quadratic form of the pre-impulse states, the objective function iteratively converges to the optimal one through ADP. Although direct use of the quadratic objective function of the states within the ADP method is theoretically possible, a numerical singularity problem may occur, due to the matrix inversion therein, as the system dimensionality increases. A neural network based ADP method can circumvent this problem. A neural network with polynomial activation functions is selected to approximate the pre-impulse objective function and is trained iteratively using the ADP method to achieve optimal control. After successful training, the optimal impulse control can be derived. Simulations are presented for illustrative purposes.
Key words: Adaptive dynamic programming (ADP), Impulse system, Optimal control, Neural network
doi:10.1631/jzus.C1300145 Document code: A CLC number: TP273.1
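To make the iterative convergence described in the abstract concrete, the following is a minimal sketch of ADP-style value iteration for a standard discrete-time linear-quadratic problem (not the paper's impulse-system formulation); starting from a quadratic initial guess V_0(x) = x'P_0 x, the quadratic cost matrix P converges to the optimal one. All system matrices below are invented for illustration.

```python
import numpy as np

# Hypothetical linear system x_{k+1} = A x_k + B u_k with
# quadratic stage cost x'Qx + u'Ru (illustrative values only).
A = np.array([[1.0, 0.1],
              [0.0, 0.9]])
B = np.array([[0.0],
              [0.1]])
Q = np.eye(2)          # state-cost weight
R = np.array([[1.0]])  # control-cost weight

P = np.zeros((2, 2))   # initial guess: V_0(x) = x'P x = 0
for _ in range(500):
    # Bellman/Riccati update of the quadratic cost matrix.
    # Note the matrix inversion (solved here as a linear system),
    # which can become ill-conditioned in higher dimensions --
    # the numerical issue the abstract warns about.
    K = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)
    P = Q + A.T @ P @ A - A.T @ P @ B @ K

# P now approximates the fixed point (the optimal quadratic cost).
print(np.round(P, 3))
```

In the paper's neural-network variant, a network with polynomial activations would stand in for the explicit quadratic form x'Px, avoiding the direct inversion above.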
‡ Corresponding author
* Project supported by the National Natural Science Foundation of China (Nos. 61104006, 51175319, and 11202121), the MOE Scientific Research Foundation for the Returned Overseas Chinese Scholars, the Natural Science Foundation of Shanghai (No. 11ZR1412400), and the Shanghai Education Commission (Nos. 12YZ010, 12JC1404100, and 11CH-05), China
© Zhejiang University and Springer-Verlag Berlin Heidelberg 2014

1 Introduction

Impulse system control has attracted much attention recently (Lakshmikantham et al., 1989; Bainov and Simeonov, 1995). Impulsive differential equations (Lakshmikantham et al., 1989) provide a fundamental tool for impulse system modeling and control. When the impulse times are fixed, the system is known as a fixed-time impulse system; when the impulse times depend on the system states, it is a variable-time impulse system (Wang, 2008). An interesting and ubiquitous example of an impulse system is a human being (the plant) taking medicine such as tablets (the impulse control). Yang (1999) gave a few other good examples.
Optimal control of impulse systems has been
studied recently. The existence of optimal con-
trol has been investigated (Ahmed, 2003; Wang and
Yang, 2010). Necessary conditions for optimality
have been proposed for different classes of impulse
systems (Silva and Vinter, 1997; Liu et al., 2008).
Dynamic programming methods (Kurzhanski and Daryin, 2008) and the maximum principle (Wu and Zhang, 2011; Fraga and Pereira, 2012) have been studied in the literature. However, numerically solving for the optimal impulse control remains a major challenge.
Adaptive dynamic programming (ADP) is a re-
inforcement learning based method. It was first pro-
posed by Werbos (1974) to solve for the optimal