Wang et al. / J Zhejiang Univ-Sci C (Comput & Electron) 2014 15(1):43-50
Journal of Zhejiang University-SCIENCE C (Computers & Electronics)
ISSN 1869-1951 (Print); ISSN 1869-196X (Online)
www.zju.edu.cn/jzus; www.springerlink.com
E-mail: jzus@zju.edu.cn
Adaptive dynamic programming for linear impulse systems*
Xiao-hua WANG†1,2, Juan-juan YU1, Yao HUANG1, Hua WANG1,2, Zhong-hua MIAO†‡1,2
(1School of Mechatronics Engineering and Automation, Shanghai University, Shanghai 200072, China)
(2Shanghai Key Laboratory of Power Station Automation Technology, Shanghai University, Shanghai 200072, China)
†E-mail: {x.wang, zhhmiao}@shu.edu.cn
Received May 27, 2013; Revision accepted Nov. 26, 2013; Crosschecked Dec. 19, 2013
Abstract: We investigate the optimization of linear impulse systems using the reinforcement learning based adaptive dynamic programming (ADP) method. For linear impulse systems, the optimal objective function is shown to be a quadratic form of the pre-impulse states. The ADP method provides solutions that iteratively converge to the optimal objective function. If an initial guess of the pre-impulse objective function is selected as a quadratic form of the pre-impulse states, the objective function iteratively converges to the optimal one through ADP. Although direct use of the quadratic objective function of the states within the ADP method is theoretically possible, a numerical singularity problem may occur, due to the matrix inversion therein, as the system dimensionality increases. A neural network based ADP method can circumvent this problem. A neural network with polynomial activation functions is selected to approximate the pre-impulse objective function and is trained iteratively using the ADP method to achieve optimal control. After successful training, the optimal impulse control can be derived. Simulations are presented for illustrative purposes.
Key words: Adaptive dynamic programming (ADP), Impulse system, Optimal control, Neural network
doi:10.1631/jzus.C1300145 Document code: A CLC number: TP273.1
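To make the iterative convergence described in the abstract concrete, the following is a minimal sketch of ADP-style value iteration for a standard discrete-time linear-quadratic problem (not the paper's impulse-system formulation); starting from a quadratic initial guess V_0(x) = x'P_0 x, the quadratic cost matrix P converges to the optimal one. All system matrices below are invented for illustration.

```python
import numpy as np

# Hypothetical linear system x_{k+1} = A x_k + B u_k with
# quadratic stage cost x'Qx + u'Ru (illustrative values only).
A = np.array([[1.0, 0.1],
              [0.0, 0.9]])
B = np.array([[0.0],
              [0.1]])
Q = np.eye(2)          # state-cost weight
R = np.array([[1.0]])  # control-cost weight

P = np.zeros((2, 2))   # initial guess: V_0(x) = x'P x = 0
for _ in range(500):
    # Bellman/Riccati update of the quadratic cost matrix.
    # Note the matrix inversion (solved here as a linear system),
    # which can become ill-conditioned in higher dimensions --
    # the numerical issue the abstract warns about.
    K = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)
    P = Q + A.T @ P @ A - A.T @ P @ B @ K

# P now approximates the fixed point (the optimal quadratic cost).
print(np.round(P, 3))
```

In the paper's neural-network variant, a network with polynomial activations would stand in for the explicit quadratic form x'Px, avoiding the direct inversion above.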
‡ Corresponding author
* Project supported by the National Natural Science Foundation of China (Nos. 61104006, 51175319, and 11202121), the MOE Scientific Research Foundation for the Returned Overseas Chinese Scholars, the Natural Science Foundation of Shanghai (No. 11ZR1412400), and the Shanghai Education Commission (Nos. 12YZ010, 12JC1404100, and 11CH-05), China
© Zhejiang University and Springer-Verlag Berlin Heidelberg 2014

1 Introduction

Impulse system control has attracted much attention recently (Lakshmikantham et al., 1989; Bainov and Simeonov, 1995). Impulsive differential equations (Lakshmikantham et al., 1989) provide a fundamental tool for impulse system modeling and control. When the impulse times are fixed, the system is known as a fixed-time impulse system; when the impulse times depend on the system states, it is a variable-time impulse system (Wang, 2008). An interesting and ubiquitous example of an impulse system is a human being (the plant) taking medicine such as tablets (the impulse control). Yang (1999) gave a few other good examples.
Optimal control of impulse systems has been
studied recently. The existence of optimal con-
trol has been investigated (Ahmed, 2003; Wang and
Yang, 2010). Necessary conditions for optimality
have been proposed for different classes of impulse
systems (Silva and Vinter, 1997; Liu et al., 2008).
Dynamic programming methods (Kurzhanski and Daryin, 2008) and the maximum principle (Wu and Zhang, 2011; Fraga and Pereira, 2012) have been studied in the literature. However, numerically solving for the optimal impulse control remains a major challenge.
Adaptive dynamic programming (ADP) is a re-
inforcement learning based method. It was first pro-
posed by Werbos (1974) to solve for the optimal