Neuro-Optimal Control of Unknown
Nonaffine Nonlinear Systems with
Saturating Actuators
Xiong Yang, Derong Liu, Qinglai Wei
The State Key Laboratory of Management and Control for Complex
Systems Institute of Automation, Chinese Academy of Sciences,
Beijing 100190, China
(Email: xiong.yang@ia.ac.cn; derong.liu@ia.ac.cn; qinglai.wei@ia.ac.cn)
Abstract: This paper develops an adaptive optimal control scheme for the infinite-horizon cost of
unknown nonaffine nonlinear continuous-time systems with control constraints. A recurrent
neural network (NN) is constructed to identify the unknown system dynamics, with a proof of
stability. Then, two feedforward NNs are used as the actor and the critic to approximate the
optimal control and the optimal value function, respectively. With this architecture, the action NN
and the critic NN are tuned simultaneously, without requiring knowledge of the system
dynamics. In addition, the weights of the action NN and the critic NN are guaranteed to be
uniformly ultimately bounded via Lyapunov’s direct method. A simulation example is
provided to verify the effectiveness of the developed theoretical results.
Keywords: Actuator saturation, Adaptive control, Neural networks, Nonaffine systems,
Optimal control, Unknown nonlinear systems
1. INTRODUCTION
In practical engineering, saturation, backlash, and dead zones
are common features of various actuators. Over
the past several years, the control of nonlinear systems with
saturating actuators has drawn intensive attention, and several
methods have been successfully proposed to derive optimal
control laws that account for the saturation phenomenon (Abu-
Khalaf and Lewis, 2005; Zhang et al., 2009; Lin and
Cheng, 2012; Modares et al., 2012). However, most of
these approaches do not consider optimal control laws
for unknown nonaffine nonlinear continuous-time (CT)
systems. In this paper, we investigate this problem within
the framework of the Hamilton-Jacobi-Bellman (HJB)
equation from optimal control theory.
It is well known that the HJB equation is generally
intractable to solve by analytical approaches due to its
inherent nonlinearity.
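One tractable special case illustrates the contrast: for a linear system with quadratic cost, the HJB equation collapses to an algebraic Riccati equation with a closed-form solution, whereas no such reduction exists for general nonlinear dynamics. The following minimal numerical check of this special case is an illustration added here (not from the paper); the scalar values a, b, q, r are arbitrary:

```python
import math

# Scalar LQR: dx/dt = a*x + b*u, cost J = ∫ (q*x^2 + r*u^2) dt.
# With V(x) = p*x^2, the HJB equation min_u [q x^2 + r u^2 + V'(x)(a x + b u)] = 0
# reduces to the algebraic Riccati equation  b^2 p^2 / r - 2 a p - q = 0.
a, b, q, r = -1.0, 1.0, 1.0, 1.0

# Positive root of the Riccati equation.
p = r * (a + math.sqrt(a**2 + b**2 * q / r)) / b**2
gain = b * p / r                       # optimal control u* = -(b p / r) x

# Verify that the HJB equation holds at sample states: residual should vanish.
for x in (-2.0, 0.5, 3.0):
    u = -gain * x
    dVdx = 2.0 * p * x
    residual = q * x**2 + r * u**2 + dVdx * (a * x + b * u)
    assert abs(residual) < 1e-9, residual

print(p)  # ≈ 0.4142 for these values
```

For general nonlinear dynamics the minimizing control cannot be eliminated in closed form in this way, which is what motivates the approximation schemes discussed next.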
To address this difficulty, adaptive dynamic
programming (ADP) algorithms were developed by
Werbos (1977, 1992). Nevertheless, most ADP algorithms
require a priori knowledge of the system dynamics for
their implementation. Reinforcement learning (RL)
methods have therefore been introduced to address this issue. RL is a class
of approaches employed in machine learning to method-
ically revise the actions of an agent based on responses
from its environment (Sutton and Barto, 1998). Compared
with traditional ADP approaches, RL schemes require no
prescribed behavior or training model.
This work was supported in part by the National Natural Sci-
ence Foundation of China under Grants 61034002, 61233001, and
61273140.
Recently, many researchers have become interested in applying
RL methods to nonlinear optimal control problems
(Abu-Khalaf and Lewis, 2005; Lewis and Vrabie, 2009;
Lin, 2011; Vamvoudakis and Lewis, 2010; Vieu et al.,
2011). Abu-Khalaf and Lewis (2005) presented an offline
algorithm based on RL to solve the HJB equation arising in
the optimal control of nonlinear CT systems with saturating
actuators. Using this algorithm, the actor and the critic
were tuned sequentially, and the solution of the HJB equation
was successively approximated. After that, Vamvoudakis
and Lewis (2010) investigated the CT nonlinear optimal
control problem with an online algorithm, which in-
volved synchronous adaptation of the critic and the actor.
However, a priori knowledge of the nonlinear CT system dynamics
is still required in both Abu-Khalaf and Lewis (2005)
and Vamvoudakis and Lewis (2010). Later, Bhasin et al.
(2013) presented a projection algorithm to derive the op-
timal control of uncertain affine nonlinear CT systems.
With this algorithm, the actor, the critic, and the
identifier were all tuned simultaneously. However, the
projection algorithm demands the selection of a
predefined convex set within which the target NN weights
must remain, which is difficult to guarantee in practice.
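To make the synchronous-tuning idea concrete, the sketch below is an illustration under simplifying assumptions, not the tuning laws of this paper (which are Lyapunov-based) nor those of the cited works. For a scalar linear plant, a one-parameter critic V̂(x) = w_c x² descends the squared Bellman (HJB) residual while, simultaneously, a one-parameter actor û(x) = -w_a x is driven toward the critic-implied policy; both weights and the learning rates alpha and beta are illustrative choices:

```python
import math

# Scalar plant dx/dt = a*x + b*u with cost integrand q*x^2 + r*u^2.
a, b, q, r = -1.0, 1.0, 1.0, 1.0

w_c, w_a = 0.0, 0.0          # critic weight (V_hat = w_c x^2), actor weight (u_hat = -w_a x)
alpha, beta = 0.05, 0.5      # critic / actor learning rates (illustrative)

for _ in range(2000):
    # Bellman (HJB) residual for the current critic and actor;
    # the common x^2 factor is dropped.
    e = q + r * w_a**2 + 2.0 * w_c * (a - b * w_a)
    # Critic: gradient descent on (1/2) e^2 with respect to w_c.
    w_c -= alpha * e * 2.0 * (a - b * w_a)
    # Actor: move toward the policy the critic implies, u = -(b/r) w_c x.
    w_a += beta * (b * w_c / r - w_a)

# Both weights converge to the known LQR (Riccati) solution.
p_star = r * (a + math.sqrt(a**2 + b**2 * q / r)) / b**2
assert abs(w_c - p_star) < 1e-3
assert abs(w_a - b * p_star / r) < 1e-3
print(round(w_c, 4), round(w_a, 4))  # → 0.4142 0.4142
```

In this toy setting the critic and actor are tuned in the same loop iteration rather than in alternating phases, which is the essential feature of the synchronous schemes; the paper's contribution is to achieve this without a model of the dynamics, using an RNN identifier instead.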
The main objective of this paper is to develop an adaptive
optimal control scheme for the infinite-horizon cost of unknown
nonaffine nonlinear CT systems with control constraints.
A recurrent NN (RNN) is constructed to identify the
unknown system dynamics, with a proof of stability. Then, two
feedforward NNs are used as the actor and the critic to
approximate the optimal control and the optimal value function,
respectively. With this architecture, the action NN
and the critic NN are tuned simultaneously, without
requiring knowledge of the system dynamics. In
September 2-4, 2013. Chengdu, China