Data-Driven Online ADP with a k-Nearest-Neighbor Averager: Exponential Convergence and Stability Strategies


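This paper addresses the optimization and control problems of complex systems, which are increasingly prominent in fields such as marine science, aerospace, energy, the chemical industry, biomedicine and management science. In particular, it targets the curse of dimensionality in dynamic programming. The authors propose a data-driven online approximate dynamic programming (ADP) method that uses a k-nearest-neighbor (kNN) averager and converges exponentially.

Dynamic programming is the standard tool for multi-stage decision problems. However, its computational cost grows rapidly with the dimension of the state space, which is the so-called curse of dimensionality. Approximation-based methods such as ADP address this by approximating the value (cost-to-go) function instead of computing it exactly over the whole state space. They learn from experience and approach the optimal policy step by step.

The main contribution is an online, nonlinear, multi-input multi-output (MIMO) ADP framework that combines online learning with data-driven estimation. It has three key components:

1. **Nonlinear MIMO critic function**: the critic evaluates how well the current policy performs at the current state. It is approximated by a kNN averager over historical data, which gives real-time feedback on the current decision.
2. **Exponential convergence**: the learning algorithm is designed so that the approximation error decays exponentially (geometrically) over iterations. This keeps learning fast on high-dimensional problems.
3. **Stability and persistent excitation**: to keep the algorithm stable and robust, the paper introduces a stable term and combines it with a persistent excitation condition. This ensures that the data collected during operation remain rich enough to keep improving the policy.

The work combines machine learning, nearest-neighbor methods and dynamic programming into an efficient, adaptive approach to designing and controlling complex systems. It is supported by theoretical analysis and by an application to the optimal longitudinal-motion control of an underwater thermal glider. The method is expected to be most useful in practice where optimization is pressing and the problem is high-dimensional.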
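The role of persistent excitation (PE) can be illustrated with a generic sketch. This is not the paper's critic update. It is a plain gradient (LMS) estimator of a parameter vector θ from measurements y = φᵀθ. The PE condition requires that Σφφᵀ over a window be bounded below by αI, that is, its smallest eigenvalue must stay positive. In the textbook analysis, when PE holds and 0 < μ‖φ‖² < 2, the estimation error shrinks geometrically, which is the kind of exponential convergence described above. The two-dimensional setup, the step size `mu`, the test signals and all function names are assumptions chosen for illustration.

```python
import math
from typing import List, Sequence, Tuple

Vec = Tuple[float, float]

def pe_level(regressors: Sequence[Vec]) -> float:
    """Smallest eigenvalue of sum(phi phi^T) over the window (2-D case only).
    A value bounded away from 0 means the window is persistently exciting."""
    a = sum(p[0] * p[0] for p in regressors)
    b = sum(p[0] * p[1] for p in regressors)
    c = sum(p[1] * p[1] for p in regressors)
    # Closed-form smaller eigenvalue of the symmetric matrix [[a, b], [b, c]].
    return 0.5 * (a + c) - math.sqrt(0.25 * (a - c) ** 2 + b * b)

def lms_errors(theta: Vec, regressors: Sequence[Vec], mu: float) -> List[float]:
    """Gradient (LMS) estimation of theta from y = phi^T theta, starting at zero.
    Returns the parameter-error norm after each update."""
    est = [0.0, 0.0]
    errors = []
    for p in regressors:
        y = p[0] * theta[0] + p[1] * theta[1]        # noise-free measurement
        e = y - (p[0] * est[0] + p[1] * est[1])      # prediction error
        est = [est[0] + mu * p[0] * e, est[1] + mu * p[1] * e]
        errors.append(math.hypot(est[0] - theta[0], est[1] - theta[1]))
    return errors

# Rotating regressor: exciting in every direction. Constant regressor: not exciting.
exciting = [(math.cos(k), math.sin(k)) for k in range(200)]
flat = [(1.0, 1.0)] * 200
print(pe_level(exciting) > 1.0, abs(pe_level(flat)) < 1e-9)  # True True
print(lms_errors((2.0, -1.0), exciting, 0.5)[-1] < 1e-6)       # True
print(lms_errors((2.0, -1.0), flat, 0.5)[-1])  # about 2.121: direction (1, -1) is never excited
```

With the constant regressor, the component of the error along (1, -1) is never observed, so it never decays. This is exactly the failure that the PE condition rules out.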

The 2017 4th International Conference on Systems and Informatics (ICSAI 2017)

A Data-driven Online ADP of Exponential Convergence Based on k-nearest-neighbor Averager, Stable Term and Persistence Excitation

Zhijian Huang, Shengtang Wang, Huan Zheng, Cheng Zhang, Guichen Zhang, Qili Wu, Qinmin Tan, Zhiyuan Yang

Lab of Intelligent Control and Computation

Shanghai Maritime University

Shanghai, China

Abstract—With the development of marine science, aeronautics and astronautics, energy, the chemical industry, biomedicine and management science, many complex systems face problems of optimization and control. Approximate dynamic programming (ADP) is an approximate optimization method that has emerged in recent years to overcome the curse of dimensionality of dynamic programming. Based on an analysis of optimization systems, this paper proposes a nonlinear, multi-input multi-output, online-learning, data-driven ADP structure and its learning algorithm. The method has three aspects: 1) the critic function of the multi-dimensional-input critic module is approximated with a data-driven k-nearest-neighbor method; 2) the multi-output policy iteration of the actor module is computed with exponential convergence; 3) the critic and actor modules are learned synchronously, achieving online optimization and control. The optimal control of the longitudinal motion of an underwater thermal glider is used to demonstrate the effect of the proposed method. This work can lay a foundation for the theory and application of nonlinear, data-driven, multi-input multi-output ADP methods. Such methods also meet a common need for optimization control and artificial intelligence in many scientific and engineering fields, such as energy conservation, emission reduction, decision support and operational management.
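The first point of the abstract, a critic approximated by a data-driven kNN method, can be sketched as a kNN averager. The critic stores observed (state, cost-to-go) pairs and estimates J(x) as the mean of the k stored values whose states are nearest to x. This is a minimal sketch under several assumptions: Euclidean distance, an unweighted mean and brute-force search. The class and method names are hypothetical. The paper's exact distance, weighting and memory-update rule are not given in this excerpt.

```python
import math
from typing import List, Sequence, Tuple

class KnnCritic:
    """Hypothetical data-driven critic: stores (state, cost-to-go) samples online
    and predicts J(x) as the plain average over the k nearest stored states."""

    def __init__(self, k: int) -> None:
        if k < 1:
            raise ValueError("k must be >= 1")
        self.k = k
        self.states: List[Tuple[float, ...]] = []
        self.values: List[float] = []

    def add(self, state: Sequence[float], value: float) -> None:
        # Online learning step: each new observation simply joins the memory.
        self.states.append(tuple(state))
        self.values.append(float(value))

    def predict(self, state: Sequence[float]) -> float:
        if not self.states:
            raise ValueError("critic has no samples yet")
        k = min(self.k, len(self.states))
        # Indices of stored states sorted by Euclidean distance to the query.
        order = sorted(range(len(self.states)),
                       key=lambda i: math.dist(state, self.states[i]))
        return sum(self.values[i] for i in order[:k]) / k

critic = KnnCritic(k=3)
samples = [((0.0, 0.0), 0.0), ((1.0, 0.0), 1.0), ((0.0, 1.0), 1.0), ((5.0, 5.0), 50.0)]
for state, cost_to_go in samples:
    critic.add(state, cost_to_go)
print(critic.predict((0.1, 0.1)))  # 0.6666666666666666 (mean of 0.0, 1.0, 1.0)
```

Brute-force sorting costs O(N log N) per query. For a large sample memory, a spatial index such as a k-d tree would be the usual replacement.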

Keywords-approximate dynamic programming; exponential convergence; k-nearest-neighbor; persistent excitation

I. INTRODUCTION

With the development of marine science, aeronautics and astronautics, energy, the chemical industry, biomedicine and management science, many complex systems face problems of optimization and control. Approximate dynamic programming (ADP) addresses the curse of dimensionality of dynamic programming. It is a new kind of approximate optimization method that has emerged in recent years [1].
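The curse of dimensionality can be made concrete with the generic discounted Bellman equation that a dynamic-programming critic solves. The form below is the textbook one and is assumed for illustration only, since the paper's cost function is not stated in this excerpt. Here x is the state, u the control, c the stage cost, f the system dynamics and γ a discount factor. Tabulating J* on a grid with n points per state dimension requires n^d entries for a d-dimensional state:

```latex
\begin{aligned}
J^{*}(x) &= \min_{u}\bigl[\, c(x,u) + \gamma\, J^{*}(f(x,u)) \bigr], \qquad 0 < \gamma < 1,\\
N_{\text{table}} &= n^{d} = 100^{4} = 10^{8} \quad (n = 100,\ d = 4).
\end{aligned}
```

ADP avoids building this table by fitting J* from sampled states with an approximator, such as a neural network or, as in this paper, a kNN averager.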

Neural network (NN) approximation is now a popular approach, so ADP is also known as neuro-dynamic programming (NDP). Compared with other methods, NNs offer nonlinear adaptability, generalization ability, automatic identification and classification, and learning ability. Because an NN is inherently a nonlinear system, NN-based ADP has a flexible structure. It can meet the control requirements of a variety of practical systems while also simplifying design. However, NN-based ADP also has some inherent shortcomings. Consequently, many other methods have been used to implement ADP, such as value iteration, policy iteration, linear programming and Lee's k-nearest-neighbor averaging algorithm [2]. Each has its own advantages and disadvantages.
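Policy iteration, one of the alternatives listed above, alternates between policy evaluation and greedy policy improvement until the policy stops changing. The sketch below is a minimal tabular version on a hypothetical 4-state chain. The states, costs, discount factor and function names are illustrative assumptions and do not come from the paper. ADP keeps the same evaluate-and-improve loop but replaces the table `V` with an approximator, such as an NN or a kNN averager.

```python
from typing import List, Tuple

def evaluate(policy: List[int], nxt: List[List[int]], cost: List[List[float]],
             gamma: float, tol: float = 1e-10) -> List[float]:
    """Policy evaluation by in-place sweeps of V(s) = c(s, a) + gamma * V(s')."""
    V = [0.0] * len(nxt)
    while True:
        delta = 0.0
        for s, a in enumerate(policy):
            v = cost[s][a] + gamma * V[nxt[s][a]]
            delta = max(delta, abs(v - V[s]))
            V[s] = v
        if delta < tol:
            return V

def policy_iteration(nxt: List[List[int]], cost: List[List[float]],
                     gamma: float) -> Tuple[List[int], List[float]]:
    """Policy iteration for deterministic transitions, minimizing discounted cost."""
    policy = [0] * len(nxt)
    while True:
        V = evaluate(policy, nxt, cost, gamma)
        # Greedy improvement: pick the action with the lowest one-step lookahead cost.
        improved = [min(range(len(nxt[s])),
                        key=lambda a: cost[s][a] + gamma * V[nxt[s][a]])
                    for s in range(len(nxt))]
        if improved == policy:
            return policy, V
        policy = improved

# Hypothetical chain: states 0..3, where state 3 is an absorbing goal.
# Action 0 steps left (cost 2); action 1 steps right (cost 1).
nxt = [[0, 1], [0, 2], [1, 3], [3, 3]]
cost = [[2.0, 1.0], [2.0, 1.0], [2.0, 1.0], [0.0, 0.0]]
policy, V = policy_iteration(nxt, cost, gamma=0.9)
print(policy)  # [1, 1, 1, 0]
```

The starting policy (always step left) never reaches the goal. One improvement step switches every non-goal state to "step right", and the second evaluation confirms that this policy is stable. At the goal both actions tie, so `min` keeps action 0.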

A closer examination of the current literature suggests that ADP has been successfully applied in many control areas [3-15]. In 2003, Enns and Si systematically applied ADP to a complex nonlinear multi-input multi-output (MIMO) system with continuous states and uncertainty. They adopted a cascaded NN scheme [4], which exploits the functional relationships among the multiple input variables. Lee and coworkers controlled a MIMO methyl methacrylate polymerization reactor with ADP and used the k-nearest-neighbor (kNN) averaging algorithm to improve control performance [2]. Padhi and coworkers solved control problems with multiple critic outputs, such as a real-life micro-electro-mechanical system, and avoided numerical problems in training by using a sub-network structure [16]. Other researchers have also applied ADP to nonlinear MIMO control systems; see, for example, the work of Liu [5], Murray [11] and Lin [17]. However, only a few researchers focus on data-driven ADP for nonlinear MIMO systems, and a truly data-driven online ADP method is still difficult to achieve. It is therefore necessary to study this problem.

The stability and convergence of ADP are also critical. In 2007, Al-Tamimi et al. developed several ADP methods for zero-sum games. They proved convergence by showing that these methods are equivalent to the iterative solution of an underlying algebraic Riccati equation, which is known to converge [18]. In 2014, Heydari et al. presented an ADP method for fixed-final-time tracking control of input-affine nonlinear systems. They proved convergence by showing that the fixed-point iteration is a contraction mapping [19]. In 2015, Song et al. proposed a nearly finite-horizon optimal ADP controller for a class of nonaffine nonlinear time-delay systems. They proved stability and convergence by showing that the performance index functions form a monotonically non-increasing sequence [20]. Also in 2015, Heydari et al. introduced an optimal ADP that can switch from the controlled subsystems to the free mode sequence, and the convergence is provided by proving that the iteration

This work was supported by the NSFC Projects of China under Grant No. 61403250, No. 51509151 and No. 51779136, the bureau project of China under grant No. 2015HT056, and the Science Commission of Shanghai under grant No. 13510501600.


MySQL™ and JSP™ Web Applications: Data-Driven Programming Using Tomcat and MySQL

Data-driven Design of Fault Diagnosis and Fault-tolerant Control Systems

Data-Driven Synthesis Of Cartoon Faces Using Different Styles

Data-Driven Crowd Design and Optimization for Product Innovation

DATA-DRIVEN LEARNING OF NON-AUTONOMOUS SYSTEMS

What methods do data-driven approaches include?

Knowledge-driven Egocentric Multimodal Activity Recognition

multi-driven net on pin

Data-driven modeling of collaborative supply-and-sales scenarios
