Mean Field Team Decision Problems for Markov Jump
Multiagent Systems
Bing-Chang Wang
School of Control Science and Engineering, Shandong University, Jinan, 250061, P. R. China
E-mail: bcwang@sdu.edu.cn
This work is supported by National Natural Science Foundation of China under Grant 61403233, and the Fundamental Research Funds of Shandong University under Grant 2014TB007.
Abstract: This paper studies the mean field team decision problem for multiagent systems with Markov jump parameters and
coupled indices. By analyzing the centralized strategy of the team problem, we get a parameterized equation. Then by solving
an optimal control problem in the augmented state space, we obtain the consistency equation, from which a set of distributed
strategies is designed. By constructing a Lyapunov function, we show that the closed-loop system is uniformly stable, and the
set of distributed strategies is team-optimal.
Key Words: Team decision problem, mean field control, distributed strategy, Markov jump system
1 Introduction
In recent years, the study of mean field games and con-
trol has become a hot topic in the community of systems and
control [1]. Mean field models have wide application back-
grounds in many areas including economics, finance, com-
munication engineering, biology and medicine [2–4]. Such
models have been investigated by researchers in diverse ar-
eas from a variety of perspectives [5–14]. In mean field
models, each agent is affected by the average interaction of
all the other agents, while the individual influence of each
agent is negligible. From the relationship between the macroscopic behavior of the population and the behavior of individual agents, one can show that the population aggregate effect satisfies a fixed-point equation. Then, by solving the fixed-point equation together with a single-agent optimal control problem, decentralized asymptotic Nash equilibria are obtained [6, 7, 13].
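In the linear-quadratic setting of [6, 7, 13], this fixed-point requirement can be sketched, in a generic form under standard assumptions, as a consistency condition: each agent applies its best response to a frozen aggregate $\bar{x}(\cdot)$, and the aggregate must reproduce itself,
$$ \bar{x}(t) = \lim_{N \to \infty} \frac{1}{N} \sum_{i=1}^{N} x_i(t), $$
where $x_i(\cdot)$ is the closed-loop state generated by agent $i$'s best response to $\bar{x}(\cdot)$.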
In practical financial markets, ecological systems, and social systems, the surrounding environment is constantly changing. For instance, the rates at which prices change in a financial market may differ greatly across different time periods.
A powerful tool for depicting abrupt environmental changes is the Markov jump model [15, 16]. Wang and Zhang [13, 17, 18] investigated mean field game and control problems for Markov jump multiagent systems, and gave distributed asymptotic Nash equilibrium strategies.
Team decision problems have a long history and wide applications [19–21]. In team decision problems, all the
agents have a common objective function, which is regarded
as the social index. Agents have different measurements or
information structures [21]. The team-optimal strategy is
globally optimal, hence it is not only person-by-person op-
timal, but also Pareto optimal. Under some convexity con-
ditions, the person-by-person optimal strategy is also team-
optimal [22]. Huang et al. [23] investigated social optima
for mean field LQG control models, and gave centralized
and decentralized team-optimal solutions.
This paper considers the team decision problem of mean
field models with Markov jump parameters and coupled in-
dices. Different from previous work [13, 23], the dynamics of all the agents are driven by the same continuous-time
Markov chain. Due to the impact of random parameters, the
population aggregate effect is a stochastic process depending on the Markov jump parameters, instead of a deterministic function. Thus, the population aggregate effect can no longer be obtained by solving a fixed-point equation as in previous work [23]. We achieve the control synthesis by a parametric approach and state space augmentation. By
analyzing the centralized strategy of the team problem, we
get a parameterized equation. Then, by solving an optimal control problem in the augmented state space, we obtain the consistency equations, from which a set of distributed strategies is designed. By constructing a Lyapunov function and using probability limit theory, we show that the closed-loop system is uniformly stable, and the set of distributed strategies is team-optimal.
The following notation will be used in the paper. $\|\cdot\|$ denotes the Euclidean vector norm or the matrix norm it induces; $I_n$ denotes the $n$-dimensional identity matrix. For any vector $x$ of proper dimension and symmetric matrix $Q \ge 0$, $\|x\|_Q = (x^T Q x)^{1/2}$. $C_b([0, \infty), \mathbb{R}^n)$ denotes the class of $n$-dimensional bounded continuous functions on $[0, \infty)$.
2 Problem Formulation
Consider the multiagent system evolving according to the following dynamics:
$$ dx_i(t) = A_{\theta(t)} x_i(t)\,dt + B_{\theta(t)} u_i(t)\,dt + h(t)\,dt + D_{\theta(t)}\,dW_i(t), \quad 1 \le i \le N, \qquad (1) $$
where $x_i \in \mathbb{R}^n$ and $u_i \in \mathbb{R}^r$ are the state and input of the $i$th agent, and $\{W_i(t), 1 \le i \le N\}$ is a family of independent $d$-dimensional Brownian motions. $h \in C_b([0, \infty), \mathbb{R}^n)$ is an external signal, reflecting the impact of the environment on the $i$th agent. $\{\theta(t)\}$ is a continuous-time Markov chain taking values in $S = \{1, 2, \ldots, m\}$ with transition rate matrix (infinitesimal generator) $\Lambda = \{\lambda_{ij},\ i, j = 1, \ldots, m\}$.
The index of the $i$th agent is
$$ J_i(u) = \limsup_{T \to \infty} \frac{1}{T}\, E \int_0^T \Big\{ \big\| x_i(t) - \Phi[x^{(N)}(t)] \big\|^2_{Q_{\theta(t)}} + \| u_i(t) \|^2_{R_{\theta(t)}} \Big\}\, dt, \qquad (2) $$
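In closely related mean field LQG formulations [13, 23], $x^{(N)}$ typically denotes the population state average and $\Phi$ an affine map, i.e.,
$$ x^{(N)}(t) = \frac{1}{N} \sum_{j=1}^{N} x_j(t), \qquad \Phi(x) = \Gamma x + \eta, $$
with a given matrix $\Gamma \in \mathbb{R}^{n \times n}$ and vector $\eta \in \mathbb{R}^n$; this reading of the coupling term in (2) is an assumption following that standard usage.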