2 Related Work
Our work belongs to system-level AI development for strategy video game playing, so we mainly
discuss representative works along this line, covering RTS and MOBA games.
General RTS games
StarCraft has been used as the testbed for Game AI research in RTS for many years. Methods adopted by existing studies include rule-based, supervised learning, reinforcement learning, and their combinations [23, 34]. For rule-based methods, a representative is SAIDA, the champion of the StarCraft AI Competition 2018 (see https://github.com/TeamSAIDA/SAIDA).
For learning-based methods, AlphaStar recently combined supervised learning and multi-agent reinforcement learning and achieved grandmaster level in playing StarCraft 2 [33]. Our value estimation (Section 3.2) is similar to AlphaStar's in that it uses the invisible opponent's information.
MOBA games
Recently, a macro strategy model, named Tencent HMS, was proposed for MOBA Game AI [36]. Specifically, HMS is a functional component for guiding where to go on the map during the game, without considering the action execution of agents, i.e., micro control or micro-management in esports, and is thus not a complete AI solution. The most relevant works are Tencent Solo [37] and OpenAI Five [2]. Ye et al. [37] performed a thorough and systematic study of the playing mechanics of different MOBA heroes and developed an RL system that masters micro control of agents in MOBA combats. However, only 1v1 games were studied, leaving out the much more sophisticated multi-agent 5v5 games. On the other hand, the similarities between this work and Ye et al. [37] include the modeling of action heads (the value heads are different) and off-policy correction (adaptation). In 2019, OpenAI introduced an AI for playing 5v5 games in Dota 2, called OpenAI Five,
with the ability to defeat professional human players [2]. OpenAI Five is based on deep reinforcement learning via self-play and is trained with Proximal Policy Optimization (PPO) [28]. The major difference between our work and OpenAI Five is that the goal of this paper is to develop AI programs towards playing full MOBA games. Hence, methodologically, we introduce a set of techniques, namely off-policy adaptation, curriculum self-play learning, value estimation, and tree-search, that address the scalability issue in training and playing with a large pool of heroes. On the other hand, the similarities between this work and OpenAI Five include the design of the action space for modeling a MOBA hero's actions, the use of a recurrent neural network (LSTM) for handling partial observability, and the use of one model with shared weights to control all heroes.
3 Learning System
To address the complexity of MOBA game-playing, we use a combination of novel and existing
learning techniques for neural network architecture, distributed system, reinforcement learning,
multi-agent training, curriculum learning, and Monte-Carlo tree search. Although we use Honor of
Kings as a case study, these proposed techniques are also applicable to other MOBA games, as the
playing mechanics across MOBA games are similar.
3.1 Architecture
MOBA can be considered as a multi-agent Markov game with partial observations. Central to our AI is a policy $\pi_\theta(a_t \mid s_t)$ represented by a deep neural network with parameters $\theta$. It receives previous observations and actions $s_t = o_{1:t}, a_{1:t-1}$ from the game as inputs, and selects actions $a_t$ as outputs. Internally, observations $o_t$ are encoded via convolutions and fully-connected layers, then combined as vector representations, processed by a deep sequential network, and finally mapped to a probability distribution over actions. The overall architecture is shown in Fig. 1.
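To make this data flow concrete, the following is a minimal PyTorch-style sketch of such an observation-to-action pipeline. It is our own illustration rather than the actual network: the module name, feature shapes, and layer sizes (e.g., PolicySketch, spatial_channels, hidden_dim, num_actions) are assumptions chosen only to show the flow of Fig. 1.

```python
import torch
import torch.nn as nn

class PolicySketch(nn.Module):
    """Illustrative observation-to-action pipeline: conv + FC encoders -> LSTM -> action distribution."""

    def __init__(self, spatial_channels=8, scalar_dim=64, hidden_dim=256, num_actions=12):
        super().__init__()
        # Spatial features (channels from the hero's local-view map) -> convolutions.
        self.conv = nn.Sequential(
            nn.Conv2d(spatial_channels, 16, kernel_size=3, stride=2), nn.ReLU(),
            nn.Conv2d(16, 32, kernel_size=3, stride=2), nn.ReLU(),
            nn.AdaptiveAvgPool2d((3, 3)), nn.Flatten(),        # -> [B, 32 * 3 * 3]
        )
        # Scalar features (unit attributes, in-game statistics) -> fully-connected layers.
        self.fc_scalar = nn.Sequential(nn.Linear(scalar_dim, 128), nn.ReLU())
        # Combined vector representation -> deep sequential network (LSTM).
        self.lstm = nn.LSTM(input_size=32 * 3 * 3 + 128, hidden_size=hidden_dim, batch_first=True)
        # Final mapping to a probability distribution over actions.
        self.action_head = nn.Linear(hidden_dim, num_actions)

    def forward(self, spatial, scalar, hidden=None):
        # spatial: [B, C, H, W]; scalar: [B, scalar_dim]; one game step per call.
        x = torch.cat([self.conv(spatial), self.fc_scalar(scalar)], dim=-1)
        out, hidden = self.lstm(x.unsqueeze(1), hidden)  # carry the LSTM state across steps
        logits = self.action_head(out.squeeze(1))
        return torch.softmax(logits, dim=-1), hidden
```

The full architecture in Fig. 1 contains more encoders and multiple action heads; the sketch only reflects the overall flow described above.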
The architecture consists of general-purpose network components that model the raw complexity of MOBA games. To provide informative observations to agents, we develop multi-modal features, consisting of a comprehensive list of both scalar and spatial features. Scalar features are made up of observable units' attributes, in-game statistics, and invisible opponent information, e.g., health points (hp), skill cooldowns, gold, level, etc. Spatial features consist of convolution channels extracted from the hero's local-view map. To handle partial observability, we resort to an LSTM [14] to maintain memories between steps. To help target selection, we use target attention [37, 2], which treats the encodings after the LSTM as the query, and the stack of game unit encodings as attention keys.
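As a rough illustration of this attention step (our own sketch; the tensor shapes and the plain dot-product scoring are assumptions, not the exact formulation used in the system), the post-LSTM encoding serves as the query against the stacked per-unit key encodings, and a softmax over the scores gives a distribution over candidate target units:

```python
import torch

def target_attention(lstm_out, unit_encodings):
    """lstm_out: [B, D] query; unit_encodings: [B, N, D] keys for N candidate game units."""
    # Dot-product score between the query and every unit key: [B, N].
    scores = torch.einsum('bd,bnd->bn', lstm_out, unit_encodings)
    # Probability of selecting each game unit as the skill/attack target.
    return torch.softmax(scores, dim=-1)
```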
To eliminate unnecessary RL explorations, we design an action mask, similar to [37].
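A minimal sketch of such masking follows (our own illustration; which actions count as invalid in a given state is game-specific and assumed here): logits of illegal actions are set to negative infinity before the softmax, so they receive zero probability and are never sampled during exploration.

```python
import torch

def masked_action_probs(logits, valid_mask):
    """logits: [B, A] raw action logits; valid_mask: [B, A] boolean, True where the action is legal."""
    # Illegal actions (e.g., a skill still on cooldown) are pushed to -inf before the softmax.
    logits = logits.masked_fill(~valid_mask, float('-inf'))
    return torch.softmax(logits, dim=-1)
```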
To manage the combinatorial