Deep Reinforcement Learning Winner of the VizDoom Competition: Facebook's Actor-Critic Curriculum Learning Framework


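"Facebook took first place in the VizDoom competition with a reinforcement learning approach that combines reward shaping and curriculum learning." This paper, published at ICLR 2017, studies how to train a vision-based agent to make decisions in the first-person shooter (FPS) game Doom. It proposes a framework that combines a state-of-the-art reinforcement learning method, the Asynchronous Advantage Actor-Critic (A3C) model, with curriculum learning. The model is simple in design and uses only game states available on the AI side, not information about opponents.

Deep reinforcement learning has already reached super-human performance in fully observable environments such as Atari games and Go. A3C is an important advance in reinforcement learning: many workers update a shared model asynchronously, which improves training efficiency and performance. In an environment like Doom, however, training is hard because rewards are sparse and the environment is complex. The paper addresses this with reward shaping and curriculum learning.

Reward shaping modifies or hand-designs the reward function so that the agent learns a policy faster. In Doom, positive feedback is rare (for example, defeating an enemy or completing a task), so shaped rewards let the agent see the effect of its actions earlier.

Curriculum learning borrows an idea from education: learn simple things first and move on to harder ones step by step. In Doom, this means training the agent on simple tasks or maps first and then gradually raising the difficulty so that it can acquire more complex strategies. This makes learning in a complex environment more tractable and can reduce the risk of the agent getting stuck in a poor local optimum.

The agent described in the paper won 10 of the 11 games it played on a known map and won Track 1 of the 2016 VizDoom AI Competition by a large margin, with a score 35% higher than the second-place entry. This result supports the effectiveness of combining A3C, reward shaping, and curriculum learning in a complex environment with sparse rewards, and it shows how existing techniques can be combined to improve learning efficiency and performance. That is relevant to future work on agents that learn and adapt autonomously in more realistic environments.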
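The reward shaping idea in the summary above can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the `GameVars` fields, the function name `shaped_reward`, and the weights are hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass


@dataclass
class GameVars:
    """Hypothetical per-step snapshot of game variables visible to the agent."""
    kills: int
    health: float
    ammo: int


def shaped_reward(prev: GameVars, cur: GameVars,
                  kill_bonus: float = 1.0,
                  health_weight: float = 0.01,
                  ammo_weight: float = 0.005) -> float:
    """Return a dense reward built from changes in game variables.

    The sparse signal (a kill) is kept, and small terms for health and
    ammo changes are added so that most steps carry some feedback.
    All weights are illustrative assumptions.
    """
    reward = kill_bonus * (cur.kills - prev.kills)
    reward += health_weight * (cur.health - prev.health)
    reward += ammo_weight * (cur.ammo - prev.ammo)
    return reward


if __name__ == "__main__":
    before = GameVars(kills=0, health=100.0, ammo=50)
    after = GameVars(kills=1, health=90.0, ammo=48)
    print(shaped_reward(before, after))
```

With these assumed weights, a kill dominates the reward, while losing health or spending ammo subtracts a small amount, so the agent receives a signal even on steps where no enemy is defeated.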

Published as a conference paper at ICLR 2017

TRAINING AGENT FOR FIRST-PERSON SHOOTER GAME WITH ACTOR-CRITIC CURRICULUM LEARNING

Yuxin Wu

Carnegie Mellon University

ppwwyyxx@gmail.com

Yuandong Tian

Facebook AI Research

yuandong@fb.com

ABSTRACT

In this paper, we propose a new framework for training vision-based agents for First-Person Shooter (FPS) games, in particular Doom. Our framework combines the state-of-the-art reinforcement learning approach (the Asynchronous Advantage Actor-Critic (A3C) model [Mnih et al. (2016)]) with curriculum learning. Our model is simple in design and only uses game states from the AI side, rather than using opponents' information [Lample & Chaplot (2016)]. On a known map, our agent won 10 out of the 11 games it attended and became the champion of Track1 in the ViZDoom AI Competition 2016 by a large margin, with a 35% higher score than the second place.

1 INTRODUCTION

Deep Reinforcement Learning has achieved super-human performance in fully observable environments, e.g., in Atari Games [Mnih et al. (2015)] and Computer Go [Silver et al. (2016)]. Recently, the Asynchronous Advantage Actor-Critic (A3C) [Mnih et al. (2016)] model has shown good performance for 3D environment exploration, e.g., labyrinth exploration. However, in general, training an agent in a partially observable 3D environment from raw frames remains an open challenge. Direct application of A3C to competitive 3D scenarios, e.g., 3D games, is nontrivial, partly due to sparse and long-term rewards in such scenarios.

Doom is a 1993 First-Person Shooter (FPS) game in which a player fights against other computer-controlled agents or human players in an adversarial 3D environment. Previous works on FPS AI [van Waveren (2001)] focused on using hand-tuned state machines and privileged information, e.g., the geometry of the map and the precise location of all players, to design playable agents. Although a state machine is conceptually simple and computationally efficient, it does not operate like human players, who rely only on visual (and possibly audio) inputs. Also, many complicated situations require manually designed rules, which can be time-consuming to tune.

In this paper, we train an AI agent in Doom with a framework based on A3C with convolutional neural networks (CNN). This model uses only the 4 most recent frames and game variables from the AI side to predict the next action of the agent and the value of the current situation. We follow the curriculum learning paradigm [Bengio et al. (2009); Jiang et al. (2015)]: start from simple tasks and then gradually try harder ones. The difficulty of the task is controlled by a variety of parameters in the Doom environment, including different types of maps, the strength of the opponents, and the design of the reward function. We also develop adaptive curriculum training that samples from a varying distribution of tasks to train the model, which is more stable and achieves a higher score than A3C with the same number of epochs. As a result, our trained agent, named F1, won Track 1 of the ViZDoom Competition by a large margin.
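To make the idea of sampling tasks from a varying distribution concrete, here is a minimal sketch. It is not the authors' procedure: the class name `AdaptiveCurriculum`, the geometric weighting around a "center" difficulty level, the success-rate threshold, and the window size are all assumptions chosen for illustration.

```python
import random


class AdaptiveCurriculum:
    """Hypothetical sampler over difficulty levels 0 .. num_levels - 1.

    Tasks are drawn from a distribution peaked at a "center" level.
    The center moves up by one when the recent success rate reaches
    a threshold, so the distribution shifts toward harder tasks.
    """

    def __init__(self, num_levels, promote_threshold=0.7, window=20, seed=None):
        self.num_levels = num_levels
        self.promote_threshold = promote_threshold
        self.window = window
        self.center = 0            # current focus of the task distribution
        self.results = []          # outcomes since the last evaluation
        self.rng = random.Random(seed)

    def task_weights(self):
        # Weight halves with each step of distance from the center level
        # (assumed shape; any peaked distribution would serve the sketch).
        return [0.5 ** abs(level - self.center)
                for level in range(self.num_levels)]

    def sample_level(self):
        levels = range(self.num_levels)
        return self.rng.choices(levels, weights=self.task_weights(), k=1)[0]

    def report(self, success):
        # Record one episode outcome; evaluate once per full window.
        self.results.append(bool(success))
        if len(self.results) >= self.window:
            rate = sum(self.results) / len(self.results)
            if rate >= self.promote_threshold and self.center < self.num_levels - 1:
                self.center += 1
            self.results.clear()


if __name__ == "__main__":
    curriculum = AdaptiveCurriculum(num_levels=5, window=10, seed=0)
    for _ in range(30):
        level = curriculum.sample_level()
        curriculum.report(success=True)   # stand-in for a real episode result
    print(curriculum.center, curriculum.task_weights())
```

Because easier and harder levels keep a nonzero weight, the agent in this sketch still sees a mix of tasks at every stage instead of switching abruptly from one difficulty to the next.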

There have been many contemporary efforts on training a Doom AI based on the VizDoom platform [Kempka et al. (2016)] since its release. Arnold [Lample & Chaplot (2016)] also uses game frames and trains an action network using Deep Recurrent Q-learning [Hausknecht & Stone (2015)], and a navigation network with DQN [Mnih et al. (2015)]. However, there are several important differences. First, to predict the next action, they use a hybrid architecture (CNN+LSTM) that involves a more complicated training procedure. Second, in addition to game frames, they require internal

ViZDoom Competition results: http://vizdoom.cs.put.edu.pl/competition-cig-2016/results
