
Published as a conference paper at ICLR 2017
TRAINING AGENT FOR FIRST-PERSON SHOOTER
GAME WITH ACTOR-CRITIC CURRICULUM LEARNING
Yuxin Wu
Carnegie Mellon University
ppwwyyxx@gmail.com
Yuandong Tian
Facebook AI Research
yuandong@fb.com
ABSTRACT
In this paper, we propose a new framework for training vision-based agents for
First-Person Shooter (FPS) games, in particular Doom. Our framework combines
a state-of-the-art reinforcement learning approach (the Asynchronous Advantage
Actor-Critic (A3C) model [Mnih et al. (2016)]) with curriculum learning. Our
model is simple in design and uses only game states from the AI side, rather than
opponents' information [Lample & Chaplot (2016)]. On a known map, our
agent won 10 out of the 11 games it attended and became the champion of Track 1 in the ViZ-
Doom AI Competition 2016 by a large margin, with a score 35% higher than the second
place.
1 INTRODUCTION
Deep Reinforcement Learning has achieved super-human performance in fully observable environ-
ments, e.g., in Atari games [Mnih et al. (2015)] and Computer Go [Silver et al. (2016)]. Recently,
the Asynchronous Advantage Actor-Critic (A3C) model [Mnih et al. (2016)] has shown good performance
in 3D environment exploration, e.g., labyrinth exploration. However, training an agent
in a partially observable 3D environment from raw frames remains, in general, an open challenge. Direct appli-
cation of A3C to competitive 3D scenarios, e.g., 3D games, is nontrivial, partly due to the sparse and
long-term rewards in such scenarios.
Doom is a 1993 First-Person Shooter (FPS) game in which a player fights against other computer-
controlled agents or human players in an adversarial 3D environment. Previous works on FPS
AI [van Waveren (2001)] focused on using hand-tuned state machines and privileged information,
e.g., the geometry of the map and the precise locations of all players, to design playable agents. Although
a state machine is conceptually simple and computationally efficient, it does not operate like human
players, who rely only on visual (and possibly audio) inputs. Also, many complicated situations
require manually designed rules, which can be time-consuming to tune.
In this paper, we train an AI agent in Doom with a framework based on A3C with convolutional
neural networks (CNN). The model uses only the most recent 4 frames and game variables from the AI
side to predict the next action of the agent and the value of the current situation. We follow the
curriculum learning paradigm [Bengio et al. (2009); Jiang et al. (2015)]: start from simple tasks and
then gradually try harder ones. The difficulty of a task is controlled by a variety of parameters
in the Doom environment, including the type of map, the strength of the opponents, and the design
of the reward function. We also develop adaptive curriculum training that samples tasks from a varying
distribution to train the model, which is more stable and achieves higher scores than plain A3C
trained for the same number of epochs. As a result, our trained agent, named F1, won
Track 1 of the ViZDoom AI Competition¹ by a large margin.
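The adaptive curriculum can be pictured as maintaining a probability distribution over task difficulties that gradually shifts mass toward harder tasks as the agent improves. The Python sketch below illustrates this idea; the class name, threshold, and update rule are illustrative assumptions of ours, not the exact schedule used to train F1.

import numpy as np

# A minimal sketch of adaptive curriculum sampling (names and the update rule
# are illustrative assumptions, not the authors' exact implementation).
# Each "level" stands for a Doom configuration of increasing difficulty,
# e.g., a harder map, stronger opponents, or a sparser reward.
class AdaptiveCurriculum:
    def __init__(self, num_levels, shift_threshold=0.6, smooth=0.05):
        self.probs = np.zeros(num_levels)
        self.probs[0] = 1.0                     # start with all mass on the easiest task
        self.shift_threshold = shift_threshold  # performance needed to move on
        self.smooth = smooth                    # fraction of mass moved per update

    def sample_level(self):
        """Sample a task difficulty for the next training episode."""
        return np.random.choice(len(self.probs), p=self.probs)

    def update(self, level, win_rate):
        """Shift probability mass toward the next level once the agent
        performs well enough on the current one."""
        if win_rate > self.shift_threshold and level + 1 < len(self.probs):
            moved = self.smooth * self.probs[level]
            self.probs[level] -= moved
            self.probs[level + 1] += moved
            self.probs /= self.probs.sum()      # keep a valid distribution

# Usage inside an (omitted) A3C training loop:
# curriculum = AdaptiveCurriculum(num_levels=8)
# level = curriculum.sample_level()
# win_rate = run_episode_and_report_win_rate(level)   # hypothetical helper
# curriculum.update(level, win_rate)

Sampling from a distribution, rather than switching levels abruptly, keeps some probability mass on easier tasks, which is one plausible reason such training is more stable than jumping directly to the hardest setting.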
There have been many contemporary efforts on training Doom AIs based on the ViZDoom plat-
form [Kempka et al. (2016)] since its release. Arnold [Lample & Chaplot (2016)] also uses game
frames, training an action network with Deep Recurrent Q-learning [Hausknecht & Stone (2015)]
and a navigation network with DQN [Mnih et al. (2015)]. However, there are several important dif-
ferences. First, to predict the next action, they use a hybrid architecture (CNN+LSTM) that involves
a more complicated training procedure. Second, in addition to game frames, they require internal
¹ http://vizdoom.cs.put.edu.pl/competition-cig-2016/results