on Windows and Mac OS, but we also provide a limited headless build that runs on Linux, intended especially
for machine learning and distributed use cases.
Using this API we built PySC2 (https://github.com/deepmind/pysc2), an open source environment that is optimised for RL agents. PySC2
is a Python environment that wraps the StarCraft II API to ease the interaction between Python reinforcement
learning agents and StarCraft II. PySC2 defines an action and observation specification,
and includes a random agent and a handful of scripted agents as examples. It also includes some
mini-games as challenges and visualisation tools to understand what the agent can see and do.
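To make this concrete, the sketch below shows roughly how an environment and the bundled random agent can be wired together with PySC2's run loop. It is a minimal sketch only: the map name, player specification, and interface-format arguments follow recent PySC2 releases and should be treated as assumptions, since the exact constructor arguments differ between versions.

    # Minimal sketch: running the bundled random agent against a built-in bot.
    # Constructor arguments follow recent PySC2 releases and may differ
    # between versions; treat the exact names as assumptions.
    from absl import app

    from pysc2.agents import random_agent
    from pysc2.env import run_loop, sc2_env
    from pysc2.lib import features


    def main(unused_argv):
        agent = random_agent.RandomAgent()
        with sc2_env.SC2Env(
            map_name="AbyssalReef",  # any ladder or mini-game map
            players=[sc2_env.Agent(sc2_env.Race.terran),
                     sc2_env.Bot(sc2_env.Race.zerg, sc2_env.Difficulty.easy)],
            agent_interface_format=features.AgentInterfaceFormat(
                feature_dimensions=features.Dimensions(screen=84, minimap=64)),
            step_mul=8,        # the agent acts once every 8 game steps
            visualize=True,    # open the bundled visualisation tools
        ) as env:
            run_loop.run_loop([agent], env, max_episodes=1)


    if __name__ == "__main__":
        app.run(main)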
StarCraft II updates the simulation 16 (at “normal speed”) or 22.4 (at “fast speed”) times per second.
The game is mostly deterministic, but it does have some randomness mainly for cosmetic reasons;
the two main random elements are weapon speed and update order. These sources of randomness
can be removed/mitigated by setting a random seed.
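To give a sense of how these rates translate into agent behaviour, the short snippet below computes the agent's decision frequency for a given step multiplier and indicates where a fixed seed would be supplied; the step_mul and random_seed keyword argument names are taken from recent PySC2 releases and are assumptions here.

    # At "normal speed" the game simulates 16 steps per second, so an agent
    # that acts every `step_mul` game steps acts 16 / step_mul times per
    # second of game time (22.4 / step_mul at "fast speed").
    GAME_STEPS_PER_SECOND = 16.0   # 22.4 at "fast speed"
    STEP_MUL = 8                   # game steps per agent action

    actions_per_second = GAME_STEPS_PER_SECOND / STEP_MUL
    print(f"agent acts {actions_per_second:.1f} times per game second")  # 2.0

    # A fixed seed removes/mitigates the cosmetic randomness described above
    # (keyword names assumed from recent PySC2 releases):
    # env = sc2_env.SC2Env(..., step_mul=STEP_MUL, random_seed=42)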
We now describe the environment which was used for all of the experiments in this paper.
3.1 Full Game Description and Reward Structure
In the full 1v1 game of StarCraft II, two opponents spawn on a map which contains resources and
other elements such as ramps, bottlenecks, and islands. To win a game, a player must: 1. accumulate
resources (minerals and vespene gas), 2. construct production buildings, 3. amass an army, and 4.
eliminate all of the opponent’s buildings. A game typically lasts from a few minutes to one hour,
and early actions taken in the game (e.g., which buildings and units are built) have long-term
consequences. Players have imperfect information since they can only see the portion of the map where
they have units. If they want to understand and react to their opponents’ strategy they must send units
to scout. As we describe later in this section, the action space is also unusual and challenging.
Most people play online against other human players. The most common games are 1v1, but team
games are possible too (2v2, 3v3 or 4v4), as are more complicated games with unbalanced teams
or more than two teams. Here we focus on the 1v1 format, the most popular form of competitive
StarCraft, but may consider more complicated situations in the future.
StarCraft II includes a built-in AI which is based on a set of handcrafted rules and comes with
10 levels of difficulty (the three strongest of which cheat by getting extra resources or privileged
vision). Unfortunately, because these built-in bots are scripted, their strategies are fairly narrow and
easily exploitable, so human players tend to lose interest in them fairly quickly.
Nevertheless, they are a reasonable first challenge for a purely learned approach like the baselines
we investigate in sections 4 and 5; they play far better than random, play very quickly with little
compute, and offer consistent baselines to compare against.
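For completeness, the built-in bots are selected through the environment's player specification, as sketched below; the Race and Difficulty enum names follow recent PySC2 releases and are assumptions here, with the three cheating levels appearing as the cheat_* entries.

    # Sketch: pitting a learning agent against a built-in bot.
    # Enum names are taken from recent PySC2 releases (an assumption); the
    # ten difficulty levels range from Difficulty.very_easy up to the three
    # cheating levels (cheat_vision, cheat_money, cheat_insane).
    from pysc2.env import sc2_env

    players = [
        sc2_env.Agent(sc2_env.Race.protoss),
        sc2_env.Bot(sc2_env.Race.terran, sc2_env.Difficulty.very_hard),
    ]
    # env = sc2_env.SC2Env(map_name="AbyssalReef", players=players, ...)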
We define two different reward structures: ternary 1 (win) / 0 (tie) / −1 (loss) received at the end
of a game (with all-zero rewards during the game), and Blizzard score. The ternary win/tie/loss
score is the real reward that we care about. The Blizzard score is the score seen by players on the
victory screen at the end of the game. While players can only see this score at the end of the game, we
provide access to the running Blizzard score at every step during the game so that the change in score
can be used as a reward for reinforcement learning. It is computed as the sum of current resources
and upgrades researched, as well as units and buildings currently alive and being built. This means
that the player's cumulative reward increases as more resources are mined and decreases when units
or buildings are lost, while all other actions (training units, constructing buildings, and researching
upgrades) leave it unchanged, since the resources they consume are offset by the value of what they
produce. The Blizzard score is not zero-sum since it is player-centric; it is far less sparse than the
ternary reward signal, and it correlates to some extent with winning or losing.
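One simple way to use the running Blizzard score for reinforcement learning is to difference it between successive steps, as in the sketch below. The score_cumulative observation field (whose first entry is the overall score) and the score_index environment argument are taken from recent PySC2 releases and should be read as assumptions.

    # Sketch: turning the running Blizzard score into a dense per-step reward.
    # Assumes the `score_cumulative` observation field from recent PySC2
    # releases, whose first entry holds the overall Blizzard score.
    def score_delta_reward(previous_score, timestep):
        """Return (reward, new_score), where reward is the change in score."""
        current_score = timestep.observation["score_cumulative"][0]
        return current_score - previous_score, current_score

    # Inside the interaction loop:
    #   reward, last_score = score_delta_reward(last_score, timestep)
    # Alternatively, recent SC2Env versions expose `score_index` and
    # `score_multiplier` arguments that make the environment emit this shaped
    # reward directly; score_index=-1 selects the ternary win/tie/loss reward.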
3.2 Observations
StarCraft II uses a game engine which renders graphics in 3D. Although the StarCraft II API drives this
full underlying engine, which simulates the whole environment, it does not currently render RGB
pixels. Instead, it generates a set of "feature layers", which abstract away from the RGB images seen
during human play while maintaining the core spatial and graphical concepts of StarCraft II (see
Figure 2).
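The sketch below indicates roughly how these feature layers surface through PySC2; the feature_screen/feature_minimap field names and the features.SCREEN_FEATURES listing follow recent PySC2 releases and are assumptions relative to the exact version described here.

    # Sketch: inspecting the feature layers exposed by PySC2.
    # Field names follow recent releases; older versions exposed the stacked
    # layers under the keys "screen" and "minimap" instead.
    from pysc2.lib import features

    # Each entry describes one layer (e.g. unit_type, player_relative,
    # height_map) and whether it is categorical or scalar.
    for layer in features.SCREEN_FEATURES:
        print(layer.name, layer.type)

    # Inside the interaction loop, the stacked layers arrive as integer arrays:
    #   screen = timestep.observation["feature_screen"]    # [num_layers, H, W]
    #   minimap = timestep.observation["feature_minimap"]  # [num_layers, H, W]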