in response to the player's commands and other agents' actions. As a result, many game
companies impose a constant time limit on the amount of path planning per move¹
(e.g., one millisecond for all simultaneously moving agents).
While in practice this time limit can be satisfied by limiting problem size a pri-
ori, a scientifically more interesting approach is to impose a constant per-action
time limit independent of the problem size. Doing so severely limits the range
of applicable heuristic search algorithms. For instance, static search algorithms
such as A* [15], Iterative Deepening A* (IDA*) [24] and PRA* [38, 39], re-planning
algorithms such as D* [37], anytime algorithms such as ARA* [27], and anytime
re-planning algorithms such as AD* [26] cannot guarantee a constant bound on
planning time per action. This is because all of them produce a complete, possi-
bly abstract, solution before the first action can be taken. As the problem increases
in size, their planning time will inevitably increase, exceeding any a priori finite
upper bound.
Real-time search addresses the problem in a fundamentally different way. In-
stead of computing a complete, possibly abstract, solution before the first action
is taken, real-time search algorithms compute (or plan) only a few first actions
for the agent to take. This is usually done by conducting a lookahead search of a
fixed depth (also known as “search horizon,” “search depth,” or “lookahead depth”)
around the agent’s current state and using a heuristic (i.e., an estimate of the remain-
ing travel cost) to select the next few actions. The actions are then taken and the
planning–execution cycle repeats [25]. Since the goal state is not seen in most such
local searches, the agent runs the risk of heading into a dead end or, more gener-
ally, selecting suboptimal actions. To address this problem, most real-time heuristic
search algorithms update (or learn) their heuristic function over time.
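A minimal sketch of this planning–execution cycle with a depth-one lookahead is given below; the names successors, cost, h0, and goal are illustrative placeholders rather than the chapter's notation.

```python
def real_time_agent(start, goal, successors, cost, h0, max_steps=10_000):
    """Depth-one planning-execution cycle with heuristic learning (a sketch)."""
    h = {}                                    # learned heuristic values
    heuristic = lambda s: h.get(s, h0(s))     # fall back to the initial estimate

    state, trajectory = start, [start]
    for _ in range(max_steps):
        if state == goal:
            return trajectory

        # Planning: score each neighbour by edge cost plus its estimate of
        # the remaining travel cost (the heuristic).
        scored = [(cost(state, s2) + heuristic(s2), s2) for s2 in successors(state)]
        best_f, best_next = min(scored, key=lambda t: t[0])

        # Learning: raise the current state's heuristic to the best lookahead
        # value, so that revisiting it later looks progressively less attractive.
        h[state] = max(heuristic(state), best_f)

        # Execution: commit to the selected action and repeat the cycle.
        state = best_next
        trajectory.append(state)

    return trajectory                         # step budget exhausted
```

A deeper lookahead changes only the planning step, which then examines a larger neighbourhood of the current state; that neighbourhood is what the attributes below make precise.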
The learning process has precluded real-time heuristic search agents from being
widely deployed for pathfinding in video games. The problem is that such agents
tend to “scrub” (i.e., repeatedly revisit) the state space due to the need to fill in
heuristic depressions [19]. As a result, solution quality can be quite low and, visu-
ally, the scrubbing behavior is perceived as irrational.
Since the seminal work on Learning Real-Time A* (LRTA*) [25], researchers
have attempted to speed up the learning process. Most of the resulting algorithms
can be described by the following four attributes:
The local search space is the set of states whose heuristic costs are accessed in
the planning stage. The two common choices are full-width limited-depth looka-
head [14, 16, 17, 25, 31, 33–36], and A*-shaped lookahead [21, 23]. Additional
choices are decision-theoretic shaping [32] and dynamic lookahead depth
selection [7, 29]. Finally, searching in a smaller, abstracted state space has been
used as well [13].
The local learning space is the set of states whose heuristic values are updated.
Common choices are: the current state only [7, 14, 25, 33–35], all states within the
local search space [21, 23], and previously visited states and their neighbors [16, 17,
31, 36].
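As a hedged illustration of the first two attributes, the sketch below pairs a full-width lookahead of fixed depth (the local search space) with a learning step that updates every state inside that space by backing estimates up from the lookahead frontier. It assumes an undirected search graph, and successors, cost, and h are again placeholder names, not any specific algorithm from the chapter.

```python
import heapq
from itertools import count

def local_search_space(start, successors, depth):
    """Full-width lookahead: interior states are reachable in fewer than
    `depth` moves; frontier states sit exactly at the depth limit."""
    interior, layer = set(), {start}
    for _ in range(depth):
        interior |= layer
        layer = {s2 for s in layer for s2 in successors(s)} - interior
    return interior, layer                     # (interior of the space, its frontier)

def learn_heuristics(interior, frontier, successors, cost, h):
    """Dijkstra-style backup of frontier estimates into the interior, so that
    afterwards h[s] = min over neighbours s2 of cost(s, s2) + h[s2].
    Assumes undirected edges, so successors double as predecessors."""
    tie = count()                              # tie-breaker for heap entries
    for s in interior:
        h[s] = float("inf")                    # interior values are re-derived
    heap = [(h[s], next(tie), s) for s in frontier]  # h must cover the frontier
    heapq.heapify(heap)
    settled = set()
    while heap:
        hs, _, s = heapq.heappop(heap)
        if s in settled:
            continue
        settled.add(s)
        for p in successors(s):                # neighbours act as predecessors
            if p in interior and cost(p, s) + hs < h[p]:
                h[p] = cost(p, s) + hs
                heapq.heappush(heap, (h[p], next(tie), p))
    return h
```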
¹ Henceforth, we will use the terms action and move synonymously.