in response to the player's commands and other agents' actions. As a result, many game
companies impose a constant time limit on the amount of path planning per move¹
(e.g., one millisecond for all simultaneously moving agents).
While in practice this time limit can be satisfied by limiting problem size a pri-
ori, a scientifically more interesting approach is to impose a constant per-action
time limit independent of the problem size. Doing so severely limits the range
of applicable heuristic search algorithms. For instance, static search algorithms
such as A* [15], Iterative Deepening A* (IDA*) [24] and PRA* [38, 39], re-planning
algorithms such as D* [37], anytime algorithms such as ARA* [27], and anytime
re-planning algorithms such as AD* [26] cannot guarantee a constant bound on
planning time per action. This is because all of them produce a complete, possi-
bly abstract, solution before the first action can be taken. As the problem increases
in size, their planning time will inevitably increase, exceeding any a priori finite
upper bound.
Real-time search addresses the problem in a fundamentally different way. In-
stead of computing a complete, possibly abstract, solution before the first action
is taken, real-time search algorithms compute (or plan) only a few first actions
for the agent to take. This is usually done by conducting a lookahead search of a
fixed depth (also known as “search horizon,” “search depth,” or “lookahead depth”)
around the agent’s current state and using a heuristic (i.e., an estimate of the remain-
ing travel cost) to select the next few actions. The actions are then taken and the
planning–execution cycle repeats [25]. Since the goal state is not seen in most such
local searches, the agent runs the risk of heading into a dead end or, more gener-
ally, selecting suboptimal actions. To address this problem, most real-time heuristic
search algorithms update (or learn) their heuristic function over time.
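A minimal sketch of this planning–execution cycle with a depth-one lookahead is given below; the names successors, cost, h0, and goal are illustrative placeholders rather than the chapter's notation.

```python
def real_time_agent(start, goal, successors, cost, h0, max_steps=10_000):
    """Depth-one planning-execution cycle with heuristic learning (a sketch)."""
    h = {}                                    # learned heuristic values
    heuristic = lambda s: h.get(s, h0(s))     # fall back to the initial estimate

    state, trajectory = start, [start]
    for _ in range(max_steps):
        if state == goal:
            return trajectory

        # Planning: score each neighbour by edge cost plus its estimate of
        # the remaining travel cost (the heuristic).
        scored = [(cost(state, s2) + heuristic(s2), s2) for s2 in successors(state)]
        best_f, best_next = min(scored, key=lambda t: t[0])

        # Learning: raise the current state's heuristic to the best lookahead
        # value, so that revisiting it later looks progressively less attractive.
        h[state] = max(heuristic(state), best_f)

        # Execution: commit to the selected action and repeat the cycle.
        state = best_next
        trajectory.append(state)

    return trajectory                         # step budget exhausted
```

A deeper lookahead changes only the planning step, which then examines a larger neighbourhood of the current state; that neighbourhood is what the attributes below make precise.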
The learning process has precluded real-time heuristic search agents from being
widely deployed for pathfinding in video games. The problem is that such agents
tend to “scrub” (i.e., repeatedly revisit) the state space due to the need to fill in
heuristic depressions [19]. As a result, solution quality can be quite low and, visu-
ally, the scrubbing behavior is perceived as irrational.
Since the seminal work on Learning Real-Time A* (LRTA*) [25], researchers
have attempted to speed up the learning process. Most of the resulting algorithms
can be described by the following four attributes:
The local search space is the set of states whose heuristic costs are accessed in
the planning stage. The two common choices are full-width limited-depth looka-
head [14, 16, 17, 25, 31, 33–36], and A*-shaped lookahead [21, 23]. Additional
choices are decision-theoretic shaping [32] and dynamic lookahead depth
selection [7, 29]. Finally, searching in a smaller, abstracted state space has been
used as well [13].
The local learning space is the set of states whose heuristic values are updated.
Common choices are: the current state only [7, 14, 25, 33–35], all states within the
local search space [21, 23], and previously visited states and their neighbors [16, 17,
31, 36].
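As a hedged illustration of the first two attributes, the sketch below pairs a full-width lookahead of fixed depth (the local search space) with a learning step that updates every state inside that space by backing estimates up from the lookahead frontier. It assumes an undirected search graph, and successors, cost, and h are again placeholder names, not any specific algorithm from the chapter.

```python
import heapq
from itertools import count

def local_search_space(start, successors, depth):
    """Full-width lookahead: interior states are reachable in fewer than
    `depth` moves; frontier states sit exactly at the depth limit."""
    interior, layer = set(), {start}
    for _ in range(depth):
        interior |= layer
        layer = {s2 for s in layer for s2 in successors(s)} - interior
    return interior, layer                     # (interior of the space, its frontier)

def learn_heuristics(interior, frontier, successors, cost, h):
    """Dijkstra-style backup of frontier estimates into the interior, so that
    afterwards h[s] = min over neighbours s2 of cost(s, s2) + h[s2].
    Assumes undirected edges, so successors double as predecessors."""
    tie = count()                              # tie-breaker for heap entries
    for s in interior:
        h[s] = float("inf")                    # interior values are re-derived
    heap = [(h[s], next(tie), s) for s in frontier]  # h must cover the frontier
    heapq.heapify(heap)
    settled = set()
    while heap:
        hs, _, s = heapq.heappop(heap)
        if s in settled:
            continue
        settled.add(s)
        for p in successors(s):                # neighbours act as predecessors
            if p in interior and cost(p, s) + hs < h[p]:
                h[p] = cost(p, s) + hs
                heapq.heappush(heap, (h[p], next(tie), p))
    return h
```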
¹ Henceforth, we will use the terms action and move synonymously.