4. if b is not a final state in the game, then V(b) = V(b'), where b' is the best
final board state that can be achieved starting from b and playing optimally
until the end of the game (assuming the opponent plays optimally, as well).
While this recursive definition specifies a value of V(b) for every board
state b, this definition is not usable by our checkers player because it is not
efficiently computable. Except for the trivial cases (cases 1-3) in which the game
has already ended, determining the value of V(b) for a particular board state
requires (case 4) searching ahead for the optimal line of play, all the way to
the end of the game! Because this definition is not efficiently computable by our
checkers playing program, we say that it is a nonoperational definition. The goal
of learning in this case is to discover an operational description of V; that is, a
description that can be used by the checkers-playing program to evaluate states
and select moves within realistic time bounds.
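To make the difficulty concrete, here is a minimal sketch (not from the text) of what a direct implementation of the recursive definition above would look like. The helpers is_final, final_value, and legal_successors are hypothetical placeholders for the rules of checkers; the is_final test covers cases 1-3 of the definition.

```python
# A sketch of the nonoperational definition of V: exhaustive search to the
# end of the game.  is_final(b), final_value(b), and legal_successors(b) are
# assumed helpers encoding the rules of checkers (not defined in the text).

def V(b, our_move=True):
    """Value of board state b under optimal play by both sides."""
    if is_final(b):                       # cases 1-3: the game is already over
        return final_value(b)             # e.g., a fixed win/loss/draw value
    values = [V(s, not our_move) for s in legal_successors(b)]
    # Case 4: our player picks the best reachable value; the opponent,
    # also playing optimally, picks the worst one for us.
    return max(values) if our_move else min(values)
```

Every call expands the entire game tree below b, so the cost grows exponentially with the number of moves remaining, which is precisely why this definition is nonoperational.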
Thus, we have reduced the learning task in this case to the problem of
discovering an operational description of the ideal target function V. It may be
very difficult in general to learn such an operational form of V perfectly. In fact,
we often expect learning algorithms to acquire only some approximation to the
target function, and for this reason the process of learning the target function
is often called function approximation. In the current discussion we will use the
symbol V̂ to refer to the function that is actually learned by our program, to
distinguish it from the ideal target function V.
1.2.3 Choosing a Representation for the Target Function
Now that we have specified the ideal target function V, we must choose a
representation that the learning program will use to describe the function V̂ that
it will learn. As with earlier design choices, we again have many options. We could,
for example, allow the program to represent V̂ using a large table with a distinct
entry specifying the value for each distinct board state. Or we could allow it to
represent V̂ using a collection of rules that match against features of the board
state, or a quadratic polynomial function of predefined board features, or an arti-
ficial neural network. In general, this choice of representation involves a crucial
tradeoff. On one hand, we wish to pick a very expressive representation to allow
representing as close an approximation as possible to the ideal target function V.
On the other hand, the more expressive the representation, the more training data
the program will require in order to choose among the alternative hypotheses it
can represent. To keep the discussion brief, let us choose a simple representation:
for any given board state, the function V̂ will be calculated as a linear combination
of the following board features (a brief sketch of this computation follows the list):
• x1: the number of black pieces on the board
• x2: the number of red pieces on the board
• x3: the number of black kings on the board
• x4: the number of red kings on the board
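As a purely illustrative sketch, the linear representation over these four features can be computed as below. The count_* feature extractors and the weight vector w are hypothetical placeholders; choosing values for the weights is exactly the job left to the learning algorithm.

```python
# Linear evaluation V̂(b) = w0 + w1*x1 + w2*x2 + w3*x3 + w4*x4, using the
# four board features listed above.  The count_* functions are assumed
# helpers that inspect the board state; w is the weight vector to be learned.

def v_hat(b, w):
    """Approximate value of board state b under weights w = [w0, ..., w4]."""
    x = [
        count_black_pieces(b),   # x1
        count_red_pieces(b),     # x2
        count_black_kings(b),    # x3
        count_red_kings(b),      # x4
    ]
    return w[0] + sum(wi * xi for wi, xi in zip(w[1:], x))
```

With this choice of representation, learning V̂ reduces to choosing numeric values for the weights w0 through w4.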