1.1 Why Do Machines Need to Learn? 5
case π simply maps the Boolean matrix to a Boolean vector by row scanning, in such a way that there is no information loss when passing from e to x. As will be shown later, however, the preprocessing function π typically returns a pattern representation with information loss with respect to the original environmental representation e ∈ E. Function f maps this representation to the one-hot encoding of the number 2 and, finally, h transforms this code into a representation of the same number that is more suitable for the task at hand:
e  −π→  (0, 0, 0, 1, 1, 0, 0, 0, ..., 0, 0, 0, 0, 0, 0, 1, 1)
   −f→  (0, 0, 1, 0, 0, 0, 0, 0, 0, 0)
   −h→  2.
Overall, the action of χ can be nicely written as χ(e) = 2. In many learning machines, the output encoding function h plays a more important role, which consists of converting real-valued representations y = f(x) ∈ ℝ¹⁰ into the corresponding one-hot representation. For example, in this case one could simply choose h such that h_i(y) = δ(i, arg max_κ y_κ), where δ denotes the Kronecker delta. In doing so, the hot bit is located at the same position as the maximum of y. While this apparently
makes sense, a more careful analysis suggests that such an encoding suffers from a
problem that is pointed out in Exercise 2.
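As a concrete illustration, the chain π, f, h can be sketched in a few lines of Python. The retina size, the stub classifier f, and all names below are illustrative assumptions, not the book's actual machinery; here h realizes h_i(y) = δ(i, arg max_κ y_κ) by placing the hot bit at the position of the maximum of y.

```python
import numpy as np

def pi(e):
    """Preprocessing: flatten the Boolean pixel matrix into a
    pattern vector x by row scanning (no information loss here)."""
    return e.flatten().astype(float)

def f(x):
    """Stand-in for the learned map: returns real-valued scores
    y in R^10, one per digit class (a fixed stub, not a trained model)."""
    y = np.zeros(10)
    y[2] = 0.9          # pretend the machine is most confident about "2"
    y[7] = 0.1
    return y

def h(y):
    """Output encoding h_i(y) = delta(i, argmax_k y_k): the hot bit
    sits at the position of the maximum of y."""
    code = np.zeros_like(y)
    code[np.argmax(y)] = 1.0
    return code

e = np.zeros((4, 4), dtype=bool)     # toy environmental representation
code = h(f(pi(e)))                   # one-hot code produced by h
digit = int(np.argmax(code))         # chi(e): the decoded class label, 2
```

Note that `np.argmax` breaks ties by returning the first maximal index, so this decoding always yields exactly one hot bit even when several components of y are equal.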
Functions π(·) and h(·) adapt the environmental information and the decision
to the internal representation of the agent. As will be seen throughout the book,
depending on the task, E and O can be highly structured, and their internal represen-
tation plays a crucial role in the learning process. The specific role of π(·) is to encode
the environmental information into an appropriate internal representation. Likewise,
function h(·) is expected to return the decision on the environment on the basis of the
internal state of the machine. The core of learning is the appropriate discovery of f(·), so as to satisfy the constraints dictated by the environment.
What, then, are the conditions dictated by the environment?
Learning from examples.
Since the dawn of machine learning, scientists have mostly been following the princi-
ple of learning from examples. Under this framework, an intelligent agent is expected
to acquire concepts by induction on the basis of collections L = {(e_κ, o_κ), κ = 1, ..., ℓ}, where an oracle, typically referred to as the supervisor, pairs inputs e_κ ∈ E with decision values o_κ ∈ O. A first important distinction concerns classification and regression tasks. In the first case, the decision requires the finiteness of O, while in the second case O can be thought of as a continuous set.
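The distinction can be made concrete with two toy supervised collections; the pairs below are made-up placeholders for (e_κ, o_κ), not data drawn from the book.

```python
# Classification: O is a finite set of class labels (here, digit classes).
classification_L = [
    ((0.0, 1.0, 1.0, 0.0), 2),   # (pattern, class) pair given by the supervisor
    ((1.0, 0.0, 0.0, 1.0), 7),
]

# Regression: O is a continuous set (here, real-valued targets).
regression_L = [
    ((0.0, 1.0), 2.35),          # (pattern, real-valued target)
    ((1.0, 0.0), 7.81),
]

# The decision set of a classification task is finite by construction.
finite_O = {o for _, o in classification_L}
```

The same list-of-pairs structure serves both tasks; only the nature of the decision set O changes.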
Classification and regression.
First, let us focus on classification. In the simplest cases, O ⊂ ℕ is a collection of integers that identify the class of e. For example, in the handwritten character recognition problem, restricted to digits, we might have |O| = 10. In this case, we can promptly see the importance of distinguishing the physical, the environmental, and the decision information from their corresponding internal representations in the machine. At the purely physical level, handwritten characters are the outcome of the physical process of light reflection. This process can be captured as soon as we define the retina R as a rectangle of ℝ², and interpret the reflected light by the image function