1.1 Classification Problems and Machine Learning
We refer to the objects to be classified as instances. Thus, an instance is a description
of some kind which is used to derive a predicted classification. In the OCR example, the
instances are the images of letters. In the medical-diagnosis example, the instances are
descriptions of a patient’s symptoms. The space of all possible instances is called the in-
stance space or domain, and is denoted by X. A (labeled) example is an instance together
with an associated label indicating its correct classification. Instances are also sometimes
referred to as (unlabeled) examples.
During training, a learning algorithm receives as input a training set of labeled examples
called the training examples. The output of the learning algorithm is a prediction rule called
a classifier or hypothesis. A classifier can itself be thought of as a computer program which
takes as input a new unlabeled instance and outputs a predicted classification; so, in math-
ematical terms, a classifier is a function that maps instances to labels. In this book, we use
the terms classifier and hypothesis fairly interchangeably, with the former emphasizing a
prediction rule’s use in classifying new examples, and the latter emphasizing the fact that
the rule has been (or could be) generated as the result of some learning process. Other
terms that have been used in the literature include rule, prediction rule, classification rule,
predictor, and model.
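Viewed mathematically, a classifier is nothing more than a function from instances to labels. The following sketch makes this concrete; the function name, feature names, and threshold are all illustrative inventions, not taken from the text.

```python
# A classifier is a function mapping instances to labels.
# Here an instance is a dictionary of features, and the label space is {-1, +1}.
# The threshold of 240 is a made-up value for illustration only.

def classify(instance):
    """Predict +1 ("positive") if cholesterol exceeds a threshold, else -1."""
    return +1 if instance["cholesterol"] > 240 else -1

print(classify({"cholesterol": 260}))  # +1
print(classify({"cholesterol": 180}))  # -1
```

A learning algorithm, in these terms, is a procedure that takes a training set of labeled examples as input and outputs such a function.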
To assess the quality of a given classifier, we measure its error rate, that is, the frequency
with which it makes incorrect classifications. To do this, we need a test set, a separate set of
test examples. The classifier is evaluated on each of the test instances, and its predictions are
compared against the correct classifications of the test examples. The fraction of examples on
which incorrect classifications were made is called the test error of the classifier. Similarly,
the fraction of mistakes on the training set is called the training error. The fraction of correct
predictions is called the (test or training) accuracy.
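The error and accuracy measures above can be computed directly from a set of labeled examples; a minimal sketch follows, in which the function and variable names are assumptions for illustration.

```python
# Test (or training) error: the fraction of examples on which the
# classifier's prediction disagrees with the correct label.

def error_rate(classifier, examples):
    """examples is a list of (instance, label) pairs."""
    mistakes = sum(1 for x, y in examples if classifier(x) != y)
    return mistakes / len(examples)

def accuracy(classifier, examples):
    """Fraction of correct predictions: the complement of the error rate."""
    return 1.0 - error_rate(classifier, examples)

# A trivial classifier that always predicts +1, evaluated on four examples:
always_plus = lambda x: +1
examples = [(0, +1), (1, -1), (2, +1), (3, -1)]
print(error_rate(always_plus, examples))  # 0.5
print(accuracy(always_plus, examples))    # 0.5
```

Applied to the training set, the same function gives the training error; applied to a held-out test set, it gives the test error.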
Of course, the classifier’s performance on the training set is not of much interest since our
purpose is to build a classifier that works well on unseen data. On the other hand, if there is
no relationship at all between the training set and the test set, then the learning problem is
unsolvable; the future can be predicted only if it resembles the past. Therefore, in designing
and studying learning algorithms, we generally assume that the training and test examples
are taken from the same random source. That is, we assume that the examples are chosen
randomly from some fixed but unknown distribution D over the space of labeled examples
and, moreover, that the training and test examples are generated by the same distribution.
The generalization error of a classifier measures the probability of misclassifying a random
example from this distribution D; equivalently, the generalization error is the expected test
error of the classifier on any test set generated by D. The goal of learning can now be stated
succinctly as producing a classifier with low generalization error.
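When the distribution D is known, the generalization error can be estimated by drawing a large sample from D and measuring the error rate on it. The sketch below uses a toy distribution invented for illustration: instances are uniform on [0, 1], the label is determined by a threshold at 0.5, and 10% of labels are flipped at random, so even the best classifier has generalization error near 0.10.

```python
import random

def draw_example(rng):
    """A toy distribution D: x uniform on [0, 1]; label +1 iff x > 0.5,
    with the label flipped with probability 0.1 (label noise)."""
    x = rng.random()
    y = +1 if x > 0.5 else -1
    if rng.random() < 0.1:
        y = -y
    return x, y

def threshold_classifier(x):
    """The classifier that matches the noiseless labeling rule of D."""
    return +1 if x > 0.5 else -1

# Monte Carlo estimate of the generalization error: the fraction of
# mistakes on a large sample drawn from D.
rng = random.Random(0)
sample = [draw_example(rng) for _ in range(100_000)]
gen_error = sum(1 for x, y in sample if threshold_classifier(x) != y) / len(sample)
print(round(gen_error, 2))  # approximately 0.10, the noise rate
```

In practice, of course, D is unknown, which is precisely why the test error on a held-out set serves as the estimate of the generalization error.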
To illustrate these concepts, consider the problem of diagnosing a patient with coronary
artery disease. For this problem, an instance consists of a description of the patient including
items such as sex, age, cholesterol level, chest pain type (if any), blood pressure, and results
of various medical tests. The label or class associated with each instance is a diagnosis
provided by a doctor as to whether or not the patient described actually suffers from the