4 CHAPTER 1. INTRODUCTION
images — variations in lighting, position of the fish on the conveyor, even “static”
due to the electronics of the camera itself.
Given that there truly are differences between the population of sea bass and that
of salmon, we view them as having different models: different descriptions, which
are typically mathematical in form. The overarching goal and approach in pattern
classification is to hypothesize the class of these models, process the sensed data
to eliminate noise (not due to the models), and for any sensed pattern choose the
model that corresponds best. Any techniques that further this aim should be in the
conceptual toolbox of the designer of pattern recognition systems.
Our prototype system to perform this very specific task might well have the form
shown in Fig. 1.1. First the camera captures an image of the fish. Next, the camera’s
signals are preprocessed to simplify subsequent operations without losing relevant
information. In particular, we might use a segmentation operation in which the images
of different fish are somehow isolated from one another and from the background. The
information from a single fish is then sent to a feature extractor, whose purpose is to
reduce the data by measuring certain “features” or “properties.” These features
(or, more precisely, the values of these features) are then passed to a classifier that
evaluates the evidence presented and makes a final decision as to the species.
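The chain of stages just described can be sketched in a few lines of code. This is only an illustrative sketch, not the system of Fig. 1.1: it assumes that segmentation has already produced an image containing a single fish, that the image is a list of rows of lightness values between 0 and 1, and that sea bass are the lighter species. The function names, the background level 0.1 and the lightness threshold 0.5 are all hypothetical.

```python
from statistics import mean


def preprocess(image, background_level=0.1):
    """Threshold the image: pixels at or below background_level become 0.0.

    background_level is a hypothetical value standing in for the conveyor belt.
    """
    return [[p if p > background_level else 0.0 for p in row] for row in image]


def extract_features(image):
    """Reduce a single-fish image to two feature values."""
    foreground = [p for row in image for p in row if p > 0.0]
    # Length: number of image columns containing at least one fish pixel.
    length = sum(1 for column in zip(*image) if any(p > 0.0 for p in column))
    # Lightness: average value of the fish pixels (0.0 if no fish is visible).
    lightness = mean(foreground) if foreground else 0.0
    return {"length": length, "lightness": lightness}


def classify(features, lightness_threshold=0.5):
    """Decide the species from the feature values (assumed rule: lighter means sea bass)."""
    return "sea bass" if features["lightness"] > lightness_threshold else "salmon"


def recognize(image):
    """Run the stages in order: preprocess, extract features, classify."""
    return classify(extract_features(preprocess(image)))
```

Each stage passes a smaller description of the fish to the next: a full image goes in, two numbers come out of the feature extractor, and a single label comes out of the classifier.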
The preprocessor might automatically adjust for average light level, or threshold
the image to remove the background of the conveyor belt, and so forth. For the
moment let us pass over how the images of the fish might be segmented and consider
how the feature extractor and classifier might be designed. Suppose somebody at the
fish plant tells us that a sea bass is generally longer than a salmon. These, then,
give us our tentative models for the fish: sea bass have some typical length, and this
is greater than that for salmon. Then length becomes an obvious feature, and we
might attempt to classify the fish merely by seeing whether or not the length l of
a fish exceeds some critical value l∗. To choose l∗ we could obtain some design or
training samples of the different types of fish, (somehow) make length measurements,
and inspect the results.
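As a minimal sketch of this procedure, the decision rule and a crude histogram of the training lengths might be written as follows. The length values are invented purely for illustration and are not measurements from Fig. 1.2; the function names and the bin width are likewise hypothetical.

```python
from collections import Counter


def classify_by_length(length, l_star):
    """Decide 'sea bass' if the length exceeds the critical value l_star."""
    return "sea bass" if length > l_star else "salmon"


def length_histogram(lengths, bin_width=5.0):
    """Count samples per bin; each key is the left edge of a bin."""
    return Counter(bin_width * (length // bin_width) for length in lengths)


# Invented training samples (arbitrary units), for illustration only.
salmon_lengths = [8.0, 11.0, 12.5, 14.0, 16.0]
sea_bass_lengths = [11.5, 15.0, 17.0, 19.5, 21.0]

for name, lengths in [("salmon", salmon_lengths), ("sea bass", sea_bass_lengths)]:
    histogram = length_histogram(lengths)
    for left_edge in sorted(histogram):
        print(f"{name:8s} [{left_edge:4.1f}, {left_edge + 5.0:4.1f}) {'#' * histogram[left_edge]}")
```

Printing the two histograms side by side is the code equivalent of inspecting the results: where the bins of the two classes overlap, no single value of l∗ can place every sample on the correct side.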
Suppose that we do this, and obtain the histograms shown in Fig. 1.2. These
disappointing histograms bear out the statement that sea bass are somewhat longer
than salmon, on average, but it is clear that this single criterion is quite poor; no
matter how we choose l∗, we cannot reliably separate sea bass from salmon by length
alone.
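One way to make this observation concrete is to try every plausible value of l∗ on the training samples and count the errors each one produces. The sketch below does so on invented length values (not the data of Fig. 1.2); the function names are hypothetical, and counting training errors is just one simple criterion we might use.

```python
def training_errors(salmon, sea_bass, l_star):
    """Count training samples misclassified by the rule 'length > l_star means sea bass'."""
    salmon_errors = sum(length > l_star for length in salmon)
    sea_bass_errors = sum(length <= l_star for length in sea_bass)
    return salmon_errors + sea_bass_errors


def best_threshold(salmon, sea_bass):
    """Return the candidate l_star with the fewest training errors.

    The error count can only change at a sample value, so it suffices to try
    each sample value plus one value below all samples.
    """
    values = sorted(set(salmon) | set(sea_bass))
    candidates = [values[0] - 1.0] + values
    return min(candidates, key=lambda t: training_errors(salmon, sea_bass, t))


# Invented training samples (arbitrary units), for illustration only.
salmon_lengths = [8.0, 11.0, 12.5, 14.0, 16.0]
sea_bass_lengths = [11.5, 15.0, 17.0, 19.5, 21.0]

l_star = best_threshold(salmon_lengths, sea_bass_lengths)
errors = training_errors(salmon_lengths, sea_bass_lengths, l_star)
print(f"best l* = {l_star}, training errors = {errors}")
```

On these invented samples even the best threshold leaves some fish on the wrong side, because the longest salmon is longer than the shortest sea bass. That is exactly the situation in which a single feature such as length is inadequate.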
Discouraged, but undeterred by these unpromising results, we try another feature
— the average lightness of the fish scales. Now we are very careful to eliminate
variations in illumination, since they can only obscure the models and corrupt our
new classifier. The resulting histograms, shown in Fig. 1.3, are much more satisfactory
— the classes are much better separated.
So far we have tacitly assumed that the consequences of our actions are equally
costly: deciding the fish was a sea bass when in fact it was a salmon was just as
undesirable as the converse. Such a symmetry in the cost is often, but not invariably,
the case. For instance, as a fish packing company we may know that our customers
easily accept occasional pieces of tasty salmon in their cans labeled “sea bass,” but
they object vigorously if a piece of sea bass appears in their cans labeled “salmon.”
If we want to stay in business, we should adjust our decision boundary to avoid
antagonizing our customers, even if it means that more salmon makes its way into
the cans of sea bass. In this case, then, we should move our decision boundary x∗ to
smaller values of lightness, thereby reducing the number of sea bass that are classified
as salmon (Fig. 1.3). The more our customers object to getting sea bass with their