A Classic Pattern Recognition Textbook: Duda's Pattern Classification, Second Edition

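Pattern Classification, by Duda, Hart, and Stork, is the second edition of a classic textbook on pattern recognition and an important reference in machine learning. It covers the basic concepts of pattern recognition and its sub-problems (feature extraction, noise, overfitting, model selection, and others), and it discusses different types of learning and adaptation: supervised learning, unsupervised learning, and reinforcement learning.

The second edition is a cornerstone of machine learning and pattern recognition. It starts from how complex patterns are recognized through perception, understanding, and judgment. The opening chapter is organized into an introduction, the sub-problems of pattern classification, and learning and adaptation.

1. Introduction: The book first introduces machine perception, that is, how a machine can emulate human perceptual abilities. The authors use an example to show how pattern recognition is applied, and then list the sub-problems of pattern classification, including feature extraction, noise, overfitting, and model selection.

1.3 The sub-problems in detail:
- Feature extraction: a key step in recognition, concerned with selecting from the raw data the attributes that are useful for classification.
- Noise: randomness and inaccuracy in the data, which must be handled to reduce its effect on the classification result.
- Overfitting: the model is too complex, so it adapts excessively to the training data and predicts new data less well.
- Model selection: the process of choosing the best model, usually by balancing model complexity against predictive performance.
- Prior knowledge: using existing knowledge to guide classification, which can improve accuracy.
- Missing features: handling missing parts of the data, which is essential to the robustness of a classification algorithm.
- Mereology: the study of part-whole relationships; in pattern recognition it may bear on how objects are segmented and composed.
- Segmentation: dividing an image or data set into meaningful sub-regions.
- Context: taking the surrounding environment or contextual information into account, which can improve classification decisions.
- Invariances: designing features that are invariant to particular transformations, so that recognition is more stable.
- Evidence pooling: combining information from several sources of evidence to reach a decision.
- Costs and risks: accounting for the cost and likelihood of errors in a classification decision.
- Computational complexity: assessing the running time and resource requirements of an algorithm.

1.4 Learning and adaptation: This part covers different learning mechanisms: supervised learning, in which the model is trained on data with known labels; unsupervised learning, in which the data are unlabeled and the model tries to discover their internal structure; and reinforcement learning, in which a behavior policy is adjusted through reward and punishment.

1.5 Conclusion: This part summarizes the main content of the chapter and points the reader to further reading. The book is suitable for beginners who want to learn the basic concepts of pattern recognition, and it also offers professional researchers deeper theoretical and practical insight.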

4 CHAPTER 1. INTRODUCTION

images: variations in lighting, position of the fish on the conveyor, even “static” due to the electronics of the camera itself.

Given that there truly are differences between the population of sea bass and that of salmon, we view them as having different models: different descriptions, which are typically mathematical in form. The overarching goal and approach in pattern classification is to hypothesize the class of these models, process the sensed data to eliminate noise (not due to the models), and for any sensed pattern choose the model that corresponds best. Any techniques that further this aim should be in the conceptual toolbox of the designer of pattern recognition systems.

Our prototype system to perform this very specific task might well have the form shown in Fig. 1.1. First the camera captures an image of the fish. Next, the camera’s signals are preprocessed to simplify subsequent operations without losing relevant information. In particular, we might use a segmentation operation in which the images of different fish are somehow isolated from one another and from the background. The information from a single fish is then sent to a feature extractor, whose purpose is to reduce the data by measuring certain “features” or “properties.” These features (or, more precisely, the values of these features) are then passed to a classifier that evaluates the evidence presented and makes a final decision as to the species.

The preprocessor might automatically adjust for average light level, or threshold the image to remove the background of the conveyor belt, and so forth. For the moment let us pass over how the images of the fish might be segmented and consider how the feature extractor and classifier might be designed. Suppose somebody at the fish plant tells us that a sea bass is generally longer than a salmon. This, then, gives us our tentative models for the fish: sea bass have some typical length, and this is greater than that for salmon. Then length becomes an obvious feature, and we might attempt to classify the fish merely by seeing whether or not the length $l$ of a fish exceeds some critical value $l^*$. To choose $l^*$ we could obtain some design or training samples of the different types of fish, (somehow) make length measurements, and inspect the results.
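The length rule just described can be sketched in a few lines. This is only an illustration under stated assumptions: the lengths and labels below are invented, and picking the threshold that minimizes the training error is one plausible reading of "inspect the results", not a procedure given in the text. The names `training_error` and `choose_threshold` are hypothetical.

```python
def training_error(threshold, lengths, labels):
    """Fraction of samples misclassified by the rule: length > threshold means sea bass."""
    wrong = 0
    for length, label in zip(lengths, labels):
        predicted = "sea bass" if length > threshold else "salmon"
        wrong += predicted != label
    return wrong / len(lengths)


def choose_threshold(lengths, labels):
    """Return the candidate threshold with the lowest training error (first one on ties)."""
    values = sorted(set(lengths))
    # Candidates: one value below all samples, the midpoints between
    # adjacent distinct lengths, and one value above all samples.
    midpoints = [(a + b) / 2 for a, b in zip(values, values[1:])]
    candidates = [values[0] - 1.0] + midpoints + [values[-1] + 1.0]
    return min(candidates, key=lambda t: training_error(t, lengths, labels))


# Invented training samples: the two classes overlap in length.
lengths = [8.0, 9.5, 10.0, 11.0, 12.5, 10.5, 12.0, 13.0, 14.0, 15.5]
labels = ["salmon"] * 5 + ["sea bass"] * 5

best = choose_threshold(lengths, labels)
print(best, training_error(best, lengths, labels))  # 10.25 0.2
```

Because the invented classes overlap, even the best threshold misclassifies some training samples, which is the situation the next paragraph describes.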

Suppose that we do this, and obtain the histograms shown in Fig. 1.2. These disappointing histograms bear out the statement that sea bass are somewhat longer than salmon, on average, but it is clear that this single criterion is quite poor; no matter how we choose $l^*$, we cannot reliably separate sea bass from salmon by length alone.

Discouraged, but undeterred by these unpromising results, we try another feature: the average lightness of the fish scales. Now we are very careful to eliminate variations in illumination, since they can only obscure the models and corrupt our new classifier. The resulting histograms, shown in Fig. 1.3, are much more satisfactory: the classes are much better separated.

So far we have tacitly assumed that the consequences of our actions are equally costly: deciding the fish was a sea bass when in fact it was a salmon was just as undesirable as the converse. Such a symmetry in the cost is often, but not invariably, the case. For instance, as a fish packing company we may know that our customers easily accept occasional pieces of tasty salmon in their cans labeled “sea bass,” but they object vigorously if a piece of sea bass appears in their cans labeled “salmon.” If we want to stay in business, we should adjust our decision boundary to avoid antagonizing our customers, even if it means that more salmon makes its way into the cans of sea bass. In this case, then, we should move our decision boundary $x^*$ to smaller values of lightness, thereby reducing the number of sea bass that are classified as salmon (Fig. 1.3). The more our customers object to getting sea bass with their

1.2. AN EXAMPLE 7

[Figure 1.4 plot not reproduced; axes: width and lightness; legend: salmon, sea bass.]

Figure 1.4: The two features of lightness and width for sea bass and salmon. The dark line might serve as a decision boundary of our classifier. Overall classification error on the data shown is lower than if we use only one feature as in Fig. 1.3, but there will still be some errors.

of more than one feature at a time.

In our search for other features, we might try to capitalize on the observation that sea bass are typically wider than salmon. Now we have two features for classifying fish: the lightness $x_1$ and the width $x_2$. If we ignore how these features might be measured in practice, we realize that the feature extractor has thus reduced the image of each fish to a point or feature vector $\mathbf{x}$ in a two-dimensional feature space, where

$$\mathbf{x} = \begin{pmatrix} x_1 \\ x_2 \end{pmatrix}.$$

Our problem now is to partition the feature space into two regions, where for all patterns in one region we will call the fish a sea bass, and all points in the other we call it a salmon. Suppose that we measure the feature vectors for our samples and obtain the scattering of points shown in Fig. 1.4. This plot suggests the following rule for separating the fish: Classify the fish as sea bass if its feature vector falls above the decision boundary shown, and as salmon otherwise.
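The rule "sea bass if the feature vector falls above the boundary" can be written as a comparison against a straight line. The sketch below assumes that width is plotted on the vertical axis and lightness on the horizontal axis, and the slope and intercept are invented for illustration; none of these values come from the text or from Fig. 1.4. The function name `classify` is hypothetical.

```python
def classify(x, slope=-1.0, intercept=22.0):
    """Classify a feature vector x = (lightness, width).

    The boundary is the line width = slope * lightness + intercept.
    Both parameters are hypothetical, chosen only for this example.
    """
    lightness, width = x
    boundary_width = slope * lightness + intercept
    return "sea bass" if width > boundary_width else "salmon"


print(classify((8.0, 20.0)))  # sea bass: 20.0 lies above the boundary value 14.0
print(classify((3.0, 15.0)))  # salmon: 15.0 lies below the boundary value 19.0
```

In practice the line would be fitted to the training samples rather than fixed by hand; the fixed parameters only make the geometry of the rule explicit.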

This rule appears to do a good job of separating our samples and suggests that perhaps incorporating yet more features would be desirable. Besides the lightness and width of the fish, we might include some shape parameter, such as the vertex angle of the dorsal fin, or the placement of the eyes (as expressed as a proportion of the mouth-to-tail distance), and so on. How do we know beforehand which of these features will work best? Some features might be redundant: for instance if the eye color of all fish correlated perfectly with width, then classification performance need not be improved if we also include eye color as a feature. Even if the difficulty or computational cost in attaining more features is of no concern, might we ever have too many features?

Suppose that other features are too expensive to measure, or provide little improvement (or possibly even degrade the performance) in the approach described above, and that we are forced to make our decision based on the two features in Fig. 1.4. If our models were extremely complicated, our classifier would have a decision boundary more complex than the simple straight line. In that case all the training patterns would be separated perfectly, as shown in Fig. 1.5. With such a “solution,” though, our satisfaction would be premature because the central aim of designing a classifier is to suggest actions when presented with novel patterns, i.e., fish not yet seen. This is the issue of generalization. It is unlikely that the complex decision boundary in Fig. 1.5 would provide good generalization, since it seems to be “tuned” to the particular training samples, rather than some underlying characteristics or true model of all the sea bass and salmon that will have to be separated.

[Figure 1.5 plot not reproduced; axes: width and lightness; legend: salmon, sea bass.]

Figure 1.5: Overly complex models for the fish will lead to decision boundaries that are complicated. While such a decision may lead to perfect classification of our training samples, it would lead to poor performance on future patterns. The novel test point marked ? is evidently most likely a salmon, whereas the complex decision boundary shown leads it to be misclassified as a sea bass.
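The generalization issue can be made concrete with a small experiment. The sketch below is an illustration under invented assumptions: the fish data are synthetic Gaussian samples with made-up parameters, a nearest-neighbor rule stands in for a boundary "tuned" to the training samples, and a fixed straight line stands in for the simple boundary. None of these choices come from the text.

```python
import random


def make_fish(n, rng):
    """Synthetic (lightness, width) samples; the Gaussian parameters are invented."""
    data = []
    for _ in range(n):
        if rng.random() < 0.5:
            data.append(((rng.gauss(4.0, 1.5), rng.gauss(16.0, 1.5)), "salmon"))
        else:
            data.append(((rng.gauss(7.0, 1.5), rng.gauss(19.0, 1.5)), "sea bass"))
    return data


def nearest_neighbor(x, train):
    """Memorizing classifier: copy the label of the closest training point."""
    closest = min(train, key=lambda s: (s[0][0] - x[0]) ** 2 + (s[0][1] - x[1]) ** 2)
    return closest[1]


def straight_line(x):
    """Simple boundary halfway between the two invented class means."""
    return "sea bass" if x[0] + x[1] > 23.0 else "salmon"


def error_rate(predict, samples):
    """Fraction of samples whose predicted label differs from the true label."""
    return sum(predict(x) != y for x, y in samples) / len(samples)


rng = random.Random(0)
train = make_fish(200, rng)    # samples used to build the classifiers
novel = make_fish(2000, rng)   # "fish not yet seen"


def nn(x):
    return nearest_neighbor(x, train)


print("nearest neighbor, training error:", error_rate(nn, train))
print("nearest neighbor, novel error:   ", error_rate(nn, novel))
print("straight line, training error:   ", error_rate(straight_line, train))
print("straight line, novel error:      ", error_rate(straight_line, novel))
```

The memorizing rule separates the training samples perfectly by construction, since each training point is its own nearest neighbor. On overlapping classes like these, one would typically expect its error on the novel samples to be noticeably higher than its training error, while the straight line's two error rates stay close to each other.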

Naturally, one approach would be to get more training samples for obtaining a better estimate of the true underlying characteristics, for instance the probability distributions of the categories. In most pattern recognition problems, however, the amount of such data we can obtain easily is often quite limited. Even with a vast amount of training data in a continuous feature space though, if we followed the approach in Fig. 1.5 our classifier would give a horrendously complicated decision boundary, one that would be unlikely to do well on novel patterns.

Rather, then, we might seek to “simplify” the recognizer, motivated by a belief that the underlying models will not require a decision boundary that is as complex as that in Fig. 1.5. Indeed, we might be satisfied with the slightly poorer performance on the training samples if it means that our classifier will have better performance on novel patterns.* But if designing a very complex recognizer is unlikely to give good generalization, precisely how should we quantify and favor simpler classifiers? How would our system automatically determine that the simple curve in Fig. 1.6 is preferable to the manifestly simpler straight line in Fig. 1.4 or the complicated boundary in Fig. 1.5? Assuming that we somehow manage to optimize this tradeoff, can we then predict how well our system will generalize to new patterns? These are some of the central problems in statistical pattern recognition.
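One common way to choose automatically among classifiers of different complexity is to compare them on samples held out from training. The sketch below is not the method of the text, which leaves the question open at this point; it assumes invented synthetic data and uses k-nearest-neighbor classifiers, where a larger k generally gives a smoother boundary. All names and parameters are hypothetical.

```python
import random
from collections import Counter


def make_fish(n, rng):
    """Synthetic (lightness, width) samples; the Gaussian parameters are invented."""
    data = []
    for _ in range(n):
        if rng.random() < 0.5:
            data.append(((rng.gauss(4.0, 1.5), rng.gauss(16.0, 1.5)), "salmon"))
        else:
            data.append(((rng.gauss(7.0, 1.5), rng.gauss(19.0, 1.5)), "sea bass"))
    return data


def knn_predict(x, train, k):
    """Majority label among the k training points closest to x."""
    nearest = sorted(
        train, key=lambda s: (s[0][0] - x[0]) ** 2 + (s[0][1] - x[1]) ** 2
    )[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


def held_out_error(k, train, held_out):
    """Error rate of the k-nearest-neighbor rule on samples not used for training."""
    wrong = sum(knn_predict(x, train, k) != y for x, y in held_out)
    return wrong / len(held_out)


rng = random.Random(1)
train = make_fish(150, rng)
held_out = make_fish(150, rng)

# Odd values of k avoid tied votes between the two classes.
candidates = [1, 3, 7, 15, 31]
errors = {k: held_out_error(k, train, held_out) for k in candidates}
best_k = min(candidates, key=lambda k: errors[k])

for k in candidates:
    print(k, errors[k])
print("selected k:", best_k)
```

The held-out error serves here as a stand-in for performance on novel patterns. It does not by itself answer the question of how well the selected classifier will generalize, since the estimate depends on the particular held-out samples.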

For the same incoming patterns, we might need to use a drastically different cost

* The philosophical underpinnings of this approach derive from William of Occam (1284-1347?), who advocated favoring simpler explanations over those that are needlessly complicated: Entia non sunt multiplicanda praeter necessitatem (“Entities are not to be multiplied without necessity”). Decisions based on overly complex models often lead to lower accuracy of the classifier.
