Pattern Classification: Richard O Duda's Theory and Applications

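14.41MB PDF, updated 2024-07-20

"Pattern Classification" is a book by Richard O Duda, Peter E Hart and David G Stork. Its second edition was published by Wiley-Interscience. The book explores the field of pattern recognition and covers sub-problems such as machine perception, feature extraction, noise, overfitting, model selection and prior knowledge.

The introduction begins with machine perception, the foundation of pattern recognition, which concerns how a machine can interpret the information it senses. An example then illustrates an application of pattern recognition, and the authors point out the field's links to other disciplines such as artificial intelligence, statistics and computer vision. In discussing the sub-problems of pattern classification, the book emphasizes the following points:

1. Feature extraction: selecting or constructing, from the raw data, features that help distinguish the categories.
2. Noise: noise degrades recognition accuracy, so methods are needed to reduce or filter out its effects.
3. Overfitting: a model that is too complex and adapts too closely to the training data may perform worse on new data.
4. Model selection: choosing a suitable model is key and requires balancing model complexity against generalization.
5. Prior knowledge: prior knowledge can guide model construction and improve classification.
6. Missing features: when some feature values are unavailable, a strategy is needed to handle or fill the gaps.
7. Mereology: the study of part-whole relationships, which arises in image analysis and segmentation.
8. Segmentation: dividing an image or data set into meaningful regions or parts for further analysis.
9. Context: contextual information helps interpret the meaning of a pattern, for example in natural language processing.
10. Invariance: invariant features, such as features invariant to image rotation, let a system keep recognizing patterns under changing conditions.
11. Evidence pooling: combining several sources of evidence to reach more reliable decisions.
12. Costs and risks: recognition errors can be costly, so the types of error and their costs must be weighed.
13. Computational complexity: recognition algorithms must be computationally efficient, especially on large data sets.

The book also introduces learning and adaptation. It covers the three main modes of learning: supervised, unsupervised and reinforcement learning. These are the key ways a pattern recognition system can improve itself.

Finally, the authors summarize the contents of each chapter. They also provide bibliographical and historical remarks and a bibliography for further study. "Pattern Classification" treats the theory and techniques of pattern recognition in depth, from basic concepts to advanced topics. It is a valuable reference for understanding and practicing pattern recognition.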

4 CHAPTER 1. INTRODUCTION

images: variations in lighting, position of the fish on the conveyor, even “static” due to the electronics of the camera itself.

Given that there truly are differences between the population of sea bass and that of salmon, we view them as having different models, that is, different descriptions, which are typically mathematical in form. The overarching goal and approach in pattern classification is to hypothesize the class of these models, process the sensed data to eliminate noise (not due to the models), and for any sensed pattern choose the model that corresponds best. Any techniques that further this aim should be in the conceptual toolbox of the designer of pattern recognition systems.
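Here is a minimal sketch of "choose the model that corresponds best." It assumes each class is modeled by a one-dimensional Gaussian density over a single measured feature. The class names come from the text. The means, the standard deviations and the Gaussian form itself are illustrative assumptions, not values from the book. The sketch labels a sensed value with the class whose model gives it the highest density.

```python
import math

# Hypothetical class models: (mean, standard deviation) of one scalar feature.
# These numbers are illustrative assumptions, not measurements from the text.
MODELS = {
    "salmon": (5.0, 1.0),
    "sea bass": (8.0, 1.5),
}


def gaussian_pdf(x, mean, std):
    """Density of the normal distribution N(mean, std^2) evaluated at x."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


def best_model(x, models=MODELS):
    """Return the class whose model assigns the sensed value x the highest density."""
    return max(models, key=lambda name: gaussian_pdf(x, *models[name]))


if __name__ == "__main__":
    for value in (4.5, 6.5, 9.0):
        print(value, best_model(value))
```

With these made-up parameters, 4.5 is labeled salmon, while 6.5 and 9.0 are labeled sea bass. Noise removal, the other half of the stated goal, is not modeled here.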

Our prototype system to perform this very specific task might well have the form shown in Fig. 1.1. First the camera captures an image of the fish. Next, the camera's signals are preprocessed to simplify subsequent operations without losing relevant information. In particular, we might use a segmentation operation in which the images of different fish are somehow isolated from one another and from the background. The information from a single fish is then sent to a feature extractor, whose purpose is to reduce the data by measuring certain “features” or “properties.” These features (or, more precisely, the values of these features) are then passed to a classifier that evaluates the evidence presented and makes a final decision as to the species.
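The stages above can be sketched as a toy pipeline. This is a hypothetical illustration, not the system of Fig. 1.1, and it rests on several assumptions:

- Images are small grids of grey levels in [0, 1].
- Each image contains exactly one fish, so segmentation reduces to separating fish pixels from the belt.
- The names `preprocess`, `extract_features`, `classify`, `recognize`, `BELT_LEVEL` and `LIGHTNESS_CUTOFF`, and their values, are invented.
- Sea bass are treated as the lighter class. This is consistent with the text's later remark that moving the lightness boundary toward smaller values reduces the number of sea bass classified as salmon.

```python
from statistics import mean

BELT_LEVEL = 0.2        # hypothetical: grey levels at or below this count as conveyor belt
LIGHTNESS_CUTOFF = 0.6  # hypothetical decision threshold x* on mean lightness


def preprocess(image, belt_level=BELT_LEVEL):
    """Threshold away the belt: keep fish pixels, replace background pixels with None."""
    return [[p if p > belt_level else None for p in row] for row in image]


def extract_features(masked):
    """Reduce a masked image to two numbers: fish area (in pixels) and mean lightness."""
    fish = [p for row in masked for p in row if p is not None]
    if not fish:
        raise ValueError("no fish pixels found")
    return {"area": len(fish), "lightness": mean(fish)}


def classify(features, cutoff=LIGHTNESS_CUTOFF):
    """Decide the species from lightness alone; sea bass are assumed to be lighter."""
    return "sea bass" if features["lightness"] > cutoff else "salmon"


def recognize(image):
    """Run preprocessing, feature extraction and classification on a one-fish image."""
    return classify(extract_features(preprocess(image)))


if __name__ == "__main__":
    image = [
        [0.1, 0.1, 0.1, 0.1],
        [0.1, 0.7, 0.8, 0.1],
        [0.1, 0.75, 0.65, 0.1],
        [0.1, 0.1, 0.1, 0.1],
    ]
    print(extract_features(preprocess(image)), recognize(image))
```

Keeping each stage a separate function mirrors the text's division of labor. A real preprocessor could, for example, also normalize the average light level before thresholding, without touching the later stages.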

The preprocessor might automatically adjust for average light level, or threshold the image to remove the background of the conveyor belt, and so forth. For the moment let us pass over how the images of the fish might be segmented and consider how the feature extractor and classifier might be designed. Suppose somebody at the fish plant tells us that a sea bass is generally longer than a salmon. This, then, gives us our tentative models for the fish: sea bass have some typical length, and this is greater than that for salmon. Then length becomes an obvious feature, and we might attempt to classify the fish merely by seeing whether or not the length $l$ of a fish exceeds some critical value $l^*$. To choose $l^*$ we could obtain some design or training samples of the different types of fish, (somehow) make length measurements, and inspect the results.
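Here is a minimal sketch of choosing $l^*$ from training samples. It assumes the rule "sea bass if $l > l^*$, salmon otherwise" and picks the candidate threshold with the fewest training errors. The lengths are invented numbers with deliberately overlapping classes, not the data behind Fig. 1.2. The helper names are hypothetical.

```python
def training_errors(threshold, bass_lengths, salmon_lengths):
    """Count training mistakes of the rule: sea bass if length > threshold, else salmon."""
    bass_called_salmon = sum(1 for length in bass_lengths if length <= threshold)
    salmon_called_bass = sum(1 for length in salmon_lengths if length > threshold)
    return bass_called_salmon + salmon_called_bass


def choose_threshold(bass_lengths, salmon_lengths):
    """Return the candidate l* with the fewest training errors (first one on ties)."""
    # Every observed length is a candidate, plus one value below all of them
    # (which labels every fish as sea bass).
    candidates = sorted(set(bass_lengths) | set(salmon_lengths))
    candidates.insert(0, candidates[0] - 1)
    return min(candidates, key=lambda t: training_errors(t, bass_lengths, salmon_lengths))


# Hypothetical training lengths (arbitrary units).
BASS = [14, 16, 17, 19, 21, 22]
SALMON = [9, 11, 12, 15, 16, 18]

if __name__ == "__main__":
    l_star = choose_threshold(BASS, SALMON)
    print(l_star, training_errors(l_star, BASS, SALMON))  # 12 3
```

Even the best threshold misclassifies 3 of the 12 invented training fish, and several thresholds tie. This mirrors the text's point that length alone cannot reliably separate the classes.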

Suppose that we do this, and obtain the histograms shown in Fig. 1.2. These disappointing histograms bear out the statement that sea bass are somewhat longer than salmon, on average, but it is clear that this single criterion is quite poor; no matter how we choose $l^*$, we cannot reliably separate sea bass from salmon by length alone.

Discouraged, but undeterred by these unpromising results, we try another feature: the average lightness of the fish scales. Now we are very careful to eliminate variations in illumination, since they can only obscure the models and corrupt our new classifier. The resulting histograms, shown in Fig. 1.3, are much more satisfactory: the classes are much better separated.

So far we have tacitly assumed that the consequences of our actions are equally costly: deciding the fish was a sea bass when in fact it was a salmon was just as undesirable as the converse. Such a symmetry in the cost is often, but not invariably, the case. For instance, as a fish packing company we may know that our customers easily accept occasional pieces of tasty salmon in their cans labeled “sea bass,” but they object vigorously if a piece of sea bass appears in their cans labeled “salmon.” If we want to stay in business, we should adjust our decision boundary to avoid antagonizing our customers, even if it means that more salmon makes its way into the cans of sea bass. In this case, then, we should move our decision boundary $x^*$ toward smaller values of lightness, thereby reducing the number of sea bass that are classified as salmon (Fig. 1.3). The more our customers object to getting sea bass with their

1.2. AN EXAMPLE 7
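The asymmetric-cost argument in the preceding paragraph can be sketched by weighting the two kinds of mistakes before choosing the lightness boundary $x^*$. This is a minimal illustration under several assumptions. The rule is "sea bass if lightness $> x^*$". The lightness values and cost weights are invented. Counting weighted training mistakes is a simple stand-in for a formal risk analysis.

```python
def total_cost(x_star, bass, salmon, cost_bass_as_salmon, cost_salmon_as_bass):
    """Weighted training mistakes of the rule: sea bass if lightness > x_star, else salmon."""
    bass_as_salmon = sum(1 for x in bass if x <= x_star)
    salmon_as_bass = sum(1 for x in salmon if x > x_star)
    return cost_bass_as_salmon * bass_as_salmon + cost_salmon_as_bass * salmon_as_bass


def choose_boundary(bass, salmon, cost_bass_as_salmon=1.0, cost_salmon_as_bass=1.0):
    """Pick the candidate x* with the lowest total cost (first one on ties)."""
    candidates = sorted(set(bass) | set(salmon))
    candidates.insert(0, candidates[0] - 1.0)  # a boundary below every sample
    return min(
        candidates,
        key=lambda t: total_cost(t, bass, salmon, cost_bass_as_salmon, cost_salmon_as_bass),
    )


# Hypothetical lightness measurements; sea bass are the lighter class here.
SALMON = [2.0, 3.0, 4.0, 5.0, 5.5, 6.5]
BASS = [4.5, 6.0, 7.0, 7.5, 8.0, 9.0]

if __name__ == "__main__":
    print(choose_boundary(BASS, SALMON))                           # equal costs -> 5.5
    print(choose_boundary(BASS, SALMON, cost_bass_as_salmon=5.0))  # -> 4.0
```

Raising the penalty for labeling sea bass as salmon moves the chosen boundary from 5.5 down to 4.0, toward smaller lightness, as the text describes. The price is that more salmon are labeled sea bass.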

Figure 1.4 (scatter plot; axes: lightness and width; legend: salmon, sea bass): The two features of lightness and width for sea bass and salmon. The dark line might serve as a decision boundary of our classifier. Overall classification error on the data shown is lower than if we use only one feature as in Fig. 1.3, but there will still be some errors.

of more than one feature at a time.

In our search for other features, we might try to capitalize on the observation that sea bass are typically wider than salmon. Now we have two features for classifying fish: the lightness $x_1$ and the width $x_2$. If we ignore how these features might be measured in practice, we realize that the feature extractor has thus reduced the image of each fish to a point or feature vector $\mathbf{x}$ in a two-dimensional feature space, where

$$\mathbf{x} = \begin{pmatrix} x_1 \\ x_2 \end{pmatrix}.$$

Our problem now is to partition the feature space into two regions, where for all patterns in one region we will call the fish a sea bass, and for all points in the other we call it a salmon. Suppose that we measure the feature vectors for our samples and obtain the scattering of points shown in Fig. 1.4. This plot suggests the following rule for separating the fish: classify the fish as sea bass if its feature vector falls above the decision boundary shown, and as salmon otherwise.
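A straight-line decision boundary in the $(x_1, x_2)$ feature space can be written as a linear discriminant $g(\mathbf{x}) = \mathbf{w}^t\mathbf{x} + w_0$, with sea bass on the side where $g(\mathbf{x}) > 0$. The weights below are invented for illustration. Both are positive, since sea bass are taken to be the lighter and wider class. They are not the line drawn in Fig. 1.4; in practice they would be fitted to the training samples.

```python
from typing import Sequence

# Hypothetical linear discriminant g(x) = w . x + w0 on x = (x1, x2) = (lightness, width).
# The weights are illustrative only.
W = (1.0, 0.5)
W0 = -14.0


def discriminant(x: Sequence[float], w=W, w0=W0) -> float:
    """Evaluate g(x) = w . x + w0."""
    return sum(wi * xi for wi, xi in zip(w, x)) + w0


def classify(x: Sequence[float], w=W, w0=W0) -> str:
    """Sea bass on the positive side of the line g(x) = 0, salmon otherwise."""
    return "sea bass" if discriminant(x, w, w0) > 0 else "salmon"


def error_rate(samples, w=W, w0=W0) -> float:
    """Fraction of (feature_vector, label) pairs that the boundary gets wrong."""
    wrong = sum(1 for x, label in samples if classify(x, w, w0) != label)
    return wrong / len(samples)


if __name__ == "__main__":
    samples = [  # hypothetical labeled feature vectors
        ((3.0, 15.0), "salmon"),
        ((8.0, 20.0), "sea bass"),
        ((5.0, 17.0), "salmon"),
        ((7.0, 16.0), "sea bass"),
        ((6.0, 18.0), "salmon"),
    ]
    print(error_rate(samples))  # 0.2
```

The last invented salmon lands on the sea bass side of the line. This echoes the Fig. 1.4 caption's point that a straight line still makes some errors.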

This rule appears to do a good job of separating our samples and suggests that perhaps incorporating yet more features would be desirable. Besides the lightness and width of the fish, we might include some shape parameter, such as the vertex angle of the dorsal fin, or the placement of the eyes (as expressed as a proportion of the mouth-to-tail distance), and so on. How do we know beforehand which of these features will work best? Some features might be redundant: for instance, if the eye color of all fish correlated perfectly with width, then classification performance need not be improved if we also include eye color as a feature. Even if the difficulty or computational cost in attaining more features is of no concern, might we ever have too many features?
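One simple way to spot this kind of redundancy is to measure how strongly two features are correlated across the training samples. The sketch below computes the Pearson correlation coefficient. The width and "eye color" numbers are hypothetical, and eye color is assumed to be encoded as a number. A correlation near ±1 only reveals linear redundancy, not every way one feature can make another superfluous.

```python
import math


def pearson(a, b):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    if len(a) != len(b) or len(a) < 2:
        raise ValueError("need two sequences of the same length (at least 2)")
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        raise ValueError("a constant feature has no defined correlation")
    return cov / math.sqrt(var_a * var_b)


def redundant(a, b, tol=1e-9):
    """True if one feature is (numerically) a perfect linear function of the other."""
    return abs(abs(pearson(a, b)) - 1.0) < tol


if __name__ == "__main__":
    width = [14, 16, 18, 20, 22]               # hypothetical widths
    eye_color = [0.2 * w - 1 for w in width]   # hypothetical code, perfectly tied to width
    lightness = [3, 7, 4, 8, 5]                # hypothetical lightness values
    print(redundant(width, eye_color), redundant(width, lightness))  # True False
```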

Suppose that other features are too expensive to measure, or provide little improvement (or possibly even degrade the performance) in the approach described above, and that we are forced to make our decision based on the two features in Fig. 1.4. If our models were extremely complicated, our classifier would have a decision boundary more complex than the simple straight line. In that case all the training patterns would be separated perfectly, as shown in Fig. 1.5. With such a “solution,” though, our satisfaction would be premature because the central aim of designing a classifier is to suggest actions when presented with novel patterns, i.e., fish not yet seen. This is the issue of generalization. It is unlikely that the complex decision boundary in Fig. 1.5 would provide good generalization, since it seems to be “tuned” to the particular training samples, rather than some underlying characteristics or true model of all the sea bass and salmon that will have to be separated.

Figure 1.5 (scatter plot; axes: lightness and width; legend: salmon, sea bass): Overly complex models for the fish will lead to decision boundaries that are complicated. While such a decision may lead to perfect classification of our training samples, it would lead to poor performance on future patterns. The novel test point marked ? is evidently most likely a salmon, whereas the complex decision boundary shown leads it to be misclassified as a sea bass.
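The contrast between a boundary "tuned" to the training samples and a simpler one can be tried on synthetic data. In this sketch the two classes are invented 2-D Gaussian clouds in (lightness, width) with made-up means. Two stand-in classifiers are compared:

- A 1-nearest-neighbor rule stands in for an overly complex boundary. It always classifies its own training samples perfectly.
- A nearest-class-mean rule stands in for a straight-line boundary.

Neither is claimed to be the method behind the book's figures.

```python
import math
import random


def make_data(seed, n_per_class=20):
    """Hypothetical (lightness, width) samples: two overlapping Gaussian clouds."""
    rng = random.Random(seed)

    def draw(mean, label):
        return [((rng.gauss(mean[0], 1.5), rng.gauss(mean[1], 1.5)), label)
                for _ in range(n_per_class)]

    return draw((4.0, 16.0), "salmon") + draw((7.0, 19.0), "sea bass")


def nearest_neighbor(train):
    """1-NN: copy the label of the closest training vector (a very wiggly boundary)."""
    def classify(x):
        return min(train, key=lambda s: math.dist(s[0], x))[1]
    return classify


def nearest_mean(train):
    """Label x by the closer class mean (the boundary is a straight line)."""
    means = {}
    for label in {lab for _, lab in train}:
        pts = [p for p, lab in train if lab == label]
        means[label] = tuple(sum(coord) / len(pts) for coord in zip(*pts))

    def classify(x):
        return min(means, key=lambda label: math.dist(means[label], x))
    return classify


def error_rate(classify, data):
    """Fraction of labeled samples that the classifier gets wrong."""
    return sum(1 for x, label in data if classify(x) != label) / len(data)


if __name__ == "__main__":
    train, test = make_data(seed=0), make_data(seed=1)
    for name, fit in (("1-NN", nearest_neighbor), ("nearest mean", nearest_mean)):
        clf = fit(train)
        print(f"{name}: train error {error_rate(clf, train):.2f}, "
              f"test error {error_rate(clf, test):.2f}")
```

The 1-NN rule reports zero training error by construction. Whether its test error exceeds that of the nearest-mean rule varies from sample to sample, so it is worth comparing several seeds rather than trusting a single run.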

Naturally, one approach would be to get more training samples for obtaining a better estimate of the true underlying characteristics, for instance the probability distributions of the categories. In most pattern recognition problems, however, the amount of such data we can obtain easily is quite limited. Even with a vast amount of training data in a continuous feature space, though, if we followed the approach in Fig. 1.5 our classifier would give a horrendously complicated decision boundary, one that would be unlikely to do well on novel patterns.

Rather, then, we might seek to “simplify” the recognizer, motivated by a belief that the underlying models will not require a decision boundary that is as complex as that in Fig. 1.5. Indeed, we might be satisfied with the slightly poorer performance on the training samples if it means that our classifier will have better performance on novel patterns.* But if designing a very complex recognizer is unlikely to give good generalization, precisely how should we quantify and favor simpler classifiers? How would our system automatically determine that the simple curve in Fig. 1.6 is preferable to the manifestly simpler straight line in Fig. 1.4 or the complicated boundary in Fig. 1.5? Assuming that we somehow manage to optimize this tradeoff, can we then predict how well our system will generalize to new patterns? These are some of the central problems in statistical pattern recognition.
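The text leaves open how to quantify and favor simpler classifiers. One widely used empirical heuristic, sketched here as an assumption rather than as the book's answer, is to hold out some labeled samples and pick the model complexity that does best on them. Here complexity is the neighborhood size k of a k-nearest-neighbor rule. A small k gives a wiggly boundary and a larger k a smoother one. The data generator and the candidate values of k are hypothetical.

```python
import math
import random
from collections import Counter


def make_data(seed, n_per_class=30):
    """Hypothetical (lightness, width) samples from two overlapping Gaussian clouds."""
    rng = random.Random(seed)

    def draw(mean, label):
        return [((rng.gauss(mean[0], 1.5), rng.gauss(mean[1], 1.5)), label)
                for _ in range(n_per_class)]

    return draw((4.0, 16.0), "salmon") + draw((7.0, 19.0), "sea bass")


def knn_classify(train, x, k):
    """Majority label among the k training vectors closest to x."""
    nearest = sorted(train, key=lambda s: math.dist(s[0], x))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


def validation_error(train, validation, k):
    """Fraction of held-out samples that k-NN (fitted on train) gets wrong."""
    wrong = sum(1 for x, label in validation if knn_classify(train, x, k) != label)
    return wrong / len(validation)


def choose_k(train, validation, ks=(15, 9, 5, 3, 1)):
    """Return the k with the lowest held-out error.

    ks is listed from smoothest to most flexible, and min() keeps the first
    minimum, so ties go to the smoother boundary.
    """
    return min(ks, key=lambda k: validation_error(train, validation, k))


if __name__ == "__main__":
    train, validation = make_data(seed=0), make_data(seed=1)
    k = choose_k(train, validation)
    print(k, validation_error(train, validation, k))
```

The held-out error of the chosen k is optimistically biased, because the same samples were used to make the choice. Answering the paragraph's last question, how well the system will generalize, needs a further independent test set.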

For the same incoming patterns, we might need to use a drastically different cost

* The philosophical underpinnings of this approach derive from William of Occam (1284-1347?), who advocated favoring simpler explanations over those that are needlessly complicated: Entia non sunt multiplicanda praeter necessitatem (“Entities are not to be multiplied without necessity”). Decisions based on overly complex models often lead to lower accuracy of the classifier.
