Pattern Classification: An In-Depth Look at the Foundations and Challenges of Machine Learning

Updated 2024-07-22 · 14.42 MB PDF

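Pattern Classification, second edition, is a classic machine learning textbook that examines core concepts and techniques in artificial intelligence. Chapter 1 opens with an introduction to how pervasive pattern recognition is in everyday life, in tasks such as face recognition, speech understanding, and handwritten character recognition, each of which rests on sophisticated techniques. The authors first introduce machine perception, the problem of building machines that recognize patterns in sensory input as humans do. They then work through an example to lay out the basic framework of the pattern classification problem and its overlap with related fields such as computer vision and natural language processing. The chapter discusses several key sub-problems of pattern classification:

1. **Feature extraction**: the foundation of pattern recognition; how to extract the most useful information from raw data so that a model can learn and decide.
2. **Noise**: real-world data usually contain noise, so features and algorithms should be designed to limit its effect on classification.
3. **Overfitting**: a model that adapts too closely to the training data learns regularities that hold only for those samples and fails to generalize. Remedies include more data, regularization, and simpler models.
4. **Model selection**: choosing a model suited to the task and the data, which involves evaluating performance and comparing the applicability of different algorithms.
5. **Prior knowledge**: using existing scientific theory or domain knowledge to guide model design, which can improve efficiency and accuracy.
6. **Missing features**: handling missing values in the data, by imputation or by inference from other features.
7. **Mereology**: how parts relate to wholes, and how to infer properties of the whole from partial features.
8. **Segmentation and context**: how to divide a complex scene into smaller parts, and how context affects recognition.
9. **Invariance**: making the model robust to changes in the input, such as scale, orientation, or lighting.
10. **Evidence pooling**: how to combine several observations to make a more reliable decision.
11. **Costs and risks**: weighing accuracy against execution speed in real applications, and accounting for the costs of false alarms and misses.
12. **Computational complexity**: understanding and controlling the resources an algorithm consumes, especially with large data sets and real-time applications.
13. **Learning and adaptation**: supervised learning (for example decision trees and support vector machines), unsupervised learning (clustering, dimensionality reduction), and reinforcement learning (decision making in dynamic environments).

The chapter closes with a summary, followed by references and bibliographical and historical remarks that help the reader trace the development of pattern classification and recent research. An index makes it easy to look up specific concepts or techniques. Pattern Classification, second edition, is a comprehensive and thorough guide, useful to beginners and professionals alike.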

4 CHAPTER 1. INTRODUCTION

images: variations in lighting, position of the fish on the conveyor, even “static” due to the electronics of the camera itself.

Given that there truly are differences between the population of sea bass and that of salmon, we view them as having different models: different descriptions, which are typically mathematical in form. The overarching goal and approach in pattern classification is to hypothesize the class of these models, process the sensed data to eliminate noise (not due to the models), and for any sensed pattern choose the model that corresponds best. Any techniques that further this aim should be in the conceptual toolbox of the designer of pattern recognition systems.

Our prototype system to perform this very specific task might well have the form shown in Fig. 1.1. First the camera captures an image of the fish. Next, the camera’s signals are preprocessed to simplify subsequent operations without losing relevant information. In particular, we might use a segmentation operation in which the images of different fish are somehow isolated from one another and from the background. The information from a single fish is then sent to a feature extractor, whose purpose is to reduce the data by measuring certain “features” or “properties.” These features (or, more precisely, the values of these features) are then passed to a classifier that evaluates the evidence presented and makes a final decision as to the species.

The preprocessor might automatically adjust for average light level, or threshold the image to remove the background of the conveyor belt, and so forth. For the moment let us pass over how the images of the fish might be segmented and consider how the feature extractor and classifier might be designed. Suppose somebody at the fish plant tells us that a sea bass is generally longer than a salmon. These, then, give us our tentative models for the fish: sea bass have some typical length, and this is greater than that for salmon. Then length becomes an obvious feature, and we might attempt to classify the fish merely by seeing whether or not the length $l$ of a fish exceeds some critical value $l^*$. To choose $l^*$ we could obtain some design or training samples of the different types of fish, (somehow) make length measurements, and inspect the results.

Suppose that we do this, and obtain the histograms shown in Fig. 1.2. These disappointing histograms bear out the statement that sea bass are somewhat longer than salmon, on average, but it is clear that this single criterion is quite poor; no matter how we choose $l^*$, we cannot reliably separate sea bass from salmon by length alone.
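The search for a critical length can be sketched in a few lines. The sketch below assumes synthetic, overlapping length measurements (the means, spreads, and sample sizes are invented for illustration and do not come from the book's histograms) and simply tries each candidate threshold, keeping the one with the lowest error on the training samples.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical length measurements (arbitrary units). The means and spreads
# are invented for illustration and are not taken from the book.
salmon_len = rng.normal(loc=10.0, scale=2.0, size=200)
bass_len = rng.normal(loc=12.0, scale=2.0, size=200)

lengths = np.concatenate([salmon_len, bass_len])
is_bass = np.concatenate([np.zeros(200, dtype=bool), np.ones(200, dtype=bool)])


def training_error(threshold):
    """Fraction misclassified by the rule: call it sea bass if length > threshold."""
    predicted_bass = lengths > threshold
    return float(np.mean(predicted_bass != is_bass))


# Try every candidate critical length on a grid and keep the best one.
candidates = np.linspace(lengths.min(), lengths.max(), 200)
errors = np.array([training_error(t) for t in candidates])
best_threshold = float(candidates[np.argmin(errors)])
best_error = float(errors.min())

print(f"best critical length = {best_threshold:.2f}, training error = {best_error:.2%}")
```

Because the two assumed distributions overlap heavily, even the best threshold leaves a substantial fraction of the samples misclassified, which is the situation the histograms are said to show.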

Discouraged, but undeterred by these unpromising results, we try another feature: the average lightness of the fish scales. Now we are very careful to eliminate variations in illumination, since they can only obscure the models and corrupt our new classifier. The resulting histograms, shown in Fig. 1.3, are much more satisfactory: the classes are much better separated.

So far we have tacitly assumed that the consequences of our actions are equally costly: deciding the fish was a sea bass when in fact it was a salmon was just as undesirable as the converse. Such a symmetry in the cost is often, but not invariably, the case. For instance, as a fish packing company we may know that our customers easily accept occasional pieces of tasty salmon in their cans labeled “sea bass,” but they object vigorously if a piece of sea bass appears in their cans labeled “salmon.” If we want to stay in business, we should adjust our decision boundary to avoid antagonizing our customers, even if it means that more salmon makes its way into the cans of sea bass. In this case, then, we should move our decision boundary $x^*$ to smaller values of lightness, thereby reducing the number of sea bass that are classified as salmon (Fig. 1.3). The more our customers object to getting sea bass with their
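The effect of asymmetric costs on the lightness threshold can be illustrated with a small sketch. It assumes synthetic lightness readings in which salmon are darker than sea bass, and an invented 5-to-1 cost ratio; neither the data nor the ratio comes from the book. The rule is "call the fish a sea bass if its lightness exceeds the threshold," and the threshold is chosen to minimize total cost on the samples.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical lightness readings; salmon are assumed darker than sea bass.
salmon_light = rng.normal(loc=3.0, scale=1.0, size=300)
bass_light = rng.normal(loc=6.0, scale=1.5, size=300)


def error_counts(threshold):
    """Decision rule: call the fish a sea bass if its lightness exceeds threshold."""
    bass_as_salmon = int(np.sum(bass_light <= threshold))   # sea bass ends up in a salmon can
    salmon_as_bass = int(np.sum(salmon_light > threshold))  # salmon ends up in a sea bass can
    return bass_as_salmon, salmon_as_bass


def best_threshold(cost_bass_as_salmon, cost_salmon_as_bass, candidates):
    """Return the candidate threshold with the lowest total cost on the samples."""
    costs = []
    for t in candidates:
        bass_as_salmon, salmon_as_bass = error_counts(t)
        costs.append(cost_bass_as_salmon * bass_as_salmon
                     + cost_salmon_as_bass * salmon_as_bass)
    return float(candidates[int(np.argmin(costs))])


candidates = np.linspace(0.0, 10.0, 401)
x_equal = best_threshold(1, 1, candidates)   # symmetric costs
x_costly = best_threshold(5, 1, candidates)  # sea bass in a salmon can is 5x worse (assumed)

print(f"threshold with equal costs:      {x_equal:.3f}")
print(f"threshold with asymmetric costs: {x_costly:.3f}")
print("sea bass labeled salmon:", error_counts(x_equal)[0], "->", error_counts(x_costly)[0])
```

Raising the penalty on sea bass labeled as salmon never moves the chosen threshold to a larger lightness value, so fewer sea bass end up in salmon cans at the price of more salmon in sea bass cans.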

1.2. AN EXAMPLE 7

Figure 1.4 (scatter plot; axes: lightness and width; classes: salmon, sea bass): The two features of lightness and width for sea bass and salmon. The dark line might serve as a decision boundary of our classifier. Overall classification error on the data shown is lower than if we use only one feature as in Fig. 1.3, but there will still be some errors.

of more than one feature at a time.

In our search for other features, we might try to capitalize on the observation that sea bass are typically wider than salmon. Now we have two features for classifying fish: the lightness $x_1$ and the width $x_2$. If we ignore how these features might be measured in practice, we realize that the feature extractor has thus reduced the image of each fish to a point or feature vector $\mathbf{x}$ in a two-dimensional feature space, where

$$\mathbf{x} = \begin{pmatrix} x_1 \\ x_2 \end{pmatrix}.$$

Our problem now is to partition the feature space into two regions, where for all patterns in one region we will call the fish a sea bass, and for all patterns in the other we will call it a salmon. Suppose that we measure the feature vectors for our samples and obtain the scattering of points shown in Fig. 1.4. This plot suggests the following rule for separating the fish: Classify the fish as sea bass if its feature vector falls above the decision boundary shown, and as salmon otherwise.
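A straight-line rule of this kind can be written as a sign test on a weighted sum of the two features. The sketch below assumes a hypothetical line in the (lightness, width) plane; the weights `w` and offset `b` are invented for illustration and are not read off the book's figure.

```python
import numpy as np

# Hypothetical line in the (lightness, width) plane, written as w . x + b = 0.
# Because the weight on width is positive, "above the line" is the positive side.
w = np.array([1.0, 2.0])
b = -40.0


def classify(x):
    """x = [lightness, width]; points above the line are called sea bass."""
    return "sea bass" if float(np.dot(w, x) + b) > 0 else "salmon"


samples = np.array([
    [3.0, 15.0],  # dark, narrow fish: 3 + 30 - 40 = -7, below the line
    [8.0, 20.0],  # light, wide fish:  8 + 40 - 40 = +8, above the line
])
labels = [classify(x) for x in samples]
print(labels)
```

Writing the rule as a sign test makes it easy to see that the whole classifier is determined by three numbers, which is what makes the straight line such a simple model.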

This rule appears to do a good job of separating our samples and suggests that perhaps incorporating yet more features would be desirable. Besides the lightness and width of the fish, we might include some shape parameter, such as the vertex angle of the dorsal fin, or the placement of the eyes (expressed as a proportion of the mouth-to-tail distance), and so on. How do we know beforehand which of these features will work best? Some features might be redundant: for instance, if the eye color of all fish correlated perfectly with width, then classification performance would not be improved by also including eye color as a feature. Even if the difficulty or computational cost in obtaining more features is of no concern, might we ever have too many features?

Figure 1.5 (scatter plot; axes: lightness and width; classes: salmon, sea bass): Overly complex models for the fish will lead to decision boundaries that are complicated. While such a decision may lead to perfect classification of our training samples, it would lead to poor performance on future patterns. The novel test point marked ? is evidently most likely a salmon, whereas the complex decision boundary shown leads it to be misclassified as a sea bass.

Suppose that other features are too expensive to measure, or provide little improvement (or possibly even degrade the performance) in the approach described above, and that we are forced to make our decision based on the two features in Fig. 1.4. If our models were extremely complicated, our classifier would have a decision boundary more complex than the simple straight line. In that case all the training patterns would be separated perfectly, as shown in Fig. 1.5. With such a “solution,” though, our satisfaction would be premature because the central aim of designing a classifier is to suggest actions when presented with novel patterns, i.e., fish not yet seen. This is the issue of generalization. It is unlikely that the complex decision boundary in Fig. 1.5 would provide good generalization, since it seems to be “tuned” to the particular training samples, rather than some underlying characteristics or true model of all the sea bass and salmon that will have to be separated.
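The gap between training performance and performance on novel patterns can be seen in a small experiment. The sketch below is an illustration under stated assumptions, not the book's procedure: the fish data are synthetic, a 1-nearest-neighbor rule stands in for the "overly complex" boundary (it memorizes every training sample), and a nearest-class-mean rule stands in for a simple straight-line boundary.

```python
import numpy as np

rng = np.random.default_rng(2)


def sample_fish(n):
    """Draw n salmon and n sea bass as hypothetical [lightness, width] vectors."""
    salmon = rng.normal(loc=[4.0, 16.0], scale=[1.5, 1.5], size=(n, 2))
    bass = rng.normal(loc=[6.0, 18.0], scale=[1.5, 1.5], size=(n, 2))
    X = np.vstack([salmon, bass])
    y = np.concatenate([np.zeros(n, dtype=int), np.ones(n, dtype=int)])  # 0 = salmon, 1 = sea bass
    return X, y


X_train, y_train = sample_fish(100)
X_test, y_test = sample_fish(1000)  # "novel" fish, never used for training


def predict_1nn(X):
    """Complex boundary: copy the label of the single closest training sample."""
    dists = np.linalg.norm(X[:, None, :] - X_train[None, :, :], axis=2)
    return y_train[np.argmin(dists, axis=1)]


# One mean vector per class, estimated from the training samples only.
class_means = np.array([X_train[y_train == c].mean(axis=0) for c in (0, 1)])


def predict_nearest_mean(X):
    """Simple boundary: a straight line halfway between the two class means."""
    dists = np.linalg.norm(X[:, None, :] - class_means[None, :, :], axis=2)
    return np.argmin(dists, axis=1)


def accuracy(predict, X, y):
    return float(np.mean(predict(X) == y))


for name, predict in [("1-NN", predict_1nn), ("nearest mean", predict_nearest_mean)]:
    print(f"{name:13s} train = {accuracy(predict, X_train, y_train):.3f}  "
          f"test = {accuracy(predict, X_test, y_test):.3f}")
```

The memorizing rule always scores perfectly on its own training samples, so its training accuracy says nothing about novel fish; its accuracy on the fresh samples is the number to compare against the simpler rule, and the exact values depend on the random draw.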

Naturally, one approach would be to get more training samples for obtaining a better estimate of the true underlying characteristics, for instance the probability distributions of the categories. In most pattern recognition problems, however, the amount of such data we can obtain easily is often quite limited. Even with a vast amount of training data in a continuous feature space though, if we followed the approach in Fig. 1.5 our classifier would give a horrendously complicated decision boundary, one that would be unlikely to do well on novel patterns.

Rather, then, we might seek to “simplify” the recognizer, motivated by a belief that the underlying models will not require a decision boundary that is as complex as that in Fig. 1.5. Indeed, we might be satisfied with the slightly poorer performance on the training samples if it means that our classifier will have better performance on novel patterns.* But if designing a very complex recognizer is unlikely to give good generalization, precisely how should we quantify and favor simpler classifiers? How would our system automatically determine that the simple curve in Fig. 1.6 is preferable to the manifestly simpler straight line in Fig. 1.4 or the complicated boundary in Fig. 1.5? Assuming that we somehow manage to optimize this tradeoff, can we then predict how well our system will generalize to new patterns? These are some of the central problems in statistical pattern recognition.

For the same incoming patterns, we might need to use a drastically different cost

* The philosophical underpinnings of this approach derive from William of Occam (1284-1347?), who advocated favoring simpler explanations over those that are needlessly complicated: Entia non sunt multiplicanda praeter necessitatem (“Entities are not to be multiplied without necessity”). Decisions based on overly complex models often lead to lower accuracy of the classifier.
