Support Vector Machines (SVM) in Depth: Stanford University CS229 Lecture Notes

Updated 2024-07-23 · 332 KB PDF

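"NetEase Open Courses (网易公开课) hosts the Stanford University machine learning lecture notes, which explain the support vector machine (SVM) learning algorithm in depth; SVMs are regarded as among the best supervised learning algorithms. The notes cover margins, the optimal margin classifier, Lagrange duality, kernels, and the SMO algorithm."

In machine learning, the support vector machine (SVM) is a powerful supervised learning model that is especially well suited to classification. Its central concept is the margin, which captures how confident a classification is and is closely tied to how well the model generalizes. When a hyperplane is used to split the data, the margin of a data point is its distance to that hyperplane. The intuition is that we want a hyperplane that separates the classes while making this distance as large as possible, because predictions on unseen data are then more robust.

Logistic regression makes a useful comparison. It models the probability $p(y = 1 \mid x; \theta)$ as $h_\theta(x) = g(\theta^T x)$ and predicts $y = 1$ when $h_\theta(x) \geq 0.5$, which is equivalent to $\theta^T x \geq 0$. Logistic regression can find a hyperplane that separates the data, but that hyperplane is not necessarily the best one, since nothing in the method forces it to maximize the margin.

The goal of the SVM is to find the separating hyperplane with the largest margin, known as the optimal margin classifier. Solving for it leads to Lagrange duality, an important idea in constrained optimization: by introducing Lagrange multipliers, the original (primal) problem is converted into a dual problem that is easier to solve and whose solution gives the maximum-margin hyperplane satisfying the constraints.

Another key idea in SVMs is the kernel function. Kernels make it possible to compute efficiently in high-dimensional or even infinite-dimensional feature spaces, so data that is not linearly separable in its original low-dimensional space can still be classified. With kernels, SVMs can handle complex data and perform non-linear classification.

Finally, efficient SVM implementations commonly rely on the sequential minimal optimization (SMO) algorithm. At each iteration SMO selects two Lagrange multipliers (one per training example), optimizes them jointly while holding the others fixed, and in this way improves the dual objective step by step until it approaches the optimum.

In short, the SVM combines a maximum-margin hyperplane with Lagrange duality and kernels to form a powerful classifier, and SMO makes training it computationally practical. These ideas are essential for understanding SVMs and for using them in practice.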
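The logistic regression decision rule described above can be checked with a minimal Python sketch. It only illustrates that thresholding $h_\theta(x)$ at 0.5 is the same as testing the sign of $\theta^T x$; the parameter vector `theta` and the sample inputs are made-up values, not taken from the notes.

```python
import math

def sigmoid(z):
    """Logistic function g(z) = 1 / (1 + e^(-z))."""
    return 1.0 / (1.0 + math.exp(-z))

def linear_score(theta, x):
    """Compute theta^T x for two equal-length sequences."""
    return sum(t * xi for t, xi in zip(theta, x))

def predict(theta, x):
    """Predict 1 when h_theta(x) = g(theta^T x) >= 0.5, otherwise 0."""
    return 1 if sigmoid(linear_score(theta, x)) >= 0.5 else 0

# Hypothetical parameters; the first input component acts as an intercept term.
theta = [-1.0, 2.0]
samples = [[1.0, 0.2], [1.0, 0.5], [1.0, 0.9]]

for x in samples:
    z = linear_score(theta, x)
    # Thresholding the probability at 0.5 agrees with checking z >= 0.
    assert predict(theta, x) == (1 if z >= 0 else 0)
```

The sign test is cheaper than evaluating the sigmoid, which is one reason the linear score $\theta^T x$ is the natural quantity to reason about when discussing margins.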

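find that the point B is given by $x^{(i)} - \gamma^{(i)} \cdot w/\|w\|$. But this point lies on the decision boundary, and all points $x$ on the decision boundary satisfy the equation $w^T x + b = 0$. Hence,

$$w^T \left( x^{(i)} - \gamma^{(i)} \frac{w}{\|w\|} \right) + b = 0.$$

Solving for $\gamma^{(i)}$ yields

$$\gamma^{(i)} = \frac{w^T x^{(i)} + b}{\|w\|} = \left( \frac{w}{\|w\|} \right)^T x^{(i)} + \frac{b}{\|w\|}.$$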
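As a numerical check of this derivation, the following minimal Python sketch computes the distance $\gamma$ for one point on the positive side, steps back along the unit vector $w/\|w\|$ to obtain the point B, and confirms that B satisfies $w^T x + b = 0$. The values of `w`, `b`, and `x_a`, as well as the helper names, are illustrative assumptions and do not come from the notes.

```python
import math

def dot(u, v):
    """Inner product of two equal-length sequences."""
    return sum(a * b for a, b in zip(u, v))

def norm(v):
    """Euclidean norm ||v||."""
    return math.sqrt(dot(v, v))

def distance_to_boundary(w, b, x):
    """gamma = (w^T x + b) / ||w|| for a point x on the positive side."""
    return (dot(w, x) + b) / norm(w)

def project_onto_boundary(w, b, x):
    """Point B = x - gamma * w / ||w||, the foot of the perpendicular from x."""
    gamma = distance_to_boundary(w, b, x)
    n = norm(w)
    return [xi - gamma * wi / n for xi, wi in zip(x, w)]

# Hypothetical hyperplane 3*x1 + 4*x2 - 5 = 0 and a point A on its positive side.
w, b = [3.0, 4.0], -5.0
x_a = [3.0, 4.0]

gamma = distance_to_boundary(w, b, x_a)
point_b = project_onto_boundary(w, b, x_a)
residual = dot(w, point_b) + b  # should be zero up to rounding error

assert math.isclose(gamma, 4.0)
assert abs(residual) < 1e-9
```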

This was worked out for the case of a positive training example at A in the figure, where being on the “positive” side of the decision boundary is good. More generally, we define the geometric margin of $(w, b)$ with respect to a training example $(x^{(i)}, y^{(i)})$ to be

$$\gamma^{(i)} = y^{(i)} \left( \left( \frac{w}{\|w\|} \right)^T x^{(i)} + \frac{b}{\|w\|} \right).$$

Note that if $\|w\| = 1$, then the functional margin equals the geometric margin; this gives us a way of relating these two different notions of margin. Also, the geometric margin is invariant to rescaling of the parameters; i.e., if we replace $w$ with $2w$ and $b$ with $2b$, then the geometric margin does not change. This will in fact come in handy later. Specifically, because of this invariance to the scaling of the parameters, when trying to fit $w$ and $b$ to training data, we can impose an arbitrary scaling constraint on $w$ without changing anything important; for instance, we can demand that $\|w\| = 1$, or $|w_1| = 5$, or $|w_1 + b| + |w_2| = 2$, and any of these can be satisfied simply by rescaling $w$ and $b$.
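The scaling behavior can be illustrated with a short Python sketch. It assumes the usual definition of the functional margin, $y\,(w^T x + b)$, and uses made-up values for `w`, `b`, and a single negative example; none of these numbers come from the notes.

```python
import math

def functional_margin(w, b, x, y):
    """Functional margin y * (w^T x + b) of one labeled example."""
    return y * (sum(wi * xi for wi, xi in zip(w, x)) + b)

def geometric_margin(w, b, x, y):
    """Geometric margin: the functional margin divided by ||w||."""
    return functional_margin(w, b, x, y) / math.sqrt(sum(wi * wi for wi in w))

# Hypothetical parameters and one negative example (label -1).
w, b = [3.0, 4.0], -5.0
x, y = [0.0, 0.0], -1

# Replace (w, b) with (2w, 2b).
w2, b2 = [2 * wi for wi in w], 2 * b

# The functional margin doubles under the rescaling ...
assert math.isclose(functional_margin(w2, b2, x, y),
                    2 * functional_margin(w, b, x, y))
# ... while the geometric margin is unchanged.
assert math.isclose(geometric_margin(w2, b2, x, y),
                    geometric_margin(w, b, x, y))

# Rescaling so that ||w|| = 1 makes the two margins coincide.
n = math.sqrt(sum(wi * wi for wi in w))
w_unit, b_unit = [wi / n for wi in w], b / n
assert math.isclose(functional_margin(w_unit, b_unit, x, y),
                    geometric_margin(w, b, x, y))
```

Because only the functional margin depends on the scale of $(w, b)$, a normalization such as $\|w\| = 1$ can be imposed freely without changing which hyperplane is described.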

Finally, given a training set $S = \{(x^{(i)}, y^{(i)}); i = 1, \ldots, m\}$, we also define the geometric margin of $(w, b)$ with respect to $S$ to be the smallest of the geometric margins on the individual training examples:

$$\gamma = \min_{i=1,\ldots,m} \gamma^{(i)}.$$
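The definition over a whole training set can be sketched in a few lines of Python: compute the geometric margin of every example and take the minimum. The toy training set `S` and the parameters `w` and `b` below are hypothetical values chosen so the arithmetic is easy to follow.

```python
import math

def geometric_margin(w, b, x, y):
    """Geometric margin y * (w^T x + b) / ||w|| of one labeled example."""
    score = sum(wi * xi for wi, xi in zip(w, x)) + b
    return y * score / math.sqrt(sum(wi * wi for wi in w))

def margin_of_set(w, b, examples):
    """Geometric margin of (w, b) with respect to a set: the smallest per-example margin."""
    return min(geometric_margin(w, b, x, y) for x, y in examples)

# Hypothetical parameters and a toy training set of (x, y) pairs with y in {-1, +1}.
w, b = [3.0, 4.0], -5.0
S = [([3.0, 4.0], 1), ([0.0, 0.0], -1), ([1.0, 3.0], 1)]

per_example = [geometric_margin(w, b, x, y) for x, y in S]
gamma = margin_of_set(w, b, S)

# The set margin is determined by the example closest to the boundary.
assert math.isclose(gamma, min(per_example))
```

In this sketch a negative value of `gamma` would indicate that at least one example lies on the wrong side of the hyperplane.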

4 The optimal margin classifier

Given a training set, it seems from our previous discussion that a natural desideratum is to try to find a decision boundary that maximizes the (geometric) margin, since this would reflect a very confident set of predictions

