Master the Theory: A Practical Decoding of "Understanding Machine Learning Algorithms"

543 KB PDF · updated 2024-07-17

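In "Understanding Machine Learning: From Theory to Algorithms", Shai Shalev-Shwartz and Shai Ben-David combine a thorough theoretical exposition with exercises, and this document provides solutions to those exercises. This summary focuses on a few of them, in particular those dealing with theoretical foundations and algorithm design.

The first exercise concerns a multivariate polynomial $p_S(x)$ built from the positive examples $(x_i, y_i)$ of the training set $S$. It is constructed so that $p_S(x_i) = 0$ for every example with $y_i = 1$, while $p_S(x) < 0$ for every other $x$. This shows how the training data can be used to build a decision boundary that separates positive from negative examples.

The second exercise concerns the expected risk. Using linearity of expectation, it shows that for a fixed hypothesis $h$, the expectation over samples $S \sim \mathcal{D}^m$ of the empirical error $L_S(h)$ (the fraction of examples in $S$ that $h$ misclassifies) equals the true error $L_{(\mathcal{D},f)}(h)$. The true error is the central measure of a model's performance, because it captures how well the model generalizes.

In part (a) of the exercise on axis-aligned rectangles, the authors consider the algorithm $A$ that returns the tightest rectangle enclosing all positive training examples, so every positive training example is labeled positive. Under the realizability assumption, the target rectangle contains all positive examples, so the tightest enclosing rectangle lies inside it; every negative example lies outside the target rectangle and is therefore labeled correctly as well. Hence $A$ has zero empirical error, i.e. it is an ERM (empirical risk minimization) algorithm. The solution then fixes a distribution $\mathcal{D}$, introduces the ideal rectangle $R^\star$, and compares it with the rectangle $R(S)$ returned by the algorithm and the corresponding hypothesis $A(S)$. Through $R(S)$ and $A(S)$ one can see how the algorithm adapts its decision rule to the training data and how well it can be expected to perform under a given distribution.

These exercises show that learning theory matters not only for understanding how decision functions are constructed, but also for understanding how they can be tuned in practice to minimize prediction error. The book leads the reader from the theoretical framework to the implementation details and optimization methods of a range of learning algorithms. Working through these problems deepens the reader's understanding of core methods such as support vector machines and neural networks and improves problem-solving skills in practical projects.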
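The learner $A$ from the rectangle exercise can be sketched in Python. This is a minimal illustration rather than code from the book: the function names (`tightest_rectangle`, `predict`, `empirical_error`), the `(a1, b1, a2, b2)` encoding of $[a_1, b_1] \times [a_2, b_2]$, returning `None` for the all-negative hypothesis when there are no positive examples, and the toy target rectangle are all assumptions.

```python
import random
from typing import List, Optional, Tuple

Point = Tuple[float, float]
Rect = Tuple[float, float, float, float]  # (a1, b1, a2, b2) encodes [a1, b1] x [a2, b2]
Sample = List[Tuple[Point, int]]


def tightest_rectangle(sample: Sample) -> Optional[Rect]:
    """Smallest axis-aligned rectangle containing every positive example (None if there are none)."""
    positives = [x for x, y in sample if y == 1]
    if not positives:
        return None  # stands for the all-negative hypothesis
    xs = [p[0] for p in positives]
    ys = [p[1] for p in positives]
    return (min(xs), max(xs), min(ys), max(ys))


def predict(rect: Optional[Rect], x: Point) -> int:
    """Label x positive iff it lies in the closed rectangle."""
    if rect is None:
        return 0
    a1, b1, a2, b2 = rect
    return int(a1 <= x[0] <= b1 and a2 <= x[1] <= b2)


def empirical_error(rect: Optional[Rect], sample: Sample) -> float:
    """L_S(h): the fraction of examples in the sample that the hypothesis misclassifies."""
    return sum(predict(rect, x) != y for x, y in sample) / len(sample)


if __name__ == "__main__":
    random.seed(0)
    target: Rect = (0.2, 0.7, 0.1, 0.5)  # hypothetical target rectangle R*
    points = [(random.random(), random.random()) for _ in range(200)]
    sample = [(p, predict(target, p)) for p in points]  # realizable by construction
    rect = tightest_rectangle(sample)
    print(rect, empirical_error(rect, sample))  # the error is 0.0
```

Because the sample is labeled by the target rectangle, the returned rectangle lies inside it, which is why the empirical error comes out as zero.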

by s. Pick an arbitrary subset $E \subseteq X \setminus C$ of $k - s$ elements, and let $h \in H_{=k}$ be the hypothesis which satisfies $h(x_i) = y_i$ for every $x_i \in C$, and $h(x) = \mathbb{1}_{[x \in E]}$ for every $x \in X \setminus C$. We conclude that $C$ is shattered by $H_{=k}$. It follows that $\mathrm{VCdim}(H_{=k}) \ge \min\{k, |X| - k\}$.
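The construction in part (a) can be checked mechanically on a small domain. The sketch below is only an illustration: the helper names `extend_to_h_eq_k` and `shatters_with_h_eq_k`, the choice of $E$ as the first $k - s$ remaining points (the proof allows any such subset), and the toy domain are assumptions.

```python
from itertools import product
from typing import Dict, Hashable, List, Sequence


def extend_to_h_eq_k(X: Sequence[Hashable], C: List[Hashable],
                     labels: Sequence[int], k: int) -> Dict[Hashable, int]:
    """Build h in H_{=k} with h(x_i) = y_i on C, following the proof of part (a).

    Assumes |C| <= min(k, |X| - k), which guarantees that E below exists.
    """
    s = sum(labels)                      # number of positive labels on C
    rest = [x for x in X if x not in C]  # X \ C
    if not 0 <= k - s <= len(rest):
        raise ValueError("C is too large for this construction")
    E = set(rest[:k - s])                # a subset of X \ C with k - s elements
    h = dict(zip(C, labels))             # h agrees with the labels on C
    h.update({x: int(x in E) for x in rest})  # outside C, h is the indicator of E
    return h


def shatters_with_h_eq_k(X, C, k) -> bool:
    """Check that every labeling of C is realized by some h with exactly k positives."""
    for labels in product((0, 1), repeat=len(C)):
        h = extend_to_h_eq_k(X, C, labels, k)
        if sum(h.values()) != k or [h[x] for x in C] != list(labels):
            return False
    return True


X = list(range(7))
k = 3
C = [0, 1, 2]  # |C| = 3 = min(k, |X| - k)
print(shatters_with_h_eq_k(X, C, k))  # True
```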

(b) We claim that $\mathrm{VCdim}(H_{\le k}) = k$. First, we show that $\mathrm{VCdim}(H_{\le k}) \le k$. Let $C \subseteq X$ be a set of size $k + 1$. Then there is no $h \in H_{\le k}$ which satisfies $h(x) = 1$ for all $x \in C$, so $C$ is not shattered.

It remains to show that $\mathrm{VCdim}(H_{\le k}) \ge k$. Let $C = \{x_1, \ldots, x_m\} \subseteq X$ be a set of size $m \le k$, and let $(y_1, \ldots, y_m) \in \{0, 1\}^m$ be a vector of labels. This labeling is obtained by the hypothesis $h$ which satisfies $h(x_i) = y_i$ for every $x_i \in C$, and $h(x) = 0$ for every $x \in X \setminus C$; it belongs to $H_{\le k}$ because it has at most $m \le k$ positive points. We conclude that $C$ is shattered by $H_{\le k}$. It follows that $\mathrm{VCdim}(H_{\le k}) \ge k$.
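As a sanity check of part (b), the VC-dimension of $H_{\le k}$ over a small finite domain can be computed by brute force. In this sketch (the names `is_shattered` and `vc_dimension`, and the encoding of a hypothesis as a 0/1 tuple over the domain $\{0, \ldots, n-1\}$, are assumptions), the search stops at the first size for which no subset is shattered, which is valid because every subset of a shattered set is shattered.

```python
from itertools import combinations, product


def is_shattered(hypotheses, C) -> bool:
    """C is shattered iff the hypotheses realize all 2^|C| labelings of C."""
    return len({tuple(h[i] for i in C) for h in hypotheses}) == 2 ** len(C)


def vc_dimension(hypotheses, n: int) -> int:
    """Brute-force VC-dimension over the domain {0, ..., n-1}; hypotheses are 0/1 tuples."""
    d = 0
    while d < n and any(is_shattered(hypotheses, C) for C in combinations(range(n), d + 1)):
        d += 1
    return d


n = 5
for k in range(n + 1):
    # H_{<=k}: all labelings of the domain with at most k positive points.
    H_le_k = [h for h in product((0, 1), repeat=n) if sum(h) <= k]
    print(k, vc_dimension(H_le_k, n))  # the two numbers agree for every k
```

The same helper can be pointed at $H_{=k}$ (keep only tuples with `sum(h) == k`) to observe $\min\{k, |X| - k\}$ from part (a).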

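3. We claim that the VC-dimension of $H_{n\text{-parity}}$ is $n$. First, we note that $|H_{n\text{-parity}}| = 2^n$. Thus,

$$\mathrm{VCdim}(H_{n\text{-parity}}) \le \log_2(|H_{n\text{-parity}}|) = n.$$

We establish the tightness of this bound by showing that the standard basis $\{e_j\}_{j=1}^{n}$ is shattered by $H_{n\text{-parity}}$. Given a vector of labels $(y_1, \ldots, y_n) \in \{0, 1\}^n$, let $J = \{j \in [n] : y_j = 1\}$. Since $h_J(x) = \left(\sum_{i \in J} x_i\right) \bmod 2$, we get $h_J(e_j) = \mathbb{1}_{[j \in J]} = y_j$ for every $j \in [n]$.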
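The shattering argument for parities can be replayed in a few lines. This sketch uses 0-based indices, and the helper names `parity_hypothesis`, `standard_basis`, and `all_subsets` are hypothetical; it enumerates all $2^n$ subsets $J$ and counts the distinct labelings they induce on $e_1, \ldots, e_n$.

```python
from itertools import chain, combinations


def parity_hypothesis(J):
    """h_J(x) = (sum_{j in J} x_j) mod 2 for x in {0,1}^n (0-based indices)."""
    return lambda x: sum(x[j] for j in J) % 2


def standard_basis(n):
    return [tuple(int(i == j) for i in range(n)) for j in range(n)]


def all_subsets(n):
    return chain.from_iterable(combinations(range(n), r) for r in range(n + 1))


n = 4
basis = standard_basis(n)
# Labelings of (e_1, ..., e_n) realized by the 2^n parity functions.
labelings = {tuple(parity_hypothesis(J)(e) for e in basis) for J in all_subsets(n)}
print(len(labelings) == 2 ** n)  # True: the standard basis is shattered

# The proof's choice: for labels y, take J = {j : y_j = 1}.
y = (1, 0, 1, 1)
J = [j for j in range(n) if y[j] == 1]
print(tuple(parity_hypothesis(J)(e) for e in basis) == y)  # True
```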

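4. Let $X = \mathbb{R}^d$. The two inequalities in question are those of Sauer's inequality,

$$|H_A| \le |\{B \subseteq A : H \text{ shatters } B\}| \le \sum_{i=0}^{\mathrm{VCdim}(H)} \binom{|A|}{i},$$

and each case below is labeled by the relation that holds in the first and in the second inequality. We will demonstrate all 4 combinations using hypothesis classes defined over $X \times \{0, 1\}$. Remember that the empty set is always considered to be shattered.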

• (<, =): Let $d \ge 2$ and consider the class $H = \{\mathbb{1}_{[\|x\| \le r]} : r \ge 0\}$ of concentric balls. The VC-dimension of this class is 1. To see this, we first observe that if $x \neq (0, \ldots, 0)$, then $\{x\}$ is shattered. Second, if $\|x_1\| \le \|x_2\|$, then the labeling $y_1 = 0, y_2 = 1$ is not obtained by any hypothesis in $H$, since every ball containing $x_2$ also contains $x_1$. Let $A = \{e_1, e_2\}$, where $e_1, e_2$ are the first two elements of the standard basis of $\mathbb{R}^d$. Then

$$H_A = \{(0, 0), (1, 1)\}, \qquad \{B \subseteq A : H \text{ shatters } B\} = \{\emptyset, \{e_1\}, \{e_2\}\},$$

and $\sum_{i=0}^{1} \binom{|A|}{i} = 3$.

• (=, <): Let $H$ be the class of axis-aligned rectangles in $\mathbb{R}^2$. We have seen that the VC-dimension of $H$ is 4. Let $A = \{x_1, x_2, x_3\}$, where $x_1 = (0, 0)$, $x_2 = (1, 0)$, $x_3 = (2, 0)$. All the labelings except $(1, 0, 1)$ are obtained. Thus $|H_A| = 7$, $|\{B \subseteq A : H \text{ shatters } B\}| = 7$, and $\sum_{i=0}^{4} \binom{|A|}{i} = 8$.

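• (<, <): Let $d \ge 3$ and consider the class $H = \{x \mapsto \mathrm{sign}\langle w, x\rangle : w \in \mathbb{R}^d\}$ of homogeneous halfspaces (see Chapter 9). We will prove in Theorem 9.2 that the VC-dimension of this class is $d$. However, here we will only rely on the fact that $\mathrm{VCdim}(H) \ge 3$. This fact follows by observing that the set $\{e_1, e_2, e_3\}$ is shattered. Let $A = \{x_1, x_2, x_3\}$, where $x_1 = e_1$, $x_2 = e_2$, and $x_3 = (1, 1, 0, \ldots, 0)$. Since $\langle w, x_3\rangle = \langle w, x_1\rangle + \langle w, x_2\rangle$, all the labelings except $(1, 1, -1)$ and $(-1, -1, 1)$ are obtained. It follows that $|H_A| = 6$, $|\{B \subseteq A : H \text{ shatters } B\}| = 7$, and, since $\mathrm{VCdim}(H) \ge 3$, $\sum_{i=0}^{\mathrm{VCdim}(H)} \binom{|A|}{i} = 8$.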

• (=, =): Let $d = 1$, and consider the class $H = \{\mathbb{1}_{[x \ge t]} : t \in \mathbb{R}\}$ of thresholds on the line. We have seen that every singleton is shattered by $H$, and that no set of size at least 2 is shattered by $H$. Choose any finite set $A \subseteq \mathbb{R}$. Then each of the three terms in Sauer's inequality equals $|A| + 1$.
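The four cases can be sanity-checked numerically. In the sketch below each class is replaced by a small, hand-picked finite family of hypotheses (the radii, rectangle corners, weight vectors, and thresholds are assumptions). Such a family can only exhibit labelings the class really realizes; the fact that no further labelings exist comes from the arguments in the text, not from the code. The helper names `shattered_count` and `sauer_terms` are hypothetical.

```python
from itertools import combinations, product
from math import comb


def shattered_count(labelings, m):
    """Number of index subsets B of range(m) on which all 2^|B| patterns occur."""
    count = 0
    for size in range(m + 1):
        for B in combinations(range(m), size):
            if len({tuple(lab[i] for i in B) for lab in labelings}) == 2 ** size:
                count += 1
    return count


def sauer_terms(labelings, m, vcdim):
    """(|H_A|, |{B subset of A : H shatters B}|, sum_{i=0}^{vcdim} C(|A|, i))."""
    return (len(labelings), shattered_count(labelings, m),
            sum(comb(m, i) for i in range(vcdim + 1)))


def sign(z):
    return 1 if z >= 0 else -1  # convention sign(0) = 1


# (<, =): concentric balls, A = {e_1, e_2} in R^2, VC-dimension 1.
A = [(1.0, 0.0), (0.0, 1.0)]
balls = {tuple(int(x * x + y * y <= r * r) for x, y in A) for r in (0.0, 0.5, 1.0, 2.0)}
print(sauer_terms(balls, 2, 1))  # (2, 3, 3)

# (=, <): axis-aligned rectangles in R^2 on three collinear points, VC-dimension 4.
A = [(0, 0), (1, 0), (2, 0)]
grid = (-0.5, 0.5, 1.5, 2.5)
rects = {tuple(int(a1 <= x <= b1 and a2 <= y <= b2) for x, y in A)
         for a1, b1, a2, b2 in product(grid, repeat=4) if a1 <= b1 and a2 <= b2}
print(sauer_terms(rects, 3, 4))  # (7, 7, 8)

# (<, <): homogeneous halfspaces in R^3, VC-dimension 3.
A = [(1, 0, 0), (0, 1, 0), (1, 1, 0)]
halfspaces = {tuple(sign(sum(wi * xi for wi, xi in zip(w, x))) for x in A)
              for w in product((-2, -1, 1, 2), repeat=3)}
print(sauer_terms(halfspaces, 3, 3))  # (6, 7, 8)

# (=, =): thresholds on the line, VC-dimension 1.
A = [0.0, 1.0, 2.0]
thresholds = {tuple(int(x >= t) for x in A) for t in (-0.5, 0.5, 1.5, 2.5)}
print(sauer_terms(thresholds, 3, 1))  # (4, 4, 4)
```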

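5. Our proof is a straightforward generalization of the proof in the 2-dimensional case.

Let us first define the class formally. Given real numbers $a_1 \le b_1, a_2 \le b_2, \ldots, a_d \le b_d$, define the classifier $h_{(a_1, b_1, \ldots, a_d, b_d)}$ by

$$h_{(a_1, b_1, \ldots, a_d, b_d)}(x_1, \ldots, x_d) = \prod_{i=1}^{d} \mathbb{1}_{[x_i \in [a_i, b_i]]}.$$

The class of all axis-aligned rectangles in $\mathbb{R}^d$ is defined as

$$H^d_{\mathrm{rec}} = \{h_{(a_1, b_1, \ldots, a_d, b_d)} : \forall i \in [d],\ a_i \le b_i\}.$$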

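Consider the set $\{x_1, \ldots, x_{2d}\}$, where $x_i = e_i$ if $i \in [d]$, and $x_i = -e_{i-d}$ if $i > d$. As in the 2-dimensional case, it's not hard to see that it's shattered. Indeed, let $(y_1, \ldots, y_{2d}) \in \{0, 1\}^{2d}$. Choose $a_i = -2$ if $y_{i+d} = 1$, and $a_i = 0$ otherwise. Similarly, choose $b_i = 2$ if $y_i = 1$, and $b_i = 0$ otherwise. Since $a_j \le 0 \le b_j$ for every $j$, the zero coordinates of each point always lie in the corresponding intervals, so membership is decided by the single nonzero coordinate. Then $h_{(a_1, b_1, \ldots, a_d, b_d)}(x_i) = y_i$ for every $i \in [2d]$. We just proved that $\mathrm{VCdim}(H^d_{\mathrm{rec}}) \ge 2d$.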

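Let $C$ be a set of size at least $2d + 1$. We finish our proof by showing that $C$ is not shattered. For each $j \in [d]$, pick an element of $C$ with minimal $j$-th coordinate and an element of $C$ with maximal $j$-th coordinate; this selects at most $2d$ elements. By the pigeonhole principle, there exists an element $x \in C$ that was not selected. Hence, for every $j \in [d]$, there exists $x' \in C \setminus \{x\}$ with $x'_j \le x_j$, and similarly there exists $x'' \in C \setminus \{x\}$ with $x''_j \ge x_j$. Thus every rectangle that contains $C \setminus \{x\}$ also contains $x$, so the labeling in which $x$ is negative and the rest of the elements in $C$ are positive cannot be obtained.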
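Both halves of the proof can be exercised on small examples. The sketch below (all helper names are hypothetical) first applies the choice of $a_i, b_i$ from the text to every labeling of the $2d$ points $\pm e_i$, and then runs the pigeonhole selection on a set of $2d + 1$ points: the unselected point lies inside the bounding box of the others, so no rectangle can exclude it while keeping them.

```python
from itertools import product
from typing import List, Sequence, Tuple


def h_rect(bounds: Sequence[Tuple[float, float]], x: Sequence[float]) -> int:
    """h_{(a_1,b_1,...,a_d,b_d)}(x) = prod_i 1[x_i in [a_i, b_i]]."""
    return int(all(a <= xi <= b for (a, b), xi in zip(bounds, x)))


def shattering_points(d: int) -> List[Tuple[int, ...]]:
    """The 2d points e_1, ..., e_d, -e_1, ..., -e_d."""
    e = [tuple(int(i == j) for i in range(d)) for j in range(d)]
    return e + [tuple(-c for c in v) for v in e]


def bounds_for(labels: Sequence[int], d: int) -> List[Tuple[float, float]]:
    """The proof's choice: a_i = -2 iff y_{i+d} = 1, b_i = 2 iff y_i = 1, else 0."""
    return [(-2 if labels[i + d] else 0, 2 if labels[i] else 0) for i in range(d)]


def non_extremal_point(C: List[Tuple[float, ...]]) -> int:
    """Index of a point not selected as a per-coordinate min or max (exists if |C| >= 2d + 1)."""
    d = len(C[0])
    chosen = set()
    for j in range(d):
        chosen.add(min(range(len(C)), key=lambda i, j=j: C[i][j]))
        chosen.add(max(range(len(C)), key=lambda i, j=j: C[i][j]))
    return next(i for i in range(len(C)) if i not in chosen)


d = 3
pts = shattering_points(d)
ok = all([h_rect(bounds_for(y, d), x) for x in pts] == list(y)
         for y in product((0, 1), repeat=2 * d))
print(ok)  # True: all 2^(2d) labelings are realized

C = pts + [(0.0, 0.0, 0.0)]
i = non_extremal_point(C)
rest = [x for k, x in enumerate(C) if k != i]
box = [(min(x[j] for x in rest), max(x[j] for x in rest)) for j in range(d)]
print(i, h_rect(box, C[i]))  # 6 1: the smallest rectangle containing the rest contains C[i]
```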

6. (a) Each hypothesis, besides the all-negative hypothesis, is determined by deciding, for each variable $x_i$, whether $x_i$, $\bar{x}_i$, or neither of them appears in the corresponding conjunction. Thus, $|H^d_{\mathrm{con}}| = 3^d + 1$.
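The count in part (a) can be confirmed by enumerating truth tables for small $d$. In this sketch a conjunction is encoded by hypothetical literal codes ($+1$ for $x_i$, $-1$ for $\bar{x}_i$, $0$ for omitting the variable); distinct literal choices give distinct functions, and the all-negative hypothesis is added separately.

```python
from itertools import product


def literal_holds(xi: int, lit: int) -> bool:
    """lit = 1: x_i must be 1; lit = -1: x_i must be 0; lit = 0: variable omitted."""
    if lit == 1:
        return xi == 1
    if lit == -1:
        return xi == 0
    return True


def count_conjunctions(d: int) -> int:
    """Number of distinct Boolean functions on {0,1}^d in H_con^d."""
    inputs = list(product((0, 1), repeat=d))
    tables = {tuple(int(all(literal_holds(xi, l) for xi, l in zip(x, lits))) for x in inputs)
              for lits in product((1, -1, 0), repeat=d)}
    tables.add(tuple(0 for _ in inputs))  # the all-negative hypothesis
    return len(tables)


for d in range(1, 5):
    print(d, count_conjunctions(d), 3 ** d + 1)  # the last two columns agree
```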

Footnote (Exercise 4, homogeneous halfspaces): we adopt the convention $\mathrm{sign}(0) = 1$.
