Introduction to Deep Learning: An Analysis of the Stanford CS229 Machine Learning Lecture Notes

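PDF, 2.03 MB, updated 2024-07-17

"This resource is the machine learning lecture notes for Stanford University's CS229 course, taught by Professor Andrew Ng. The notes focus mainly on supervised learning methods, including linear regression, classification, logistic regression, and generalized linear models. They also cover topics such as independent component analysis, spanning a broad range of machine learning theory and practice. Because the notes involve a great deal of mathematics, they require a reasonable mathematical background from the learner."

In machine learning, supervised learning is a common approach. It builds a model from existing labeled data (the training set) in order to make predictions on future, unseen data. The data are split into input variables (also called features) and output variables (also called target variables). In the example given, the input variable is a house's living area (in square feet) and the output variable is its price (in thousands of dollars). The training set contains many input-output pairs $(x^{(i)}, y^{(i)})$, where $i$ indexes the $i$-th training example.

Linear regression is the basic method for regression problems. Its goal is to find a line (or, in higher dimensions, a hyperplane) that best fits the data points. For house-price prediction, the linear regression model can be written as $y = wx + b$, where $w$ is the slope and $b$ is the intercept. Minimizing the mean squared error between the predicted and actual prices yields the optimal $w$ and $b$.

Classification assigns data to predefined categories. Despite the word "regression" in its name, logistic regression is a binary classification model, commonly used to predict the probability that an event occurs. It passes the output of a linear model through the sigmoid function to obtain a probability between 0 and 1.

Generalized linear models (GLMs) extend linear regression to response variables whose distribution belongs to the exponential family, and linear regression and logistic regression are both special cases. Separately, a linear model can capture non-linear relationships through feature expansion. For example, if price is not linear in living area, adding the squared living area or other higher-order terms as features can improve the fit, while the model remains linear in its parameters.

Independent component analysis (ICA) is a signal-processing technique for recovering the underlying source signals from observed mixtures. In machine learning, ICA can be used to identify hidden factors in data. One example is separating distinct signal sources from the recordings of multiple sensors.

These methods are widely used in practice, for example in real-estate market analysis, predictive modeling, and pattern recognition. Mastering them, however, requires a solid foundation in linear algebra, probability, and calculus. Learners need to understand the principles behind each model and how to choose an appropriate model for a given problem. Optimization algorithms and regularization are also central to supervised learning. Optimization algorithms fit a model's parameters, and regularization helps prevent overfitting and improve generalization.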

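The linear-regression step described above can be sketched in a few lines. This is a minimal sketch that assumes a single input feature. It uses the closed-form least-squares solution $w = \sum_i (x_i - \bar x)(y_i - \bar y) / \sum_i (x_i - \bar x)^2$, $b = \bar y - w \bar x$, which minimizes the mean squared error. The function names `fit_line` and `mse` and the house data are made up for illustration. The data lie exactly on a line, so the result is easy to check.

```python
def fit_line(xs, ys):
    """Return (w, b) minimizing the mean squared error of y ≈ w*x + b (one feature)."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    # Closed-form least squares: slope = covariance(x, y) / variance(x)
    cov_xy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    var_x = sum((x - x_mean) ** 2 for x in xs)
    w = cov_xy / var_x
    # The fitted line passes through the point of means
    b = y_mean - w * x_mean
    return w, b


def mse(xs, ys, w, b):
    """Mean squared error of the predictions w*x + b."""
    return sum((w * x + b - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


# Hypothetical data: living area (sq ft) and price (thousands of dollars)
areas = [1000.0, 1500.0, 2000.0, 2500.0]
prices = [200.0, 275.0, 350.0, 425.0]

w, b = fit_line(areas, prices)
print(f"w = {w:.4f}, b = {b:.2f}")  # prints "w = 0.1500, b = 50.00"
```

On real data the points do not lie on one line, so `mse` returns a positive value. In that case it serves as the quantity being minimized rather than a zero check.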
Part II

Classification and logistic regression

Let's now talk about the classification problem. This is just like the regression problem, except that the values y we now want to predict take on only a small number of discrete values. For now, we will focus on the binary classification problem in which y can take on only two values, 0 and 1. (Most of what we say here will also generalize to the multiple-class case.)

For instance, if we are trying to build a spam classifier for email, then $x^{(i)}$ may be some features of a piece of email, and $y$ may be 1 if it is a piece of spam mail, and 0 otherwise. 0 is also called the negative class, and 1 the positive class, and they are sometimes also denoted by the symbols "-" and "+". Given $x^{(i)}$, the corresponding $y^{(i)}$ is also called the label for the training example.

5 Logistic regression

We could approach the classification problem ignoring the fact that $y$ is discrete-valued, and use our old linear regression algorithm to try to predict $y$ given $x$. However, it is easy to construct examples where this method performs very poorly. Intuitively, it also doesn't make sense for $h_\theta(x)$ to take values larger than 1 or smaller than 0 when we know that $y \in \{0, 1\}$.
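The point about out-of-range predictions can be made concrete with a minimal sketch. It fits ordinary least squares $h(x) = wx + b$ to a made-up one-dimensional dataset with labels in {0, 1}. The helper name `fit_line` and the data are hypothetical. A short distance from the data, the fitted line already predicts values outside [0, 1].

```python
def fit_line(xs, ys):
    """Ordinary least squares for h(x) = w*x + b with a single feature."""
    x_mean = sum(xs) / len(xs)
    y_mean = sum(ys) / len(ys)
    w = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys)) / sum(
        (x - x_mean) ** 2 for x in xs
    )
    return w, y_mean - w * x_mean


# Hypothetical 1-D classification data with labels y in {0, 1}
xs = [0.0, 1.0, 2.0, 3.0]
ys = [0.0, 0.0, 1.0, 1.0]
w, b = fit_line(xs, ys)

# Linear regression ignores that y is binary, so h(x) leaves [0, 1] quickly
for x in [-2.0, 6.0]:
    print(f"h({x}) = {w * x + b:.2f}")  # prints "h(-2.0) = -0.90", then "h(6.0) = 2.30"
```

With these points the fit is $w = 0.4$ and $b = -0.1$. Neither prediction above can be read as a probability of $y = 1$.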

To fix this, let's change the form for our hypotheses $h_\theta(x)$. We will choose

$$h_\theta(x) = g(\theta^T x) = \frac{1}{1 + e^{-\theta^T x}},$$

where

$$g(z) = \frac{1}{1 + e^{-z}}$$

is called the logistic function or the sigmoid function. Here is a plot showing $g(z)$:
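The hypothesis $h_\theta(x) = g(\theta^T x)$ can be sketched in plain Python. This is a minimal sketch, not the notes' own code. The names `sigmoid` and `hypothesis` are hypothetical, and $\theta$ and $x$ are assumed to be equal-length lists. The two-branch form of `sigmoid` is a common numerical-stability trick: it evaluates the same function but never passes a large positive argument to `math.exp`.

```python
import math


def sigmoid(z):
    """Logistic function g(z) = 1 / (1 + e^(-z))."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    # For negative z use the equivalent e^z / (1 + e^z) so math.exp never overflows
    ez = math.exp(z)
    return ez / (1.0 + ez)


def hypothesis(theta, x):
    """h_theta(x) = g(theta^T x) for equal-length lists theta and x."""
    z = sum(t * xi for t, xi in zip(theta, x))
    return sigmoid(z)


# Hypothetical parameters; x[0] = 1 plays the role of an intercept term
print(f"{hypothesis([-1.0, 0.5], [1.0, 4.0]):.4f}")  # theta^T x = 1, prints "0.7311"
```

Whatever the value of $\theta^T x$, the output stays in $[0, 1]$. For very large $|z|$ it rounds to exactly 0.0 or 1.0 in floating point.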

7 Another algorithm for maximizing ℓ(θ)

Returning to logistic regression with $g(z)$ being the sigmoid function, let's now talk about a different algorithm for maximizing the log-likelihood $\ell(\theta)$.

To get us started, let's consider Newton's method for finding a zero of a function. Specifically, suppose we have some function $f: \mathbb{R} \to \mathbb{R}$, and we wish to find a value of $\theta$ so that $f(\theta) = 0$. Here, $\theta \in \mathbb{R}$ is a real number.

Newton's method performs the following update:

$$\theta := \theta - \frac{f(\theta)}{f'(\theta)}.$$
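A minimal sketch of this update as a loop follows. The name `newton_root`, the stopping tolerance, and the example function $f(\theta) = \theta^2 - 2$ are hypothetical choices, not the function plotted in the notes. The starting point $\theta = 4.5$ simply mirrors the walkthrough. The sketch assumes $f'(\theta) \neq 0$ at every iterate; otherwise the division fails.

```python
def newton_root(f, f_prime, theta0, tol=1e-12, max_iter=50):
    """Approximate a zero of f with the update theta := theta - f(theta) / f'(theta)."""
    theta = theta0
    for _ in range(max_iter):
        step = f(theta) / f_prime(theta)  # raises ZeroDivisionError if f'(theta) == 0
        theta -= step
        if abs(step) < tol:  # stop once the update becomes negligible
            break
    return theta


# Illustrative choice (not the f plotted in the notes): f(theta) = theta^2 - 2, zero at sqrt(2)
root = newton_root(lambda t: t * t - 2.0, lambda t: 2.0 * t, theta0=4.5)
print(root)
```

Near a simple root the number of correct digits roughly doubles per iteration. That is why only a handful of iterations are needed once the guess is close.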

This method has a natural interpretation in which we can think of it as approximating the function f via a linear function that is tangent to f at the current guess θ, solving for where that linear function equals zero, and letting the next guess for θ be where that linear function is zero.
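Writing out the tangent line makes this interpretation concrete. Assuming $f'(\theta) \neq 0$, let $T(t)$ (a name introduced here) denote the tangent to $f$ at the current guess $\theta$, viewed as a function of a new variable $t$. Its zero is exactly the Newton update:

$$
\begin{aligned}
T(t) &= f(\theta) + f'(\theta)\,(t - \theta), \\
T(t) = 0 \;&\Longrightarrow\; t = \theta - \frac{f(\theta)}{f'(\theta)}.
\end{aligned}
$$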

Here's a picture of Newton's method in action:

[Figure: three panels plotting f(x) for x from 1 to 5, illustrating successive Newton iterations.]

In the leftmost figure, we see the function f plotted along with the line y = 0. We're trying to find θ so that f(θ) = 0; the value of θ that achieves this is about 1.3. Suppose we initialized the algorithm with θ = 4.5. Newton's method then fits a straight line tangent to f at θ = 4.5, and solves for where that line evaluates to 0 (middle figure). This gives us the next guess for θ, which is about 2.8. The rightmost figure shows the result of running one more iteration, which updates θ to about 1.8. After a few more iterations, we rapidly approach θ = 1.3.

Newton's method gives a way of getting to $f(\theta) = 0$. What if we want to use it to maximize some function $\ell$? The maxima of $\ell$ correspond to points where its first derivative $\ell'(\theta)$ is zero. So, by letting $f(\theta) = \ell'(\theta)$, we can use the same algorithm to maximize $\ell$, and we obtain the update rule:

$$\theta := \theta - \frac{\ell'(\theta)}{\ell''(\theta)}.$$
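To see this update applied to logistic regression, here is a minimal sketch that assumes a single scalar parameter $\theta$ with no intercept. It uses the standard log-likelihood $\ell(\theta) = \sum_i \big[y^{(i)} \log g(\theta x^{(i)}) + (1 - y^{(i)}) \log(1 - g(\theta x^{(i)}))\big]$. The derivatives are $\ell'(\theta) = \sum_i (y^{(i)} - g(\theta x^{(i)}))\,x^{(i)}$ and $\ell''(\theta) = -\sum_i g(\theta x^{(i)})(1 - g(\theta x^{(i)}))\,(x^{(i)})^2$. The function names and the toy data are hypothetical.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function g(z)."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def loglik_derivatives(theta, xs, ys):
    """Return (l'(theta), l''(theta)) for 1-parameter logistic regression h(x) = g(theta * x)."""
    d1 = d2 = 0.0
    for x, y in zip(xs, ys):
        h = sigmoid(theta * x)
        d1 += (y - h) * x            # contribution to l'(theta)
        d2 -= h * (1.0 - h) * x * x  # contribution to l''(theta), never positive
    return d1, d2


def newton_maximize(xs, ys, theta=0.0, tol=1e-10, max_iter=50):
    """Maximize l(theta) with the update theta := theta - l'(theta) / l''(theta)."""
    for _ in range(max_iter):
        d1, d2 = loglik_derivatives(theta, xs, ys)
        step = d1 / d2
        theta -= step
        if abs(step) < tol:
            break
    return theta


# Hypothetical toy data; no single theta classifies every point correctly,
# so the log-likelihood has a finite maximizer
xs = [1.0, 2.0, 3.0, 4.0, -1.0, -2.0]
ys = [1, 0, 1, 1, 0, 1]
theta_hat = newton_maximize(xs, ys)
print(theta_hat)
```

Because $\ell''(\theta) < 0$ whenever some $x^{(i)} \neq 0$, $\ell$ is concave here, and its only stationary point is the maximum. The data are chosen to be non-separable on purpose. On linearly separable data, $\ell$ has no finite maximizer and $\theta$ would grow without bound.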

(Something to think about: How would this change if we wanted to use Newton's method to minimize rather than maximize a function?)


