CS294A Lecture notes
Andrew Ng
Sparse autoencoder
1 Introduction
Supervised learning is one of the most powerful tools of AI, and has led to
automatic zip code recognition, speech recognition, self-driving cars, and a
continually improving understanding of the human genome. Despite its significant
successes, supervised learning today is still severely limited. Specifically,
most applications of it still require that we manually specify the input
features $x$ given to the algorithm. Once a good feature representation is
given, a supervised learning algorithm can do well. But in such domains as
computer vision, audio processing, and natural language processing, there're
now hundreds or perhaps thousands of researchers who've spent years of their
lives slowly and laboriously hand-engineering vision, audio or text features.
While much of this feature-engineering work is extremely clever, one has to
wonder if we can do better. Certainly this labor-intensive hand-engineering
approach does not scale well to new problems; further, ideally we'd like to
have algorithms that can automatically learn even better feature representations
than the hand-engineered ones.
These notes describe the sparse autoencoder learning algorithm, which
is one approach to automatically learn features from unlabeled data. In some
domains, such as computer vision, this approach is not by itself competitive
with the best hand-engineered features, but the features it can learn do turn
out to be useful for a range of problems (including ones in audio, text, etc).
Further, there're more sophisticated versions of the sparse autoencoder (not
described in these notes, but that you'll hear more about later in the class)
that do surprisingly well, and in many cases are competitive with or superior
to even the best hand-engineered representations.

These notes are organized as follows. We will first describe feedforward
neural networks and the backpropagation algorithm for supervised learning.
Then, we show how this is used to construct an autoencoder, which is an
unsupervised learning algorithm. Finally, we build on this to derive a sparse
autoencoder. Because these notes are fairly notation-heavy, the last page
also contains a summary of the symbols used.
2 Neural networks
Consider a supervised learning problem where we have access to labeled training
examples $(x^{(i)}, y^{(i)})$. Neural networks give a way of defining a complex,
non-linear form of hypotheses $h_{W,b}(x)$, with parameters $W, b$ that we can fit
to our data.
To describe neural networks, we will begin by describing the simplest
possible neural network, one which comprises a single “neuron.” We will use
the following diagram to denote a single neuron:
This “neuron” is a computational unit that takes as input $x_1, x_2, x_3$ (and
a $+1$ intercept term), and outputs $h_{W,b}(x) = f(W^T x) = f\left(\sum_{i=1}^{3} W_i x_i + b\right)$,
where $f : \mathbb{R} \mapsto \mathbb{R}$ is called the activation function. In these notes, we will
choose $f(\cdot)$ to be the sigmoid function:
\[
f(z) = \frac{1}{1 + \exp(-z)}.
\]
Thus, our single neuron corresponds exactly to the input-output mapping
defined by logistic regression.
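As a concrete illustration, here is a minimal NumPy sketch of this single-neuron computation (added to these notes for illustration, not part of the original text); the particular values of x, W, and b are arbitrary examples.

```python
import numpy as np

def sigmoid(z):
    """Logistic activation f(z) = 1 / (1 + exp(-z))."""
    return 1.0 / (1.0 + np.exp(-z))

def neuron(x, W, b):
    """Single-neuron hypothesis h_{W,b}(x) = f(W^T x + b)."""
    return sigmoid(np.dot(W, x) + b)

# Arbitrary example input and parameters (3 inputs; the +1 intercept enters via b).
x = np.array([0.5, -1.0, 2.0])
W = np.array([0.1, 0.4, -0.3])
b = 0.2
print(neuron(x, W, b))  # a single real number in (0, 1)
```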
Although these notes will use the sigmoid function, it is worth noting that
another common choice for $f$ is the hyperbolic tangent, or tanh, function:
\[
f(z) = \tanh(z) = \frac{e^{z} - e^{-z}}{e^{z} + e^{-z}}, \tag{1}
\]
Here are plots of the sigmoid and tanh functions:

The $\tanh(z)$ function is a rescaled version of the sigmoid, and its output
range is $[-1, 1]$ instead of $[0, 1]$.
Note that unlike CS221 and (parts of) CS229, we are not using the convention
here of $x_0 = 1$. Instead, the intercept term is handled separately by
the parameter $b$.
Finally, one identity that'll be useful later: If $f(z) = 1/(1 + \exp(-z))$ is
the sigmoid function, then its derivative is given by $f'(z) = f(z)(1 - f(z))$.
(If $f$ is the tanh function, then its derivative is given by $f'(z) = 1 - (f(z))^2$.)
You can derive this yourself using the definition of the sigmoid (or tanh)
function.
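As a quick sanity check (an added sketch, not part of the original notes), the two derivative identities can be verified numerically against centered finite differences:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

z = np.linspace(-3.0, 3.0, 7)
eps = 1e-6

# Centered finite-difference approximations of the derivatives.
sig_fd = (sigmoid(z + eps) - sigmoid(z - eps)) / (2 * eps)
tanh_fd = (np.tanh(z + eps) - np.tanh(z - eps)) / (2 * eps)

# Closed-form identities from the text.
sig_closed = sigmoid(z) * (1.0 - sigmoid(z))   # f'(z) = f(z)(1 - f(z))
tanh_closed = 1.0 - np.tanh(z) ** 2            # f'(z) = 1 - (f(z))^2

print(np.max(np.abs(sig_fd - sig_closed)))     # tiny, so the identity checks out
print(np.max(np.abs(tanh_fd - tanh_closed)))
```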
2.1 Neural network formulation
A neural network is put together by hooking together many of our simple
“neurons,” so that the output of a neuron can be the input of another. For
example, here is a small neural network:

In this figure, we have used circles to also denote the inputs to the net-
work. The circles labeled “+1” are called bias units, and correspond to the
intercept term. The leftmost layer of the network is called the input layer,
and the rightmost layer the output layer (which, in this example, has only
one node). The middle layer of nodes is called the hidden layer, because
its values are not observed in the training set. We also say that our example
neural network has 3 input units (not counting the bias unit), 3 hidden
units, and 1 output unit.
We will let $n_l$ denote the number of layers in our network; thus $n_l = 3$
in our example. We label layer $l$ as $L_l$, so layer $L_1$ is the input layer, and
layer $L_{n_l}$ the output layer. Our neural network has parameters $(W, b) =
(W^{(1)}, b^{(1)}, W^{(2)}, b^{(2)})$, where we write $W^{(l)}_{ij}$ to denote the parameter (or weight)
associated with the connection between unit $j$ in layer $l$, and unit $i$ in layer
$l+1$. (Note the order of the indices.) Also, $b^{(l)}_i$ is the bias associated with unit
$i$ in layer $l+1$. Thus, in our example, we have $W^{(1)} \in \mathbb{R}^{3 \times 3}$, and $W^{(2)} \in \mathbb{R}^{1 \times 3}$.
Note that bias units don't have inputs or connections going into them, since
they always output the value +1. We also let $s_l$ denote the number of nodes
in layer $l$ (not counting the bias unit).
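To make the indexing convention concrete, here is a small sketch of the parameter shapes for the 3-3-1 example network (added for illustration; the small random initialization scale is an arbitrary choice, not something prescribed at this point in the notes):

```python
import numpy as np

# Layer sizes s_l for the example network: 3 input units, 3 hidden units, 1 output unit.
s = [3, 3, 1]

rng = np.random.default_rng(0)

# W^{(l)} has shape (s_{l+1}, s_l): entry (i, j) is the weight on the connection
# from unit j in layer l to unit i in layer l+1 -- note the order of the indices.
W1 = rng.normal(scale=0.01, size=(s[1], s[0]))   # W^{(1)} in R^{3x3}
W2 = rng.normal(scale=0.01, size=(s[2], s[1]))   # W^{(2)} in R^{1x3}
b1 = np.zeros(s[1])                              # b^{(1)}: one bias per unit in layer 2
b2 = np.zeros(s[2])                              # b^{(2)}: one bias for the output unit

print(W1.shape, W2.shape, b1.shape, b2.shape)    # (3, 3) (1, 3) (3,) (1,)
```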
We will write $a^{(l)}_i$ to denote the activation (meaning output value) of
unit $i$ in layer $l$. For $l = 1$, we also use $a^{(1)}_i = x_i$ to denote the $i$-th input.
Given a fixed setting of the parameters $W, b$, our neural network defines a
hypothesis $h_{W,b}(x)$ that outputs a real number. Specifically, the computation
that this neural network represents is given by:
\begin{align*}
a^{(2)}_1 &= f\big(W^{(1)}_{11} x_1 + W^{(1)}_{12} x_2 + W^{(1)}_{13} x_3 + b^{(1)}_1\big) \tag{2} \\
a^{(2)}_2 &= f\big(W^{(1)}_{21} x_1 + W^{(1)}_{22} x_2 + W^{(1)}_{23} x_3 + b^{(1)}_2\big) \tag{3} \\
a^{(2)}_3 &= f\big(W^{(1)}_{31} x_1 + W^{(1)}_{32} x_2 + W^{(1)}_{33} x_3 + b^{(1)}_3\big) \tag{4} \\
h_{W,b}(x) = a^{(3)}_1 &= f\big(W^{(2)}_{11} a^{(2)}_1 + W^{(2)}_{12} a^{(2)}_2 + W^{(2)}_{13} a^{(2)}_3 + b^{(2)}_1\big) \tag{5}
\end{align*}
In the sequel, we also let $z^{(l)}_i$ denote the total weighted sum of inputs to unit
$i$ in layer $l$, including the bias term (e.g., $z^{(2)}_i = \sum_{j=1}^{n} W^{(1)}_{ij} x_j + b^{(1)}_i$), so that
$a^{(l)}_i = f(z^{(l)}_i)$.
Note that this easily lends itself to a more compact notation. Specifically,
if we extend the activation function $f(\cdot)$ to apply to vectors in an element-wise
fashion (i.e., $f([z_1, z_2, z_3]) = [f(z_1), f(z_2), f(z_3)]$), then we can write
\begin{align*}
z^{(2)} &= W^{(1)} x + b^{(1)} \\
a^{(2)} &= f(z^{(2)}) \\
z^{(3)} &= W^{(2)} a^{(2)} + b^{(2)} \\
h_{W,b}(x) &= a^{(3)} = f(z^{(3)}).
\end{align*}
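This vectorized form maps directly onto array operations. The following sketch (added for illustration, not from the original notes) computes the forward pass of the 3-3-1 example network with arbitrary placeholder parameters:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward(x, W1, b1, W2, b2):
    """Vectorized forward pass for the 3-3-1 network described above."""
    z2 = W1 @ x + b1        # z^{(2)} = W^{(1)} x + b^{(1)}
    a2 = sigmoid(z2)        # a^{(2)} = f(z^{(2)})
    z3 = W2 @ a2 + b2       # z^{(3)} = W^{(2)} a^{(2)} + b^{(2)}
    return sigmoid(z3)      # h_{W,b}(x) = a^{(3)} = f(z^{(3)})

# Arbitrary placeholder parameters and input.
rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(3, 3)), np.zeros(3)
W2, b2 = rng.normal(size=(1, 3)), np.zeros(1)
x = np.array([0.5, -1.0, 2.0])
print(forward(x, W1, b1, W2, b2))   # array containing a single real number
```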