CS 230 – Deep Learning Shervine Amidi & Afshine Amidi
Super VIP Cheatsheet: Deep Learning
Afshine Amidi and Shervine Amidi
November 25, 2018
Contents

1 Convolutional Neural Networks
  1.1 Overview
  1.2 Types of layer
  1.3 Filter hyperparameters
  1.4 Tuning hyperparameters
  1.5 Commonly used activation functions
  1.6 Object detection
    1.6.1 Face verification and recognition
    1.6.2 Neural style transfer
    1.6.3 Architectures using computational tricks
2 Recurrent Neural Networks
  2.1 Overview
  2.2 Handling long term dependencies
  2.3 Learning word representation
    2.3.1 Motivation and notations
    2.3.2 Word embeddings
  2.4 Comparing words
  2.5 Language model
  2.6 Machine translation
  2.7 Attention
3 Deep Learning Tips and Tricks
  3.1 Data processing
  3.2 Training a neural network
    3.2.1 Definitions
    3.2.2 Finding optimal weights
  3.3 Parameter tuning
    3.3.1 Weights initialization
    3.3.2 Optimizing convergence
  3.4 Regularization
  3.5 Good practices
1 Convolutional Neural Networks
1.1 Overview
❒ Architecture of a traditional CNN – Convolutional neural networks, also known as CNNs, are a specific type of neural networks that are generally composed of the following layers:

The convolution layer and the pooling layer can be fine-tuned with respect to hyperparameters that are described in the next sections.
1.2 Types of layer
❒ Convolutional layer (CONV) – The convolution layer (CONV) uses filters that perform convolution operations as it is scanning the input I with respect to its dimensions. Its hyperparameters include the filter size F and stride S. The resulting output O is called feature map or activation map.
Remark: the convolution step can be generalized to the 1D and 3D cases as well.
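The scanning operation described above can be sketched in NumPy. This is a minimal single-channel, no-padding ("valid") sketch under the notation of this section; `conv2d` and its loop structure are illustrative, not a library API:

```python
import numpy as np

def conv2d(image, kernel, stride=1):
    """Valid convolution of a single-channel I x I image with one F x F filter.

    The output (the "feature map" or "activation map") is O x O with
    O = (I - F) // stride + 1.
    """
    I, F = image.shape[0], kernel.shape[0]
    O = (I - F) // stride + 1
    out = np.zeros((O, O))
    for i in range(O):
        for j in range(O):
            # Element-wise product of the current window with the filter
            window = image[i * stride:i * stride + F, j * stride:j * stride + F]
            out[i, j] = np.sum(window * kernel)
    return out

image = np.arange(16, dtype=float).reshape(4, 4)
feature_map = conv2d(image, np.ones((3, 3)), stride=1)
assert feature_map.shape == (2, 2)
assert feature_map[0, 0] == 45.0  # sum of the top-left 3x3 window
```

A real CONV layer applies K such filters, each spanning all C input channels, and stacks the K resulting maps.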
❒ Pooling (POOL) – The pooling layer (POOL) is a downsampling operation, typically applied after a convolution layer, which provides some spatial invariance. In particular, max and average pooling are special kinds of pooling where the maximum and average value is taken, respectively.
Stanford University 1 Winter 2019
Max pooling:
- Purpose: each pooling operation selects the maximum value of the current view
- Comments: preserves detected features; most commonly used

Average pooling:
- Purpose: each pooling operation averages the values of the current view
- Comments: downsamples feature map; used in LeNet
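Both variants can be sketched with the same sliding-window loop, differing only in the reduction applied to each view (a minimal single-channel sketch; in a CNN the operation is applied channel-wise, and `pool2d` is illustrative, not a library API):

```python
import numpy as np

def pool2d(x, size=2, stride=2, mode="max"):
    """Max or average pooling over size x size views of a single-channel map."""
    O = (x.shape[0] - size) // stride + 1
    reduce_fn = np.max if mode == "max" else np.mean
    out = np.zeros((O, O))
    for i in range(O):
        for j in range(O):
            out[i, j] = reduce_fn(x[i * stride:i * stride + size,
                                    j * stride:j * stride + size])
    return out

x = np.array([[1., 2., 5., 6.],
              [3., 4., 7., 8.],
              [0., 1., 2., 3.],
              [1., 0., 4., 5.]])
assert pool2d(x, mode="max").tolist() == [[4.0, 8.0], [1.0, 5.0]]
assert pool2d(x, mode="avg").tolist() == [[2.5, 6.5], [0.5, 3.5]]
```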
❒ Fully Connected (FC) – The fully connected layer (FC) operates on a flattened input where each input is connected to all neurons. If present, FC layers are usually found towards the end of CNN architectures and can be used to optimize objectives such as class scores.
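In code, an FC layer is just an affine map on the flattened input (a sketch; the 3-channel 4×4 input and the 10 output neurons are illustrative choices):

```python
import numpy as np

def fc_forward(x, W, b):
    """Fully connected layer: every input feature connects to every output neuron."""
    return W @ x + b

rng = np.random.default_rng(0)
x = rng.standard_normal((3, 4, 4)).ravel()  # flattened 3-channel 4x4 feature map
W = rng.standard_normal((10, x.size))       # 10 output neurons (e.g. class scores)
b = np.zeros(10)
scores = fc_forward(x, W, b)
assert scores.shape == (10,)
```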
1.3 Filter hyperparameters
The convolution layer contains filters for which it is important to know the meaning behind their hyperparameters.
❒ Dimensions of a filter – A filter of size F × F applied to an input containing C channels is a F × F × C volume that performs convolutions on an input of size I × I × C and produces an output feature map (also called activation map) of size O × O × 1.

Remark: the application of K filters of size F × F results in an output feature map of size O × O × K.
❒ Stride – For a convolutional or a pooling operation, the stride S denotes the number of pixels by which the window moves after each operation.
❒ Zero-padding – Zero-padding denotes the process of adding P zeroes to each side of the boundaries of the input. This value can either be manually specified or automatically set through one of the three modes detailed below:

Valid:
- Value: P = 0
- Purpose: no padding; drops last convolution if dimensions do not match

Same:
- Value: P_start = ⌊(S⌈I/S⌉ − I + F − S)/2⌋, P_end = ⌈(S⌈I/S⌉ − I + F − S)/2⌉
- Purpose: padding such that feature map has size ⌈I/S⌉; output size is mathematically convenient; also called 'half' padding

Full:
- Value: P_start ∈ [[0, F − 1]], P_end = F − 1
- Purpose: maximum padding such that end convolutions are applied on the limits of the input; filter 'sees' the input end-to-end
1.4 Tuning hyperparameters

❒ Parameter compatibility in convolution layer – By noting I the length of the input volume size, F the length of the filter, P the amount of zero padding, S the stride, then the output size O of the feature map along that dimension is given by:

O = (I − F + P_start + P_end) / S + 1

Remark: often times, P_start = P_end ≜ P, in which case we can replace P_start + P_end by 2P in the formula above.
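The output-size formula, together with the 'same' padding rule from the previous section, can be checked numerically (the helper names are illustrative):

```python
import math

def conv_output_size(I, F, S, P_start=0, P_end=0):
    """O = (I - F + P_start + P_end) / S + 1, per the formula above."""
    return (I - F + P_start + P_end) // S + 1

def same_padding(I, F, S):
    """'Same' mode: pad so the output has size ceil(I / S).

    Returns (P_start, P_end) as the floor/ceil split of the total padding.
    """
    total = S * math.ceil(I / S) - I + F - S
    return total // 2, total - total // 2

# 32x32 input, 5x5 filter, stride 1: 'valid' gives 28, 'same' keeps 32
assert conv_output_size(32, 5, 1) == 28
p0, p1 = same_padding(32, 5, 1)  # (2, 2)
assert conv_output_size(32, 5, 1, p0, p1) == 32
```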
❒ Understanding the complexity of the model – In order to assess the complexity of a model, it is often useful to determine the number of parameters that its architecture will have. In a given layer of a convolutional neural network, it is done as follows:

CONV:
- Input size: I × I × C
- Output size: O × O × K
- Number of parameters: (F × F × C + 1) · K
- Remarks: one bias parameter per filter; in most cases, S < F; a common choice for K is 2C

POOL:
- Input size: I × I × C
- Output size: O × O × C
- Number of parameters: 0
- Remarks: pooling operation done channel-wise; in most cases, S = F

FC:
- Input size: N_in
- Output size: N_out
- Number of parameters: (N_in + 1) × N_out
- Remarks: input is flattened; one bias parameter per neuron; the number of FC neurons is free of structural constraints
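These parameter counts are straightforward to verify in code (the helper names and the example layer sizes are illustrative):

```python
def conv_params(F, C, K):
    """(F*F*C + 1) * K: each of the K filters has F*F*C weights plus one bias."""
    return (F * F * C + 1) * K

def fc_params(N_in, N_out):
    """(N_in + 1) * N_out: one weight per input per neuron, plus one bias per neuron."""
    return (N_in + 1) * N_out

# A CONV layer with 16 filters of size 3x3 on a 3-channel input:
assert conv_params(3, 3, 16) == 448
# A pooling layer has 0 parameters; an FC layer from 120 to 84 neurons:
assert fc_params(120, 84) == 10164
```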
❒ Receptive field – The receptive field at layer k is the area denoted R_k × R_k of the input that each pixel of the k-th activation map can 'see'. By calling F_j the filter size of layer j and S_i the stride value of layer i and with the convention S_0 = 1, the receptive field at layer k can be computed with the formula:

R_k = 1 + Σ_{j=1}^{k} (F_j − 1) Π_{i=0}^{j−1} S_i

In the example below, we have F_1 = F_2 = 3 and S_1 = S_2 = 1, which gives R_2 = 1 + 2 · 1 + 2 · 1 = 5.
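The formula can be evaluated layer by layer by accumulating the running product of strides (function name illustrative):

```python
def receptive_field(filter_sizes, strides):
    """R_k = 1 + sum_{j=1..k} (F_j - 1) * prod_{i=0..j-1} S_i, with S_0 = 1."""
    R, jump = 1, 1  # `jump` holds the product of strides S_0 .. S_{j-1}
    for F, S in zip(filter_sizes, strides):
        R += (F - 1) * jump
        jump *= S
    return R

# The example above: two 3x3 layers with stride 1 gives R_2 = 5
assert receptive_field([3, 3], [1, 1]) == 5
# With stride 2 in the first layer, the second layer sees a larger area
assert receptive_field([3, 3], [2, 1]) == 7
```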
1.5 Commonly used activation functions
❒ Rectified Linear Unit – The rectified linear unit layer (ReLU) is an activation function g that is used on all elements of the volume. It aims at introducing non-linearities to the network. Its variants are summarized in the table below:

ReLU:
- g(z) = max(0, z)
- Non-linearity complexities biologically interpretable

Leaky ReLU:
- g(z) = max(εz, z) with ε ≪ 1
- Addresses dying ReLU issue for negative values

ELU:
- g(z) = max(α(e^z − 1), z) with α ≪ 1
- Differentiable everywhere
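The three variants can be sketched in NumPy. The ELU is written here in its usual piecewise form (z for z > 0, α(e^z − 1) otherwise), which agrees with the table's max(·) expression for negative inputs; the ε and α defaults are illustrative:

```python
import numpy as np

def relu(z):
    return np.maximum(0, z)

def leaky_relu(z, eps=0.01):
    # Small negative slope eps << 1 keeps gradients alive for z < 0
    return np.maximum(eps * z, z)

def elu(z, alpha=0.5):
    # Piecewise ELU: smooth at z = 0, saturates to -alpha for large negative z
    return np.where(z > 0, z, alpha * (np.exp(z) - 1))

z = np.array([-2.0, 0.0, 3.0])
assert relu(z).tolist() == [0.0, 0.0, 3.0]
assert leaky_relu(z)[0] == -0.02
assert abs(elu(z)[0] + 0.432332358) < 1e-6  # 0.5 * (e^-2 - 1)
```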
❒ Softmax – The softmax step can be seen as a generalized logistic function that takes as input a vector of scores x ∈ R^n and outputs a vector of output probabilities p ∈ R^n through a softmax function at the end of the architecture. It is defined as follows:

p = (p_1, …, p_n)   where   p_i = e^{x_i} / Σ_{j=1}^{n} e^{x_j}
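A short sketch of the definition above; subtracting max(x) before exponentiating is a standard trick that avoids overflow without changing the result:

```python
import numpy as np

def softmax(x):
    """p_i = e^{x_i} / sum_j e^{x_j}, computed in a numerically stable way."""
    e = np.exp(x - np.max(x))
    return e / np.sum(e)

p = softmax(np.array([2.0, 1.0, 0.1]))
assert abs(p.sum() - 1.0) < 1e-12   # outputs form a probability vector
assert p.argmax() == 0              # highest score gets highest probability
```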
1.6 Object detection
❒ Types of models – There are 3 main types of object recognition algorithms, for which the nature of what is predicted is different. They are described in the table below:

Image classification:
- Classifies a picture; predicts probability of object
- Example: traditional CNN

Classification with localization:
- Detects object in a picture; predicts probability of object and where it is located
- Examples: simplified YOLO, R-CNN

Detection:
- Detects up to several objects in a picture; predicts probabilities of objects and where they are located
- Examples: YOLO, R-CNN
❒ Detection – In the context of object detection, different methods are used depending on whether we just want to locate the object or detect a more complex shape in the image. The two main ones are summed up in the table below: