Deep Learning: CNN Architecture Analysis and Optimization


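"CNN INTRODUCTION: Analysis and Optimization of Convolutional Neural Network Architectures, master's thesis by Martin Thoma, Computer Science Institute (FZI), Karlsruhe Institute of Technology, August 2017" This master's thesis examines the analysis and optimization of convolutional neural network (CNN) architectures in depth and is well suited for beginners. Convolutional neural networks are core models in deep learning and are widely used in vision tasks such as image recognition, image classification, and object detection. The thesis was written by Martin Thoma and reviewed by Prof. R. Dillmann and Prof. J. M. Zöllner. The thesis likely covers the following key topics:
1. **Convolutional layers**: The core of a CNN is the convolutional layer, which scans the input image with filters (also called convolution kernels) to extract features. The thesis likely explains in detail how convolutional layers work, including weight sharing, local connectivity, and pooling.
2. **Activation functions**: Activation functions such as ReLU, Leaky ReLU, sigmoid, and tanh provide the nonlinearity in a CNN and thereby increase the model's expressive power. The thesis may analyze the advantages, disadvantages, and suitable use cases of different activation functions.
3. **Pooling layers**: Pooling layers reduce the spatial size of the data and the computational cost while preserving key features. Max pooling and average pooling are the most common variants; the thesis may discuss their role and impact.
4. **Fully connected layers**: Convolutional layers are usually followed by fully connected layers for classification or regression. The thesis may analyze design choices for fully connected layers, such as the number of layers and the number of neurons.
5. **Hyperparameter optimization**: The thesis may cover how to tune hyperparameters such as the learning rate, the batch size, the choice of optimizer (e.g., SGD, Adam), and the regularization strategy (L1, L2) to improve model performance.
6. **Model depth and width**: Increasing the depth of a network raises its capacity and lets it capture more complex features, but can lead to overfitting. Width refers to the number of neurons per layer. The thesis may explore the trade-off between depth and width.
7. **Residual connections**: Modern architectures such as ResNet introduce residual connections, which mitigate the vanishing-gradient problem when training very deep networks. This may be part of the thesis.
8. **Data augmentation**: To prevent overfitting and improve generalization, data augmentation techniques (such as rotation, translation, and scaling) are widely used. The thesis may discuss augmentation methods and their effects.
9. **Model compression and quantization**: Model compression (e.g., pruning, quantization) is an important means of deploying CNNs on resource-constrained devices. The thesis may cover these techniques and their impact on performance.
10. **Experimental design and evaluation**: The thesis may include experimental comparisons of several CNN architectures on standard datasets (such as MNIST, CIFAR-10, and ImageNet), evaluated with metrics such as accuracy, recall, and F1 score.
By reading this thesis, beginners can gain a systematic understanding of how CNNs work and of how architectural changes can improve model performance. The author may also provide practical advice and tips that help readers apply these ideas in real projects.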

1. Introduction

Although most researchers and developers do not use topology learning, several algorithms have been proposed for this task. Five classes of topology learning algorithms are introduced in Chapter 3.

When datasets and the number of classes are large, evaluating a single idea for improving the network can take several weeks for the training alone. Hence, Chapter 4 evaluates the idea of building a hierarchy of classifiers, which splits the classification task into sub-tasks that can easily be combined.

Confusion Matrix Ordering (CMO), the hierarchical classifier, nine types of hyperparameters, and label smoothing are evaluated in Chapter 5.

This work focuses on classification problems to keep the presented ideas as pure and simple as possible. The described techniques are relevant to all six described computer vision problems, because Encoder-Decoder architectures are a component of state-of-the-art algorithms for all six of them.

2. Convolutional Neural Networks

One important detail is how boundaries are treated. There are four common ways of boundary treatment:

• don't compute: The filtered image will be smaller than the original image. For a $k \times k$ filter applied to an image of size $w \times h \times d$, the result lies in $\mathbb{R}^{(w-k+1) \times (h-k+1) \times d}$, to be exact.

• zero padding: The image is padded by zeros where the filter would access elements which do not exist. This will result in edges being detected at the border if the border pixels are not black, but it requires no additional computation.

• nearest: Repeat the pixel which is closest to the boundary.

• reflect: Reflect the image at the boundaries.
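The four boundary treatments can be compared with a small sketch. It is a minimal illustration under stated assumptions, not code from the thesis: images are plain lists of lists, the filter is a square k × k kernel with odd k applied as a cross-correlation (as most CNN libraries do), and the helper names `pad` and `filter2d` are hypothetical. For "reflect" it uses the convention that mirrors about the border pixel without repeating it (the convention of NumPy's `np.pad(..., mode="reflect")`); some libraries repeat the border pixel instead.

```python
# Sketch (assumption): images are lists of lists of numbers, filters are
# square k x k kernels with odd k, and filtering is a cross-correlation
# (the kernel is not flipped).


def pad(image, p, mode):
    """Pad a 2-D image by p pixels on every side using the given mode."""
    h, w = len(image), len(image[0])

    def source_index(i, n):
        # Map a possibly out-of-range index i onto a valid index in [0, n).
        if mode == "nearest":
            return min(max(i, 0), n - 1)  # repeat the closest border pixel
        if mode == "reflect":
            if n == 1:
                return 0
            # Mirror about the border pixel without repeating it: -1 -> 1, n -> n - 2.
            while i < 0 or i >= n:
                i = -i if i < 0 else 2 * (n - 1) - i
            return i
        raise ValueError(f"unknown boundary mode: {mode}")

    padded = []
    for y in range(-p, h + p):
        row = []
        for x in range(-p, w + p):
            if 0 <= y < h and 0 <= x < w:
                row.append(image[y][x])
            elif mode == "zero":
                row.append(0)  # zero padding
            else:
                row.append(image[source_index(y, h)][source_index(x, w)])
        padded.append(row)
    return padded


def filter2d(image, kernel, boundary="none"):
    """Apply a k x k filter; boundary is 'none', 'zero', 'nearest' or 'reflect'."""
    k = len(kernel)
    if boundary == "none":
        src = image  # "don't compute": output shrinks to (h-k+1) x (w-k+1)
    else:
        src = pad(image, k // 2, boundary)  # output keeps the input size
    out_h, out_w = len(src) - k + 1, len(src[0]) - k + 1
    return [
        [
            sum(kernel[a][b] * src[y + a][x + b] for a in range(k) for b in range(k))
            for x in range(out_w)
        ]
        for y in range(out_h)
    ]


image = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]]
box = [[1, 1, 1] for _ in range(3)]  # unnormalized 3x3 box filter
for mode in ("none", "zero", "nearest", "reflect"):
    result = filter2d(image, box, mode)
    print(mode, len(result), "x", len(result[0]), "top-left:", result[0][0])
```

With this unnormalized box filter, the 4 × 4 example yields a 2 × 2 result without padding and a 4 × 4 result otherwise; the top-left values are 54 (don't compute), 14 (zero padding), 24 (nearest) and 39 (reflect). The low value under zero padding is the border artifact described above.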

Common tasks that can be done with linear filters include edge detection, corner detection, smoothing, sharpening, and box filtering. Median filtering is often mentioned alongside them, but it is a nonlinear filter. See Figure A.1 for five examples.
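Linearity means that filtering the sum of two images gives the sum of the filtered images. A box filter has this property, while a median filter does not, which is why median filtering is a nonlinear operation. The following sketch checks this on 1-D signals for brevity; `box3`, `median3` and `add` are hypothetical helper names.

```python
# Sketch (assumption): 1-D signals as lists, filters of width 3 applied only
# where they fit completely ("don't compute" boundary handling).


def box3(signal):
    """Unnormalized box filter: sum of each window of three values (linear)."""
    return [sum(signal[i:i + 3]) for i in range(len(signal) - 2)]


def median3(signal):
    """Median filter: middle value of each sorted window of three (nonlinear)."""
    return [sorted(signal[i:i + 3])[1] for i in range(len(signal) - 2)]


def add(u, v):
    """Element-wise sum of two signals."""
    return [a + b for a, b in zip(u, v)]


a = [0, 3, 0, 0]
b = [0, 0, 3, 0]
# A linear filter satisfies f(a + b) == f(a) + f(b) ...
print(box3(add(a, b)), add(box3(a), box3(b)))            # [6, 6] [6, 6]
# ... the median filter does not.
print(median3(add(a, b)), add(median3(a), median3(b)))  # [3, 3] [0, 0]
```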

Please note that the result of a filtering operation is again an image. This means filters can be applied successively. While each pixel after one filtering operation with a 3 × 3 filter is influenced by 3 · 3 = 9 pixels of the original image, two successively applied 3 × 3 filters increase the area of the original image which influences the output: the output is then influenced by 5 · 5 = 25 pixels. This area is called the receptive field. The kind of pattern which is detected by a filter is called a feature. The bigger the receptive field is, the more complex features can get, as they are able to consider more of the original image. Instead of taking one 5 × 5 filter with 25 parameters, one might consider taking two successive 3 × 3 filters with 2 · (3 · 3) = 18 parameters. The 5 × 5 filter is a strict superset of possible filtering operations compared to the two 3 × 3 filters, but the relevance of this technique will become clear in Section 2.2.
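The claim about stacked filters can be checked numerically. The sketch below builds the single 5 × 5 kernel that is equivalent to two successive 3 × 3 filters, confirms that both give the same output, and computes the receptive field n · (k − 1) + 1 of n stacked k × k filters with stride 1. It assumes that no activation function sits between the two filters; the helper names are hypothetical.

```python
# Sketch (assumption): valid ("don't compute") 2-D cross-correlation on
# lists of lists; kernels are small integer matrices so results are exact.


def correlate(image, kernel):
    kh, kw = len(kernel), len(kernel[0])
    return [
        [
            sum(kernel[a][b] * image[y + a][x + b] for a in range(kh) for b in range(kw))
            for x in range(len(image[0]) - kw + 1)
        ]
        for y in range(len(image) - kh + 1)
    ]


def compose(k1, k2):
    """Single kernel equivalent to applying k1 and then k2 (no activation in between)."""
    K = [[0] * (len(k1[0]) + len(k2[0]) - 1) for _ in range(len(k1) + len(k2) - 1)]
    for a, row1 in enumerate(k1):
        for b, w1 in enumerate(row1):
            for c, row2 in enumerate(k2):
                for d, w2 in enumerate(row2):
                    K[a + c][b + d] += w1 * w2  # offsets of both filters add up
    return K


def receptive_field(num_layers, k):
    """Side length of the receptive field of num_layers stacked k x k filters (stride 1)."""
    return num_layers * (k - 1) + 1


k1 = [[1, 0, -1], [2, 0, -2], [1, 0, -1]]
k2 = [[1, 2, 1], [2, 4, 2], [1, 2, 1]]
image = [[(3 * y + 5 * x) % 7 for x in range(7)] for y in range(7)]

two_steps = correlate(correlate(image, k1), k2)
one_step = correlate(image, compose(k1, k2))
print(two_steps == one_step)                        # True: the same 5x5 linear filter
print(len(compose(k1, k2)), receptive_field(2, 3))  # 5 5
print(5 * 5, 2 * (3 * 3))                           # 25 18 parameters
```

Every pair of 3 × 3 filters collapses into some 5 × 5 filter, but 18 parameters cannot express every 25-parameter filter, so the single 5 × 5 filter is the more general linear operation. In a CNN a nonlinear activation usually sits between the two layers, and then the stacked version is no longer equivalent to a single linear filter.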

2.2. CNN Layer Types

While the idea behind deep MLPs is that feature hierarchies capture the important parts of the input more easily, CNNs are inspired by the idea of translational invariance: many features in an image are translationally invariant. For example, when developing a car detector, one could try to detect a car by its parts [FGMR10]. But then there are many positions at which the wheels could be. Combining those ideas, it is desirable to capture low-level, translationally invariant features at lower layers of an artificial neural network (ANN) and, in higher layers, high-level features which are combinations of the low-level features.
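A convolution by itself is translation-equivariant rather than invariant: shifting the input shifts the feature map by the same amount, so a detector for a part such as a wheel responds wherever that part appears. The 1-D sketch below illustrates this; the pattern, the matched filter and the helper names are illustrative assumptions. Taking the maximum over all positions then gives a position-independent response; pooling layers (Section 2.2.2) provide a local version of this idea.

```python
# Sketch (assumption): 1-D signals for brevity; the "feature" is a small bump
# pattern and the filter is a matched filter for it.


def correlate1d(signal, kernel):
    k = len(kernel)
    return [sum(kernel[j] * signal[i + j] for j in range(k)) for i in range(len(signal) - k + 1)]


def shift(signal, t):
    """Shift a signal t positions to the right, filling with zeros."""
    return [0] * t + signal[:len(signal) - t]


kernel = [1, 2, 1]                       # detector for the pattern 1, 2, 1
signal = [0, 0, 1, 2, 1, 0, 0, 0, 0, 0]  # pattern at position 2
moved = shift(signal, 3)                 # same pattern at position 5

r1, r2 = correlate1d(signal, kernel), correlate1d(moved, kernel)
print(r1.index(max(r1)), r2.index(max(r2)))  # 2 5: the response moves with the pattern
print(r2 == shift(r1, 3))                    # True: shift then filter == filter then shift
print(max(r1) == max(r2))                    # True: a global max ignores the position
```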

Also, models should utilize the fact that the pixels of images are ordered. One way to use this is by learning image filters in so-called convolutional layers.

While MLPs vectorize the input, the inputs of a layer in a CNN are feature maps. A feature map is a matrix $m \in \mathbb{R}^{w \times h}$, but typically the width equals the height ($w = h$). For an RGB input image, the number of feature maps is $d = 3$: each color channel is a feature map.

Since AlexNet [KSH12] almost halved the error in the ImageNet challenge, CNNs have been state-of-the-art in various computer vision tasks.

Traditional CNNs have three important building blocks:

• convolutional layers with a non-linear activation function as described in Section 2.2.1,

• pooling layers as described in Section 2.2.2, and

• normalization layers as described in Section 2.2.4.

2.2.1. Convolutional Layers

Convolutional layers take several feature maps as input and produce $n$ feature maps as output, where $n$ is the number of filters in the convolutional layer. The filter weights of the linear convolutions are the parameters which are adapted to the training data. The number $n$ of filters as well as the filter size $k \times k$ are hyperparameters of convolutional layers. Sometimes, the size is denoted as $k_w \times k_h$ to allow non-square filters. Although the filter depth is usually omitted in the notation, the filters are of dimension $k_w \times k_h \times d^{(i-1)}$, where $d^{(i-1)}$ is the number of feature maps of the input layer $(i-1)$.
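A convolutional layer as described above can be sketched directly. This is a plain-Python illustration, not the thesis's implementation: the function name `conv_layer`, the channel-first filter layout d × k_h × k_w, the per-filter bias and the ReLU activation are assumptions. Each of the n filters spans all d^(i−1) input feature maps, and each filter produces exactly one output feature map.

```python
# Sketch (assumptions): feature maps are lists of lists; no padding; each
# filter has one bias term (common in practice, not stated in the text);
# ReLU is used as the activation function.


def conv_layer(inputs, filters, biases, stride=1):
    """
    inputs:  d feature maps of size h x w             -> shape d x h x w
    filters: n filters of shape d x k_h x k_w (depth = number of input maps)
    biases:  n numbers
    returns: n output feature maps
    """
    d = len(inputs)
    h, w = len(inputs[0]), len(inputs[0][0])
    outputs = []
    for f, bias in zip(filters, biases):
        if len(f) != d:
            raise ValueError("filter depth must equal the number of input feature maps")
        k_h, k_w = len(f[0]), len(f[0][0])
        out = []
        for y in range(0, h - k_h + 1, stride):
            row = []
            for x in range(0, w - k_w + 1, stride):
                # Sum over all input feature maps and all filter positions.
                acc = bias + sum(
                    f[c][a][b] * inputs[c][y + a][x + b]
                    for c in range(d)
                    for a in range(k_h)
                    for b in range(k_w)
                )
                row.append(max(0, acc))  # ReLU activation
            out.append(row)
        outputs.append(out)
    return outputs


# An RGB input (d = 3) of size 4 x 4 and n = 2 filters of size 3 x 3 x 3.
rgb = [[[1] * 4 for _ in range(4)] for _ in range(3)]
filters = [
    [[[1] * 3 for _ in range(3)] for _ in range(3)],   # sums all 27 inputs
    [[[-1] * 3 for _ in range(3)] for _ in range(3)],  # negative response, clipped by ReLU
]
maps = conv_layer(rgb, filters, biases=[0, 0])
print(len(maps), len(maps[0]), len(maps[0][0]))  # 2 2 2
print(maps[0][0][0], maps[1][0][0])              # 27 0
```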

Further hyperparameters of convolutional layers are the stride $s \in \mathbb{N}_{\geq 1}$ and the padding.

Padding (usually zero-padding [SCL12, SEZ, HZRS15a]) is used to make sure that the size of the feature maps does not change. For stride $s = 1$ and an odd filter size $k$, this means padding each side with $(k-1)/2$ zeros.
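The effect of stride and padding on the size of a feature map follows the usual formula ⌊(w + 2p − k) / s⌋ + 1 per spatial dimension, where p is the number of padded pixels per side. A minimal sketch with hypothetical helper names:

```python
def output_size(n_in, k, s=1, p=0):
    """Output width (or height) for input size n_in, filter size k, stride s, padding p per side."""
    return (n_in + 2 * p - k) // s + 1


def same_padding(k):
    """Zeros per side that keep the size unchanged when s = 1 (odd k only)."""
    if k % 2 == 0:
        raise ValueError("symmetric size-preserving padding needs an odd filter size")
    return (k - 1) // 2


print(output_size(32, 3))                     # 30: no padding, the map shrinks
print(output_size(32, 3, p=same_padding(3)))  # 32: size preserved
print(output_size(32, 3, s=2, p=1))           # 16: stride 2 roughly halves it
```

Padding of (k − 1)/2 preserves the size only for s = 1; with a larger stride the feature map shrinks even when it is padded.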

The hyperparameters of convolutional layers are

• the number of filters $n \in \mathbb{N}_{\geq 1}$,

• $k_w, k_h \in \mathbb{N}_{\geq 1}$ of the filter size $k_w \times k_h \times d^{(i-1)}$,

• the activation function of the layer (see Table B.3) and

• the stride $s \in \mathbb{N}_{\geq 1}$.

Typical choices, such as in [KSH12, SZ14, SLJ15], are $n \in \{32, 64, 128\}$, $k \in \{1, 3, 5, 11\}$, rectified linear unit (ReLU) activation, and $s = 1$.
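The hyperparameters above determine the number of learnable parameters: a layer with n filters of size k_w × k_h × d^(i−1) has n · k_w · k_h · d^(i−1) weights, plus n biases if each filter has a bias term (an assumption; the text above does not mention biases). The sketch compares this with a fully connected layer that produces the same number of outputs; the function names are hypothetical.

```python
def conv_params(n, k_w, k_h, d_prev, bias=True):
    """Learnable parameters of a convolutional layer with n filters of size k_w x k_h x d_prev."""
    return n * (k_w * k_h * d_prev + (1 if bias else 0))


def dense_params(n_in, n_out, bias=True):
    """Parameters of a fully connected layer, for comparison."""
    return n_out * (n_in + (1 if bias else 0))


# n = 32 filters of size 3 x 3 on an RGB input (d = 3):
print(conv_params(32, 3, 3, 3))                 # 896, independent of the image size
# A fully connected layer mapping a 32 x 32 x 3 image to 32 x 32 x 32 values:
print(dense_params(32 * 32 * 3, 32 * 32 * 32))  # 100696064
```

Because of weight sharing, the parameter count of the convolutional layer does not depend on the image size, whereas that of the fully connected layer grows with both the input and the output size.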

The concept of weight sharing is crucial for CNNs. This concept was introduced in [WHH

With weight sharing, the filters can be learned with stochastic gradient descent (SGD) just like the weights of MLPs. In fact, every CNN has an equivalent MLP which computes the same function if only the flattened output is compared.
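The equivalence between a CNN and an MLP can be made concrete: a convolution is a fully connected layer whose weight matrix is mostly zero and whose nonzero entries are copies of the same few filter weights. The sketch below builds that matrix for a 1-D convolution; the helper names are hypothetical and the 1-D setting is chosen for brevity.

```python
# Sketch (assumption): 1-D valid cross-correlation for brevity; the same
# construction works for 2-D inputs after flattening them.


def correlate1d(x, kernel):
    k = len(kernel)
    return [sum(kernel[j] * x[i + j] for j in range(k)) for i in range(len(x) - k + 1)]


def conv_as_matrix(kernel, n_in):
    """Weight matrix W of an equivalent fully connected layer: W @ x == correlate1d(x, kernel)."""
    k = len(kernel)
    n_out = n_in - k + 1
    W = [[0] * n_in for _ in range(n_out)]
    for i in range(n_out):
        for j, w in enumerate(kernel):
            W[i][i + j] = w  # the same k weights reappear in every row: weight sharing
    return W


def matvec(W, x):
    """Dense matrix-vector product, i.e. a fully connected layer without bias."""
    return [sum(w * v for w, v in zip(row, x)) for row in W]


kernel = [1, -2, 1]
x = [3, 1, 4, 1, 5, 9, 2, 6]
W = conv_as_matrix(kernel, len(x))
print(matvec(W, x) == correlate1d(x, kernel))  # True
for row in W:
    print(row)  # banded matrix: the kernel shifted by one position per row
```

Weight sharing is what lets SGD work unchanged: the gradient with respect to a shared filter weight is the sum of the gradients with respect to all matrix entries that hold a copy of it.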

Footnote: Feature maps are also called activation maps or channels.
