Deep Learning: A Flexible, Efficient GPU Implementation of Convolutional Neural Networks


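"This paper presents a flexible, high-performance implementation of convolutional neural networks (CNNs) for image classification. The feature extractors are learned automatically in a supervised way rather than designed or fixed in advance. The deep hierarchical architectures achieve the best published results on the NORB, CIFAR10 and MNIST benchmarks, with error rates of 2.53%, 19.51% and 0.35%, respectively. Deep networks trained by simple back-propagation outperform shallower ones, and learning is very fast."

This 2011 work focuses on the design and implementation of CNNs, with an emphasis on flexibility and performance. A CNN is a deep learning model widely used in computer vision; it loosely mimics the visual cortex and is well suited to image data. The GPU implementation is central to the paper: GPUs execute large numbers of computations in parallel, which greatly accelerates CNN training.

In the authors' approach, the feature-extraction layers are not hand-designed but learned through supervised training. The network can therefore adapt to different image classification tasks without relying on manually engineered filters, which makes it more widely applicable and may yield more effective feature representations.

The model achieved the best published results at the time on several image classification benchmarks. NORB is an object recognition dataset of small toy objects captured under varying lighting and viewpoints; training on it is complete within five epochs. MNIST is a handwritten digit recognition dataset; the test error falls to 2.42% after a single epoch, and to 0.97% and 0.48% after 3 and 17 epochs.

These results show the potential of deep learning for hard vision problems, with deeper networks performing better than shallower ones. Back-propagation plays the key role: it propagates the error backward through the layers so that every weight is updated to reduce the loss. Although deep learning was still relatively new at the time, the study already demonstrated the learning capacity and efficiency of deep networks.

The paper also stresses how quickly the networks learn, which may stem from the progressive abstraction of features across the layers of the hierarchy. The work influenced later research on efficient, scalable deep architectures such as VGG, Inception and ResNet. Overall, it offers a way to design and train CNNs that improves performance while simplifying the design process, making CNNs more flexible and easier to apply to a range of image recognition tasks.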

Flexible, High Performance Convolutional Neural Networks for Image Classification

Dan C. Cireşan, Ueli Meier, Jonathan Masci, Luca M. Gambardella, Jürgen Schmidhuber

IDSIA, USI and SUPSI

Galleria 2, 6928 Manno-Lugano, Switzerland

{dan,ueli,jonathan,luca,juergen}@idsia.ch

Abstract

We present a fast, fully parameterizable GPU implementation of Convolutional Neural Network variants. Our feature extractors are neither carefully designed nor pre-wired, but rather learned in a supervised way. Our deep hierarchical architectures achieve the best published results on benchmarks for object classification (NORB, CIFAR10) and handwritten digit recognition (MNIST), with error rates of 2.53%, 19.51%, 0.35%, respectively. Deep nets trained by simple back-propagation perform better than more shallow ones. Learning is surprisingly rapid. NORB is completely trained within five epochs. Test error rates on MNIST drop to 2.42%, 0.97% and 0.48% after 1, 3 and 17 epochs, respectively.

1 Introduction

The human visual system efficiently recognizes and localizes objects within cluttered scenes. For artificial systems, however, this is still difficult due to viewpoint-dependent object variability, and the high in-class variability of many object types. Deep hierarchical neural models roughly mimic the nature of mammalian visual cortex, and by community consensus are among the most promising architectures for such tasks. The most successful hierarchical object recognition systems all extract localized features from input images, convolving image patches with filters. Filter responses are then repeatedly sub-sampled and re-filtered, resulting in a deep feed-forward network architecture whose output feature vectors are eventually classified. One of the first hierarchical neural systems was the Neocognitron [Fukushima, 1980] which inspired many of the more recent variants.

Unsupervised learning methods applied to patches of natural images tend to produce localized filters that resemble off-center-on-surround filters, orientation-sensitive bar detectors, Gabor filters [Schmidhuber et al., 1996; Olshausen and Field, 1997; Hoyer and Hyvärinen, 2000]. These findings in conjunction with experimental studies of the visual cortex justify the use of such filters in the so-called standard model for object recognition [Riesenhuber and Poggio, 1999; Serre et al., 2007; Mutch and Lowe, 2008], whose filters are fixed, in contrast to those of Convolutional Neural Networks (CNNs) [LeCun et al., 1998; Behnke, 2003; Simard et al., 2003], whose weights (filters) are randomly initialized and changed in a supervised way using back-propagation (BP).
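For readers unfamiliar with the Gabor filters mentioned above, the following is a minimal sketch of how such a kernel can be generated: a Gaussian envelope multiplied by an oriented cosine carrier. The function name `gabor_kernel` and all parameter values are assumptions made for this illustration; they are not taken from the paper or from the cited works.

```python
import math


def gabor_kernel(size, wavelength, theta, sigma, gamma=1.0, psi=0.0):
    """Build a real-valued Gabor kernel on a size x size grid (size odd).

    wavelength: period of the cosine carrier, in pixels
    theta:      orientation of the carrier, in radians
    sigma:      standard deviation of the Gaussian envelope
    gamma:      spatial aspect ratio of the envelope
    psi:        phase offset of the carrier
    """
    half = size // 2
    kernel = []
    for y in range(-half, half + 1):
        row = []
        for x in range(-half, half + 1):
            # Rotate coordinates so the carrier varies along direction theta.
            xr = x * math.cos(theta) + y * math.sin(theta)
            yr = -x * math.sin(theta) + y * math.cos(theta)
            envelope = math.exp(
                -(xr * xr + gamma * gamma * yr * yr) / (2.0 * sigma * sigma)
            )
            carrier = math.cos(2.0 * math.pi * xr / wavelength + psi)
            row.append(envelope * carrier)
        kernel.append(row)
    return kernel


# Hypothetical parameters: 7x7 kernel whose stripes vary along the x axis.
k = gabor_kernel(size=7, wavelength=4.0, theta=0.0, sigma=2.0)
```

In the standard model such a kernel would stay fixed, whereas a CNN starts from random weights and may or may not converge to something similar through back-propagation.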

Despite the hardware progress of the past decades, computational speed is still a limiting factor for CNN architectures characterized by many building blocks typically set by trial and error. To systematically test the impact of various architectures on classification performance, we present a fast CNN implementation on Graphics Processing Units (GPUs). Previous GPU implementations of CNNs [Chellapilla et al., 2006; Uetz and Behnke, 2009; Strigl et al., 2010] were hard-coded to satisfy GPU hardware constraints or use general purpose libraries, whereas our implementation is flexible and fully online (i.e., weight updates after each image). A notable exception is [Jarrett et al., 2009] who performed a thorough analysis of the influence of all building blocks of a multistage architecture on recognition performance. Our implementation allows for training large CNNs within days instead of months, such that we can investigate the influence of various structural parameters by exploring large parameter spaces [Pinto et al., 2009] and performing error analysis on repeated experiments.
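The difference between online training (a weight update after each image) and batch training can be sketched on a toy problem. This is an illustration on a single linear unit with squared error, not the authors' GPU code; the function names `train_online` and `train_batch`, the learning rate and the data are all assumptions chosen for the example.

```python
def train_online(samples, w, b, lr=0.1, epochs=1):
    """Online learning: update the weights after every single sample."""
    for _ in range(epochs):
        for x, t in samples:
            y = w * x + b        # forward pass
            err = y - t          # dE/dy for E = 0.5 * (y - t)^2
            w -= lr * err * x    # immediate update, before the next sample
            b -= lr * err
    return w, b


def train_batch(samples, w, b, lr=0.1, epochs=1):
    """Batch learning: average gradients over all samples, update once per epoch."""
    n = len(samples)
    for _ in range(epochs):
        grad_w = grad_b = 0.0
        for x, t in samples:
            err = (w * x + b) - t
            grad_w += err * x
            grad_b += err
        w -= lr * grad_w / n
        b -= lr * grad_b / n
    return w, b


# Hypothetical toy data generated from t = 2x + 1.
data = [(0.0, 1.0), (1.0, 3.0), (2.0, 5.0), (3.0, 7.0)]
w_on, b_on = train_online(data, 0.0, 0.0, lr=0.05, epochs=200)
w_ba, b_ba = train_batch(data, 0.0, 0.0, lr=0.05, epochs=200)
```

With the same learning rate and number of epochs, the online variant performs four updates per pass over this data where the batch variant performs one, so it ends up closer to the target weights here. That behaviour is specific to this toy setup and should not be read as a general result.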

We evaluate various networks on the handwritten digit benchmark MNIST [LeCun et al., 1998] and two image classification benchmarks: NORB [LeCun et al., 2004] and CIFAR10 [Krizhevsky, 2009].

2 Convolutional neural networks

CNNs are hierarchical neural networks whose convolutional layers alternate with subsampling layers, reminiscent of simple and complex cells in the primary visual cortex [Wiesel and Hubel, 1959]. CNNs vary in how convolutional and subsampling layers are realized and how the nets are trained.
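One convolution-plus-subsampling stage can be sketched in plain Python as follows. The text above does not say how subsampling is realized, so the use of max-pooling over non-overlapping blocks is an assumption of this sketch, as are the function names `conv2d_valid` and `max_pool` and the toy input.

```python
def conv2d_valid(image, kernel):
    """'Valid' 2D filtering (cross-correlation, as is usual in CNN code)."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[i + a][j + b] * kernel[a][b]
                for a in range(kh) for b in range(kw))
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]


def max_pool(fmap, size):
    """Subsample by taking the maximum over non-overlapping size x size blocks."""
    out_h = len(fmap) // size
    out_w = len(fmap[0]) // size
    return [
        [
            max(fmap[i * size + a][j * size + b]
                for a in range(size) for b in range(size))
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]


# Hypothetical 6x6 input and 3x3 kernel, chosen only to show the shapes.
image = [[float(r * 6 + c) for c in range(6)] for r in range(6)]
kernel = [[0.0, 0.0, 0.0],
          [0.0, 1.0, 0.0],
          [0.0, 0.0, 0.0]]  # identity kernel: picks the centre of each patch

fmap = conv2d_valid(image, kernel)  # 6x6 input -> 4x4 feature map
pooled = max_pool(fmap, 2)          # 4x4 feature map -> 2x2 map
```

A deeper network repeats this pair of operations, feeding `pooled` into the next convolutional layer, with the kernel values learned rather than fixed as they are here.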

2.1 Image processing layer

The image processing layer is an optional pre-processing layer of predefined filters that are kept fixed during training. Thus additional information besides the raw input image can be provided to the network, such as edges and gradients. In particular, we find that a contrast-extracting layer [Fukushima, 2003] helps to improve the recognition rate for NORB.
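A fixed pre-processing layer of this kind might look like the sketch below. It uses a Sobel-style horizontal gradient kernel purely as a familiar example of a predefined filter; it is not the contrast-extracting layer of [Fukushima, 2003], and the names `SOBEL_X`, `apply_fixed_filter` and `preprocess` are hypothetical.

```python
SOBEL_X = (
    (-1.0, 0.0, 1.0),
    (-2.0, 0.0, 2.0),
    (-1.0, 0.0, 1.0),
)  # predefined kernel; it is never touched by the training procedure


def apply_fixed_filter(image, kernel=SOBEL_X):
    """Slide the fixed kernel over the image ('valid' region only)."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[i + a][j + b] * kernel[a][b]
                for a in range(kh) for b in range(kw))
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]


def preprocess(image):
    """Return the maps passed on to the first trainable layer."""
    gradient = apply_fixed_filter(image)
    # Crop the raw image by one pixel per side so both maps have equal size.
    raw = [row[1:-1] for row in image[1:-1]]
    return [raw, gradient]


# Hypothetical 4x4 image with a vertical edge: dark left half, bright right half.
image = [[0.0, 0.0, 1.0, 1.0] for _ in range(4)]
channels = preprocess(image)
```

The network then receives the gradient map alongside the raw pixels, and only the layers after this one have trainable weights.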


Proceedings of the Twenty-Second International Joint Conference on Artificial Intelligence
