Multi-column Deep Neural Networks for Image Classification
Dan Cireşan, Ueli Meier and Jürgen Schmidhuber
IDSIA-USI-SUPSI
Galleria 2, 6928 Manno-Lugano, Switzerland
{dan,ueli,juergen}@idsia.ch
Abstract
Traditional methods of computer vision and machine
learning cannot match human performance on tasks such
as the recognition of handwritten digits or traffic signs. Our
biologically plausible, wide and deep artificial neural net-
work architectures can. Small (often minimal) receptive
fields of convolutional winner-take-all neurons yield large
network depth, resulting in roughly as many sparsely con-
nected neural layers as found in mammals between retina
and visual cortex. Only winner neurons are trained. Sev-
eral deep neural columns become experts on inputs pre-
processed in different ways; their predictions are averaged.
Graphics cards allow for fast training. On the very com-
petitive MNIST handwriting benchmark, our method is the
first to achieve near-human performance. On a traffic sign
recognition benchmark it outperforms humans by a factor
of two. We also improve the state-of-the-art on a plethora
of common image classification benchmarks.
1. Introduction
Recent publications suggest that unsupervised pre-
training of deep, hierarchical neural networks improves su-
pervised pattern classification [2, 10]. Here we train such
nets by simple online back-propagation, setting new, greatly
improved records on MNIST [19], Latin letters [13], Chi-
nese characters [22], traffic signs [33], NORB (jittered, clut-
tered) [20] and CIFAR10 [17] benchmarks.
We focus on deep convolutional neural networks (DNN),
introduced by [11], improved by [19], refined and simpli-
fied by [1, 32, 7]. Lately, DNN proved their mettle on data
sets ranging from handwritten digits (MNIST) [5, 7] and handwritten characters [6] to 3D toys (NORB) and faces [34].
DNNs fully unfold their potential when they are wide (many
maps per layer) and deep (many layers) [7]. But training
them requires weeks, months, even years on CPUs. High
data transfer latency prevents multi-threading and multi-
CPU code from saving the situation. In recent years, how-
ever, fast parallel neural net code for graphics cards (GPUs)
has overcome this problem. Carefully designed GPU code
for image classification can be up to two orders of magni-
tude faster than its CPU counterpart [35, 34]. Hence, to train
huge DNN in hours or days, we implement them on GPU,
building upon the work of [5, 7]. The training algorithm
is fully online, i.e. weight updates occur after each error
back-propagation step. We will show that properly trained
wide and deep DNNs can outperform all previous methods,
and demonstrate that unsupervised initialization/pretraining
is not necessary (although we don’t deny that it might help
sometimes, especially for datasets with few samples per
class). We also show how combining several DNN columns
into a Multi-column DNN (MCDNN) further decreases the
error rate by 30-40%.
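As a minimal sketch of these two ingredients, and not the paper's implementation, the following Python code illustrates (a) fully online training, with a weight update after every back-propagated sample, and (b) the MCDNN rule of averaging the predictions of several columns trained on differently preprocessed inputs. A toy linear softmax classifier stands in for each DNN column; all names, sizes and hyper-parameters below are illustrative assumptions.

# Minimal sketch, NOT the paper's implementation: (a) fully online training,
# i.e. a weight update after every back-propagated sample, and (b) MCDNN-style
# averaging of several columns trained on differently preprocessed inputs.
# A toy linear softmax classifier stands in for each DNN column; all names,
# sizes and hyper-parameters are assumptions.
import numpy as np

rng = np.random.default_rng(0)
n_classes, n_features, n_columns, lr = 10, 64, 5, 0.01

def preprocess(X, c):
    # Each column sees its own preprocessing (the paper uses different image
    # normalizations); here it is just a per-column rescaling.
    return X * (0.5 + 0.25 * c)

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

# One weight matrix per column; in the paper each column is a deep CNN.
columns = [rng.normal(0.0, 0.01, (n_classes, n_features))
           for _ in range(n_columns)]

# Toy training data.
X_train = rng.normal(size=(1000, n_features))
y_train = rng.integers(0, n_classes, size=1000)

for c, W in enumerate(columns):
    for x, y in zip(preprocess(X_train, c), y_train):
        p = softmax(W @ x)                              # forward pass
        grad = np.outer(p - np.eye(n_classes)[y], x)    # backprop (cross-entropy)
        W -= lr * grad                                  # update after EACH sample

def mcdnn_predict(x):
    # Average the class probabilities of all columns, then pick the best class.
    probs = [softmax(W @ preprocess(x, c)) for c, W in enumerate(columns)]
    return int(np.mean(probs, axis=0).argmax())

Averaging needs no extra training on top of the columns; each column remains an independent expert for its own preprocessing.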
2. Architecture
The initially random weights of the DNN are iteratively
trained to minimize the classification error on a set of la-
beled training images; generalization performance is then
tested on a separate set of test images. Our architecture does
this by combining several techniques in a novel way:
(1) Unlike the small NNs used in many applications, which were either shallow [32] or had few maps per layer (LeNet7 [20]), ours are deep and have hundreds of maps
per layer, inspired by the Neocognitron [11], with many
(6-10) layers of non-linear neurons stacked on top of each
other, comparable to the number of layers found between
retina and visual cortex of macaque monkeys [3].
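For illustration only, a column of this kind might be stacked as in the following sketch, written here with the PyTorch library (the paper uses its own GPU code); the map counts, kernel sizes and the assumed 29x29 single-channel input are assumptions, not the exact configurations reported in the paper.

# Hypothetical sketch of one wide, deep column: small (3x3) convolutional
# receptive fields, max-pooling (a winner-take-all operation) and hundreds of
# maps per layer.  Written with PyTorch for brevity; the paper uses its own
# GPU implementation, and these layer sizes are assumptions.
import torch
import torch.nn as nn

column = nn.Sequential(
    nn.Conv2d(1, 100, kernel_size=3), nn.Tanh(),    # 100 maps
    nn.MaxPool2d(2),                                # winner-take-all pooling
    nn.Conv2d(100, 200, kernel_size=3), nn.Tanh(),  # 200 maps
    nn.MaxPool2d(2),
    nn.Conv2d(200, 300, kernel_size=3), nn.Tanh(),  # 300 maps
    nn.MaxPool2d(2),
    nn.Flatten(),
    nn.Linear(300, 300), nn.Tanh(),                 # fully connected layer
    nn.Linear(300, 10),                             # one output unit per class
)

logits = column(torch.randn(1, 1, 29, 29))          # assumes 29x29 gray inputs

Counting convolutional, pooling and fully connected stages, this stack has eight layers, within the 6-10 range mentioned above.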
(2) It was shown [14] that such multi-layered DNN are
hard to train by standard gradient descent [36, 18, 28], the
method of choice from a mathematical/algorithmic point
of view. Today’s computers, however, are fast enough for
this, more than 60000 times faster than those of the early
90s¹. Carefully designed code for massively parallel graph-
ics processing units (GPUs normally used for video games)
allows for gaining an additional speedup factor of 50-100
over serial code for standard computers. Given enough la-
beled data, our networks do not need additional heuristics
¹ 1991: 486DX-33 MHz; 2011: i7-990X 3.46 GHz.