[Figure 1: Two distinct classes from the 1000 classes of the ILSVRC 2014 classification challenge: (a) Siberian husky, (b) Eskimo dog.]
and expensive, especially if expert human raters are necessary to distinguish between fine-grained
visual categories like those in ImageNet (even in the 1000-class ILSVRC subset) as demonstrated
by Figure 1.
Another drawback of uniformly increased network size is the dramatically increased use of compu-
tational resources. For example, in a deep vision network, if two convolutional layers are chained,
any uniform increase in the number of their filters results in a quadratic increase of computation. If
the added capacity is used inefficiently (for example, if most weights end up being close to zero),
then a lot of computation is wasted. Since in practice the computational budget is always finite, an
efficient distribution of computing resources is preferred to an indiscriminate increase of size, even
when the main objective is to increase the quality of results.
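The quadratic growth can be made concrete by counting multiply-accumulate (MAC) operations. The following Python sketch uses hypothetical layer shapes (28x28 feature maps, 3x3 kernels, 128 filters) that are not taken from the paper; it only illustrates the scaling behaviour.

    # Illustrative sketch: MAC count of the second of two chained convolutions.
    # Its cost is proportional to (#filters of layer 1) x (#filters of layer 2),
    # so uniformly scaling both filter banks by s scales this cost by s**2.
    # All shapes below are assumptions chosen only to show the scaling.

    def conv_macs(h, w, k, c_in, c_out):
        # 'same'-padded k x k convolution: one k*k*c_in dot product
        # per spatial position and per output channel.
        return h * w * k * k * c_in * c_out

    h = w = 28          # feature-map size (assumed)
    k = 3               # kernel size (assumed)
    f1, f2 = 128, 128   # baseline filter counts of the two chained layers (assumed)

    for s in (1, 2, 4):
        macs = conv_macs(h, w, k, s * f1, s * f2)   # second layer after scaling
        print(s, macs)  # grows as s**2: doubling the filters quadruples this cost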
The fundamental way of solving both issues would be by ultimately moving from fully connected
to sparsely connected architectures, even inside the convolutions. Besides mimicking biological
systems, this would also have the advantage of firmer theoretical underpinnings due to the ground-
breaking work of Arora et al. [2]. Their main result states that if the probability distribution of
the data-set is representable by a large, very sparse deep neural network, then the optimal network
topology can be constructed layer by layer by analyzing the correlation statistics of the activations
of the last layer and clustering neurons with highly correlated outputs. Although the strict mathematical proof requires very strong conditions, the fact that this statement resonates with the well-known Hebbian principle – neurons that fire together, wire together – suggests that the underlying idea is applicable in practice even under less strict conditions.
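As a rough illustration of this layer-by-layer construction, the sketch below groups units of the preceding layer whose outputs are highly correlated. The data, threshold, and greedy grouping are assumptions made for illustration only and are not the construction analyzed in [2].

    # Minimal sketch of the clustering idea described above: group units of the
    # last layer whose outputs are highly correlated, one group per unit of the
    # next layer. Threshold, data, and the greedy grouping are illustrative
    # assumptions; this is not the construction proved correct in [2].
    import numpy as np

    def correlated_groups(activations, threshold=0.8):
        # activations: (num_samples, num_units) outputs of the last layer.
        corr = np.corrcoef(activations, rowvar=False)  # unit-by-unit correlations
        unassigned = set(range(corr.shape[0]))
        groups = []
        while unassigned:
            seed = unassigned.pop()
            group = {seed} | {j for j in unassigned if abs(corr[seed, j]) >= threshold}
            unassigned -= group
            groups.append(sorted(group))
        return groups  # each group suggests one cluster in the next layer's topology

    # Example with random activations (1000 samples, 32 units), for shape only.
    acts = np.random.randn(1000, 32)
    print(correlated_groups(acts))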
On the downside, today's computing infrastructures are very inefficient when it comes to numerical calculation on non-uniform sparse data structures. Even if the number of arithmetic operations is reduced by 100×, the overhead of lookups and cache misses is so dominant that switching to sparse
matrices would not pay off. The gap is widened even further by the use of steadily improving,
highly tuned, numerical libraries that allow for extremely fast dense matrix multiplication, exploit-
ing the minute details of the underlying CPU or GPU hardware [16, 9]. Also, non-uniform sparse
models require more sophisticated engineering and computing infrastructure. Most current vision
oriented machine learning systems utilize sparsity in the spatial domain just by the virtue of em-
ploying convolutions. However, convolutions are implemented as collections of dense connections
to the patches in the earlier layer. ConvNets have traditionally used random and sparse connection tables in the feature dimensions since [11] in order to break the symmetry and improve learning, but the trend changed back to full connections with [9] in order to better optimize parallel computing. The uniformity of the structure, together with the large number of filters and the greater batch size, allows for efficient dense computation.
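The trade-off between fewer arithmetic operations and per-element overhead can be probed with a rough timing comparison such as the sketch below. The matrix size and sparsity level are assumptions, and the outcome depends heavily on the hardware and on the BLAS and sparse kernels installed.

    # Rough probe of the dense-vs-sparse trade-off discussed above: at 1% density
    # the sparse product performs ~100x fewer multiplications but pays per-element
    # indexing overhead, while the dense product runs on a highly tuned BLAS
    # kernel. Size and density are assumptions; timings vary widely by machine.
    import time
    import numpy as np
    from scipy import sparse

    n, density = 2048, 0.01
    sparse_mat = sparse.random(n, n, density=density, format="csr")
    dense_mat = sparse_mat.toarray()          # same values, dense layout
    x = np.random.randn(n, n)

    t0 = time.perf_counter(); dense_mat @ x; t1 = time.perf_counter()
    t2 = time.perf_counter(); sparse_mat @ x; t3 = time.perf_counter()
    print("dense: %.4fs  sparse: %.4fs" % (t1 - t0, t3 - t2))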
This raises the question whether there is any hope for a next, intermediate step: an architecture
that makes use of the extra sparsity, even at filter level, as suggested by the theory, but exploits our