Deep Convolutional Neural Networks for Classifying 1.2 Million High-Resolution Images in the ImageNet Contest

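This document examines the use of deep convolutional neural networks (DCNNs) for large-scale image classification in the ImageNet LSVRC-2010 contest. The authors, Alex Krizhevsky, Ilya Sutskever, and Geoffrey Hinton, are from the University of Toronto in Canada. The contest dataset contains 1.2 million high-resolution images divided into 1000 classes. The core contribution is a deep network with five convolutional layers, some of which are followed by max-pooling layers, and three fully-connected layers ending in a 1000-way softmax. The convolutional layers extract image features, starting from low-level ones such as edges and textures, and the max-pooling layers add a degree of invariance to small spatial shifts; the fully-connected layers then combine these features to make the final classification decision. The network has 60 million parameters and 650,000 neurons, which was very large for its time. To speed up training, the authors used non-saturating neurons, whose gradients do not vanish over a wide range of inputs, together with a highly efficient GPU implementation of the convolution operation. To reduce overfitting in the fully-connected layers, they applied "dropout", a regularization method that randomly switches off a fraction of the neurons on each training iteration. On the test data the model achieved a top-1 error rate of 37.5% and a top-5 error rate of 17.0%, considerably better than the previous state of the art. The authors also entered a variant of the model in the ILSVRC-2012 competition and won with a top-5 test error rate of 15.3%, compared to 26.2% for the second-best entry. The paper is regarded as a milestone in deep learning: it demonstrated that deep convolutional networks could solve large-scale visual recognition problems, and it had a lasting influence on later computer vision and machine learning research.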
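The random switching-off described above can be illustrated with a few lines of Python. This is only a minimal sketch, not the authors' implementation: it uses the "inverted dropout" formulation (surviving activations are rescaled during training so that nothing needs to change at test time), which is an assumption made here for simplicity. The function name `dropout`, the parameter `drop_prob`, and the default of 0.5 are likewise illustrative choices and are not taken from the text above.

```python
import random


def dropout(activations, drop_prob=0.5, training=True, rng=None):
    """Apply inverted dropout to a list of activations (illustrative sketch).

    During training, each activation is zeroed with probability `drop_prob`
    and the survivors are divided by the keep probability, so the expected
    value of every unit is unchanged. At test time the input passes through.
    """
    if not 0.0 <= drop_prob < 1.0:
        raise ValueError("drop_prob must be in [0, 1)")
    if not training or drop_prob == 0.0:
        return list(activations)
    rng = rng or random.Random()
    keep_prob = 1.0 - drop_prob
    # Draw an independent keep/drop decision for every unit.
    return [a / keep_prob if rng.random() < keep_prob else 0.0
            for a in activations]


if __name__ == "__main__":
    hidden = [0.2, 1.5, 0.0, 3.1, 0.7, 2.4]
    print("train:", dropout(hidden, rng=random.Random(0)))
    print("test: ", dropout(hidden, training=False))
```

Because a fresh random mask is drawn on every call, each training iteration effectively trains a different thinned network that shares weights with all the others.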

ImageNet Classification with Deep Convolutional Neural Networks

Alex Krizhevsky

University of Toronto

kriz@cs.utoronto.ca

Ilya Sutskever

University of Toronto

ilya@cs.utoronto.ca

Geoffrey E. Hinton

University of Toronto

hinton@cs.utoronto.ca

Abstract

We trained a large, deep convolutional neural network to classify the 1.2 million high-resolution images in the ImageNet LSVRC-2010 contest into the 1000 different classes. On the test data, we achieved top-1 and top-5 error rates of 37.5% and 17.0% which is considerably better than the previous state-of-the-art. The neural network, which has 60 million parameters and 650,000 neurons, consists of five convolutional layers, some of which are followed by max-pooling layers, and three fully-connected layers with a final 1000-way softmax. To make training faster, we used non-saturating neurons and a very efficient GPU implementation of the convolution operation. To reduce overfitting in the fully-connected layers we employed a recently-developed regularization method called “dropout” that proved to be very effective. We also entered a variant of this model in the ILSVRC-2012 competition and achieved a winning top-5 test error rate of 15.3%, compared to 26.2% achieved by the second-best entry.
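The top-1 and top-5 error rates quoted in the abstract can be made concrete with a short sketch. Under the usual definition, the top-k error is the fraction of test images whose correct label is not among the k classes the model scores highest. The function name `top_k_error` and the toy scores below are hypothetical and only illustrate the arithmetic; they are not the contest's evaluation code or data.

```python
def top_k_error(scores, labels, k):
    """Fraction of examples whose true label is not among the k
    highest-scoring classes (illustrative sketch).

    scores: one list of per-class scores for each example
    labels: the true class index for each example
    """
    if len(scores) != len(labels):
        raise ValueError("scores and labels must have the same length")
    if not labels:
        raise ValueError("at least one example is required")
    misses = 0
    for row, label in zip(scores, labels):
        # Class indices ordered from highest to lowest score.
        ranked = sorted(range(len(row)), key=lambda c: row[c], reverse=True)
        if label not in ranked[:k]:
            misses += 1
    return misses / len(labels)


if __name__ == "__main__":
    # Three toy examples over four classes (made-up numbers).
    scores = [
        [0.10, 0.60, 0.20, 0.10],  # true class 1: ranked first
        [0.50, 0.10, 0.30, 0.10],  # true class 2: ranked second
        [0.30, 0.15, 0.50, 0.05],  # true class 3: ranked last
    ]
    labels = [1, 2, 3]
    print("top-1 error:", top_k_error(scores, labels, k=1))
    print("top-2 error:", top_k_error(scores, labels, k=2))
```

In this toy data the second example is a top-1 miss but a top-2 hit, which is why a top-5 error is always at most the top-1 error for the same model.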

1 Introduction

Current approaches to object recognition make essential use of machine learning methods. To improve their performance, we can collect larger datasets, learn more powerful models, and use better techniques for preventing overfitting. Until recently, datasets of labeled images were relatively small, on the order of tens of thousands of images (e.g., NORB [16], Caltech-101/256 [8, 9], and CIFAR-10/100 [12]). Simple recognition tasks can be solved quite well with datasets of this size, especially if they are augmented with label-preserving transformations. For example, the current-best error rate on the MNIST digit-recognition task (<0.3%) approaches human performance [4]. But objects in realistic settings exhibit considerable variability, so to learn to recognize them it is necessary to use much larger training sets. And indeed, the shortcomings of small image datasets have been widely recognized (e.g., Pinto et al. [21]), but it has only recently become possible to collect labeled datasets with millions of images. The new larger datasets include LabelMe [23], which consists of hundreds of thousands of fully-segmented images, and ImageNet [6], which consists of over 15 million labeled high-resolution images in over 22,000 categories.

To learn about thousands of objects from millions of images, we need a model with a large learning capacity. However, the immense complexity of the object recognition task means that this problem cannot be specified even by a dataset as large as ImageNet, so our model should also have lots of prior knowledge to compensate for all the data we don’t have. Convolutional neural networks (CNNs) constitute one such class of models [16, 11, 13, 18, 15, 22, 26]. Their capacity can be controlled by varying their depth and breadth, and they also make strong and mostly correct assumptions about the nature of images (namely, stationarity of statistics and locality of pixel dependencies). Thus, compared to standard feedforward neural networks with similarly-sized layers, CNNs have much fewer connections and parameters and so they are easier to train, while their theoretically-best performance is likely to be only slightly worse.
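The claim about connections and parameters can be checked with simple arithmetic. The sketch below counts learnable parameters for one fully-connected layer and for one convolutional layer that map the same input volume to the same output volume. The layer sizes (a 32x32x3 input, a 32x32x16 output, a 5x5 kernel) and the helper names are hypothetical, chosen only for illustration; they are not the dimensions of the network described in this paper.

```python
def dense_params(n_inputs, n_outputs):
    """Parameters of a fully-connected layer: one weight per
    input-output pair plus one bias per output unit."""
    return n_inputs * n_outputs + n_outputs


def conv_params(kernel_size, in_channels, out_channels):
    """Parameters of a convolutional layer: each output channel has one
    kernel_size x kernel_size x in_channels filter plus one bias, and the
    filter is shared across all spatial positions."""
    return kernel_size * kernel_size * in_channels * out_channels + out_channels


if __name__ == "__main__":
    # Hypothetical layer sizes, for illustration only.
    height, width = 32, 32
    in_channels, out_channels = 3, 16
    kernel_size = 5

    n_in = height * width * in_channels    # units in the input volume
    n_out = height * width * out_channels  # units in the output volume

    print("fully-connected:", dense_params(n_in, n_out))
    print("convolutional:  ", conv_params(kernel_size, in_channels, out_channels))
```

With these made-up sizes the fully-connected layer needs about 50 million parameters while the convolutional layer needs 1,216. The gap comes from the two assumptions named above: locality (each output unit looks only at a small window) and stationarity (the same filter is reused at every position).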
