深度残差网络：解决深度学习训练难题与ImageNet竞赛夺冠关键

需积分: 46 197 浏览量更新于2024-09-07 1 收藏 282KB PDF 举报

深度残差学习在图像识别中的突破性进展是《深度残差学习 for 图像识别》(Deep Residual Learning for Image Recognition)这篇论文的核心主题。该论文由Kaiming He、Xiangyu Zhang、Shaoqing Ren和Jian Sun四位研究者共同完成，他们在Microsoft Research团队中工作，他们的电子邮件地址为{kahe,v-xiangz,v-shren,jiansun}@microsoft.com。论文强调了深度学习网络相较于传统浅层网络的重要优势，特别是在处理深层次网络时面临的挑战。传统的深层网络训练困难，尤其是由于梯度消失问题（gradient vanishing），使得模型难以收敛。作者提出了一种创新的框架——深度残差学习（Deep Residual Learning），其核心思想是将每一层看作是对输入信号的残差函数的学习，而非独立的函数。这种设计允许网络更好地传递梯度，解决了深度增加带来的梯度消失问题，使得训练变得更加容易且效率更高。论文通过大量的实验证明了深度残差网络在深度上的优越性。例如，在ImageNet数据集上，他们构建的深度达到152层的残差网络（比VGG网络深8倍）虽然复杂度更低，但性能却显著提升。令人瞩目的是，这些深度残差网络的集成系统在ImageNet测试集上的错误率仅为3.57%，这一成绩让他们在2015年的ImageNet分类任务中获得了第一名。此外，作者还展示了在CIFAR-10数据集上进行的实验，分别测试了100层和1000层深度的残差网络，进一步证实了深度残差学习在各种深度下的有效性。深度在网络表示能力中扮演着关键角色，尤其是在视觉识别任务中，深度残差学习为解决这个难题提供了重要的解决方案。《深度残差学习 for 图像识别》这篇论文标志着深度学习领域的一个重要里程碑，它不仅解决了深度网络训练中的核心问题，而且极大地推动了图像识别领域的性能提升，为后来者的研究和发展奠定了坚实的基础。深度残差学习不仅在理论层面革新了我们对深度学习的理解，也在实际应用中产生了深远的影响。

Deep Residual Learning for Image Recognition

Kaiming He Xiangyu Zhang Shaoqing Ren Jian Sun

Microsoft Research

{kahe, v-xiangz, v-shren, jiansun}@microsoft.com

Abstract

Deeper neural networks are more difﬁcult to train. We

present a residual learning framework to ease the training

of networks that are substantially deeper than those used

previously. We explicitly reformulate the layers as learn-

ing residual functions with reference to the layer inputs, in-

stead of learning unreferenced functions. We provide com-

prehensive empirical evidence showing that these residual

networks are easier to optimize, and can gain accuracy from

considerably increased depth. On the ImageNet dataset we

evaluate residual nets with a depth of up to 152 layers—8×

deeper than VGG nets [40] but still having lower complex-

ity. An ensemble of these residual nets achieves 3.57% error

on the ImageNet test set. This result won the 1st place on the

ILSVRC 2015 classiﬁcation task. We also present analysis

on CIFAR-10 with 100 and 1000 layers.

The depth of representations is of central importance

for many visual recognition tasks. Solely due to our ex-

tremely deep representations, we obtain a 28% relative im-

provement on the COCO object detection dataset. Deep

residual nets are foundations of our submissions to ILSVRC

& COCO 2015 competitions

, where we also won the 1st

places on the tasks of ImageNet detection, ImageNet local-

ization, COCO detection, and COCO segmentation.

1. Introduction

Deep convolutional neural networks [22, 21] have led

to a series of breakthroughs for image classiﬁcation [21,

49, 39]. Deep networks naturally integrate low/mid/high-

level features [49] and classiﬁers in an end-to-end multi-

layer fashion, and the “levels” of features can be enriched

by the number of stacked layers (depth). Recent evidence

[40, 43] reveals that network depth is of crucial importance,

and the leading results [40, 43, 12, 16] on the challenging

ImageNet dataset [35] all exploit “very deep” [40] models,

with a depth of sixteen [40] to thirty [16]. Many other non-

trivial visual recognition tasks [7, 11, 6, 32, 27] have also

http://image-net.org/challenges/LSVRC/2015/ and

http://mscoco.org/dataset/#detections-challenge2015.

0 1 2 3 4 5 6

iter. (1e4)

training error (%)

0 1 2 3 4 5 6

iter. (1e4)

test error (%)

56-layer

20-layer

56-layer

20-layer

Figure 1. Training error (left) and test error (right) on CIFAR-10

with 20-layer and 56-layer “plain” networks. The deeper network

has higher training error, and thus test error. Similar phenomena

on ImageNet is presented in Fig. 4.

greatly beneﬁted from very deep models.

Driven by the signiﬁcance of depth, a question arises: Is

learning better networks as easy as stacking more layers?

An obstacle to answering this question was the notorious

problem of vanishing/exploding gradients [14, 1, 8], which

hamper convergence from the beginning. This problem,

however, has been largely addressed by normalized initial-

ization [23, 8, 36, 12] and intermediate normalization layers

[16], which enable networks with tens of layers to start con-

verging for stochastic gradient descent (SGD) with back-

propagation [22].

When deeper networks are able to start converging, a

degradation problem has been exposed: with the network

depth increasing, accuracy gets saturated (which might be

unsurprising) and then degrades rapidly. Unexpectedly,

such degradation is not caused by overﬁtting, and adding

more layers to a suitably deep model leads to higher train-

ing error, as reported in [10, 41] and thoroughly veriﬁed by

our experiments. Fig. 1 shows a typical example.

The degradation (of training accuracy) indicates that not

all systems are similarly easy to optimize. Let us consider a

shallower architecture and its deeper counterpart that adds

more layers onto it. There exists a solution by construction

to the deeper model: the added layers are identity mapping,

and the other layers are copied from the learned shallower

model. The existence of this constructed solution indicates

that a deeper model should produce no higher training error

than its shallower counterpart. But experiments show that

our current solvers on hand are unable to ﬁnd solutions that

2016 IEEE Conference on Computer Vision and Pattern Recognition

DOI 10.1109/CVPR.2016.90

770

下载后可阅读完整内容，剩余8页未读，立即下载

elitewangth

粉丝: 0
资源: 5

深度残差网络：解决深度学习训练难题与ImageNet竞赛夺冠关键

【7】Deep residual learning for image recognition.pdf

Deep Residual Learning for Image Recognition 深度残差网络论文笔记

Deep Residual Learning for Image Recognition

inet--ResNet--Deep Residual Learning for Image Recognition.pdf

Deep Residual Learning for Image Recognition 机翻

paper reading:Deep Residual Learning for Image Recognition

论文精读-Deep Residual Learning for Image Recognition

resnet ppt refer to Deep Residual Learning for Image Recognition

Deep Residual Learning for Image Recognition浅读与实现

deep residual learning for image recognition

最新资源