
thus seem to indicate that the main power of deep residual networks lies in the residual blocks, and that the effect of depth is supplementary. We note that one can train even better wide residual networks that have twice as many parameters (and more), which suggests that, to obtain further improvements by increasing the depth of thin networks, one would need to add thousands of layers.
Use of dropout in ResNet blocks. Dropout was first introduced in [27] and was then adopted by many successful architectures such as [16, 26]. It was mostly applied to top layers that had a large number of parameters, in order to prevent feature co-adaptation and overfitting. It was later largely superseded by batch normalization [15], which was introduced as a technique for reducing internal covariate shift in neural network activations by normalizing them to have a specific distribution. Batch normalization also acts as a regularizer, and its authors showed experimentally that a network with batch normalization achieves better accuracy than a network with dropout. In our case, as widening residual blocks increases the number of parameters, we study the effect of dropout for regularizing training and preventing overfitting. Dropout in residual networks was previously studied in [13], where it was inserted in the identity part of the block, and the authors showed that this has negative effects. Instead, we argue here that dropout should be inserted between the convolutional layers. Experimental results on wide residual networks show that this leads to consistent gains, even yielding new state-of-the-art results (e.g., a 16-layer-deep wide residual network with dropout achieves 1.64% error on SVHN).
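For illustration, a minimal sketch of such a block is given below (assuming PyTorch and the pre-activation BN-ReLU-conv ordering; the class name WideBasicDropout, the default drop_rate, and the 1×1 projection shortcut are illustrative choices rather than a prescribed implementation). Dropout sits between the two 3×3 convolutions, while the identity path is left untouched.

    import torch
    import torch.nn as nn

    class WideBasicDropout(nn.Module):
        """Sketch: pre-activation residual block with dropout between the convolutions."""

        def __init__(self, in_planes, out_planes, stride=1, drop_rate=0.3):
            super().__init__()
            self.bn1 = nn.BatchNorm2d(in_planes)
            self.conv1 = nn.Conv2d(in_planes, out_planes, kernel_size=3,
                                   stride=stride, padding=1, bias=False)
            self.bn2 = nn.BatchNorm2d(out_planes)
            self.dropout = nn.Dropout(p=drop_rate)
            self.conv2 = nn.Conv2d(out_planes, out_planes, kernel_size=3,
                                   stride=1, padding=1, bias=False)
            # 1x1 projection on the shortcut when the shape changes, identity otherwise
            self.shortcut = (nn.Identity()
                             if stride == 1 and in_planes == out_planes
                             else nn.Conv2d(in_planes, out_planes, kernel_size=1,
                                            stride=stride, bias=False))

        def forward(self, x):
            out = self.conv1(torch.relu(self.bn1(x)))
            out = self.dropout(torch.relu(self.bn2(out)))  # dropout between the convolutions
            out = self.conv2(out)
            return out + self.shortcut(x)  # dropout never touches the identity path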
In summary, the contributions of this work are as follows:
• We present a detailed experimental study of residual network architectures that thor-
oughly examines several important aspects of ResNet block structure.
• We propose a novel widened architecture for ResNet blocks that allows for residual
networks with significantly improved performance.
• We propose a new way of utilizing dropout within deep residual networks so as to
properly regularize them and prevent overfitting during training.
• Finally, we show that our proposed ResNet architectures achieve state-of-the-art results on several datasets, dramatically improving the accuracy and speed of residual networks.
2 Wide residual networks
A residual block with identity mapping can be represented by the following formula:
x_{l+1} = x_l + F(x_l, W_l)    (1)
where x_l and x_{l+1} are the input and output of the l-th unit in the network, F is a residual function, and W_l are the parameters of the block. A residual network consists of sequentially stacked residual blocks.
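As a minimal illustration of Eq. (1), the following sketch (assuming PyTorch, and blocks whose output shape matches their input so that the identity shortcut applies directly) stacks residual blocks sequentially; ResidualStack is an illustrative name, not part of any released code.

    import torch.nn as nn

    class ResidualStack(nn.Module):
        """Sketch of Eq. (1): x_{l+1} = x_l + F(x_l, W_l), applied block after block."""

        def __init__(self, blocks):
            super().__init__()
            self.blocks = nn.ModuleList(blocks)  # each element implements a residual function F

        def forward(self, x):
            for residual_fn in self.blocks:
                x = x + residual_fn(x)  # identity shortcut plus residual branch
            return x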
In [13], residual networks consisted of two types of blocks (both sketched in code below):
• basic - two consecutive 3×3 convolutions, with batch normalization and ReLU preceding each convolution: conv3×3-conv3×3, Fig. 1(a)
• bottleneck - one 3×3 convolution surrounded by dimensionality-reducing and dimensionality-expanding 1×1 convolution layers: conv1×1-conv3×3-conv1×1, Fig. 1(b)
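For concreteness, the two residual functions F can be sketched as follows (assuming PyTorch; the BN-ReLU-conv ordering follows the description above, and the channel-reduction factor of 4 in the bottleneck is an illustrative choice, not specified in the text). Either function can be plugged into a residual stack such as the one sketched after Eq. (1).

    import torch.nn as nn

    def basic_block(planes):
        """conv3x3-conv3x3 residual function (Fig. 1(a)), BN and ReLU preceding each convolution."""
        return nn.Sequential(
            nn.BatchNorm2d(planes), nn.ReLU(inplace=True),
            nn.Conv2d(planes, planes, kernel_size=3, padding=1, bias=False),
            nn.BatchNorm2d(planes), nn.ReLU(inplace=True),
            nn.Conv2d(planes, planes, kernel_size=3, padding=1, bias=False),
        )

    def bottleneck_block(planes, reduction=4):
        """conv1x1-conv3x3-conv1x1 residual function (Fig. 1(b)); the 1x1 layers
        reduce and then restore the channel dimension."""
        inner = planes // reduction
        return nn.Sequential(
            nn.BatchNorm2d(planes), nn.ReLU(inplace=True),
            nn.Conv2d(planes, inner, kernel_size=1, bias=False),
            nn.BatchNorm2d(inner), nn.ReLU(inplace=True),
            nn.Conv2d(inner, inner, kernel_size=3, padding=1, bias=False),
            nn.BatchNorm2d(inner), nn.ReLU(inplace=True),
            nn.Conv2d(inner, planes, kernel_size=1, bias=False),
        )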