Units”. The original Residual Unit in [1] performs the following computation:
y_l = h(x_l) + F(x_l, W_l),    (1)
x_{l+1} = f(y_l).    (2)
Here x_l is the input feature to the l-th Residual Unit. W_l = {W_{l,k} | 1 ≤ k ≤ K} is a set of weights (and biases) associated with the l-th Residual Unit, and K is the number of layers in a Residual Unit (K is 2 or 3 in [1]). F denotes the residual function, e.g., a stack of two 3×3 convolutional layers in [1]. The function f is the operation after element-wise addition, and in [1] f is ReLU. The function h is set as an identity mapping: h(x_l) = x_l.¹
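As a concrete illustration of Eqns.(1)-(2), the sketch below implements one such Residual Unit in PyTorch, taking F as a stack of two 3×3 convolutions, h as identity, and f as ReLU after the addition; the class name OriginalResidualUnit and the channel configuration are our own illustrative choices, not prescribed by [1].

```python
import torch
import torch.nn as nn


class OriginalResidualUnit(nn.Module):
    """One original Residual Unit: y_l = h(x_l) + F(x_l, W_l), x_{l+1} = f(y_l).

    Minimal sketch assuming equal input/output channels, so h can remain identity.
    """

    def __init__(self, channels):
        super().__init__()
        # F: two 3x3 convolutions (with BN and ReLU in between), as described in the text.
        self.residual = nn.Sequential(
            nn.Conv2d(channels, channels, kernel_size=3, padding=1, bias=False),
            nn.BatchNorm2d(channels),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, kernel_size=3, padding=1, bias=False),
            nn.BatchNorm2d(channels),
        )

    def forward(self, x):
        y = x + self.residual(x)   # Eqn.(1): y_l = h(x_l) + F(x_l, W_l), with h = identity
        return torch.relu(y)       # Eqn.(2): x_{l+1} = f(y_l), with f = ReLU as in [1]


# Example usage on a random feature map.
x_next = OriginalResidualUnit(64)(torch.randn(2, 64, 32, 32))
```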
If f is also an identity mapping: x_{l+1} ≡ y_l, we can put Eqn.(2) into Eqn.(1) and obtain:

x_{l+1} = x_l + F(x_l, W_l).    (3)
Recursively (x_{l+2} = x_{l+1} + F(x_{l+1}, W_{l+1}) = x_l + F(x_l, W_l) + F(x_{l+1}, W_{l+1}), etc.) we will have:

x_L = x_l + \sum_{i=l}^{L-1} F(x_i, W_i),    (4)
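The unrolled form of Eqn.(4) is easy to verify numerically: iterating Eqn.(3) while separately accumulating the residual-branch outputs recovers exactly the same x_L. A small sketch under toy assumptions (random tanh residual functions and feature vectors of size 8; none of these choices come from the paper):

```python
import torch

torch.manual_seed(0)
d, l, L = 8, 0, 5                        # feature size, shallow unit l, deep unit L

# Toy residual functions F(., W_i): one random weight matrix per unit (illustrative only).
weights = [0.1 * torch.randn(d, d) for _ in range(L)]

x_shallow = torch.randn(d)               # x_l
x_i, residual_sum = x_shallow.clone(), torch.zeros(d)
for i in range(l, L):
    f_i = torch.tanh(x_i @ weights[i])   # F(x_i, W_i)
    residual_sum += f_i                  # accumulate the sum in Eqn.(4)
    x_i = x_i + f_i                      # Eqn.(3): x_{i+1} = x_i + F(x_i, W_i)

# Eqn.(4): x_L equals x_l plus the sum of all intermediate residual outputs.
assert torch.allclose(x_i, x_shallow + residual_sum, atol=1e-6)
```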
for any deeper unit L and any shallower unit l. Eqn.(4) exhibits some nice properties. (i) The feature x_L of any deeper unit L can be represented as the feature x_l of any shallower unit l plus a residual function in the form of \sum_{i=l}^{L-1} F, indicating that the model is in a residual form between any units L and l. (ii) The feature x_L = x_0 + \sum_{i=0}^{L-1} F(x_i, W_i), of any deep unit L, is the summation of the outputs of all preceding residual functions (plus x_0). This is in contrast to a “plain network”, where a feature x_L is a series of matrix-vector products, say, \prod_{i=0}^{L-1} W_i x_0 (ignoring BN and ReLU).
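To make this contrast concrete, the toy comparison below composes the same small random weight matrices once additively (as in Eqn.(4), with a linear F for simplicity) and once as a pure chain of matrix-vector products \prod_{i=0}^{L-1} W_i x_0; the sizes and weight scales are arbitrary illustrative choices:

```python
import torch

torch.manual_seed(0)
d, L = 8, 5
weights = [0.01 * torch.randn(d, d) for _ in range(L)]  # small weights, for illustration
x0 = torch.randn(d)

x_res, x_plain = x0.clone(), x0.clone()
for W in weights:
    x_res = x_res + x_res @ W            # residual form: x_{i+1} = x_i + F(x_i, W_i)
    x_plain = x_plain @ W                # plain form: x_{i+1} = W_i x_i (no shortcut)

# With small weights the plain chain collapses toward zero, while the residual
# form keeps the x_0 term: the feature is a sum anchored at x_0, not a product.
print(x0.norm(), x_res.norm(), x_plain.norm())
```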
Eqn.(4) also leads to nice backward propagation properties. Denoting the
loss function as E, from the chain rule of backpropagation [9] we have:
\frac{\partial E}{\partial x_l} = \frac{\partial E}{\partial x_L} \frac{\partial x_L}{\partial x_l} = \frac{\partial E}{\partial x_L} \left( 1 + \frac{\partial}{\partial x_l} \sum_{i=l}^{L-1} F(x_i, W_i) \right).    (5)
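The decomposition in Eqn.(5) can also be checked with automatic differentiation: back-propagating the loss all the way to x_l must equal ∂E/∂x_L plus ∂E/∂x_L propagated through the accumulated residual sum. A minimal PyTorch sketch, again with toy residual functions and a simple quadratic loss (both our own assumptions, used only to make the check concrete):

```python
import torch

torch.manual_seed(0)
d, L = 8, 5
weights = [0.1 * torch.randn(d, d) for _ in range(L)]

x_l = torch.randn(d, requires_grad=True)

# Unroll Eqn.(3) (f = identity), keeping the accumulated sum S = sum_i F(x_i, W_i).
x_i, S = x_l, torch.zeros(d)
for W in weights:
    f_i = torch.tanh(x_i @ W)            # F(x_i, W_i)
    S = S + f_i
    x_i = x_i + f_i
x_L = x_i                                # by Eqn.(4), x_L = x_l + S

E = 0.5 * (x_L ** 2).sum()               # toy loss, chosen so that dE/dx_L = x_L

# Direct backpropagation of E through the whole unrolled chain.
grad_direct = torch.autograd.grad(E, x_l, retain_graph=True)[0]

# Eqn.(5): dE/dx_l = dE/dx_L plus dE/dx_L propagated through d/dx_l of sum_i F.
g = x_L.detach()                                            # dE/dx_L for this loss
through_residuals = torch.autograd.grad(S, x_l, grad_outputs=g)[0]
assert torch.allclose(grad_direct, g + through_residuals, atol=1e-5)
```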
Eqn.(5) indicates that the gradient ∂E/∂x_l can be decomposed into two additive terms: a term of ∂E/∂x_L that propagates information directly without involving any weight layers, and another term of (∂E/∂x_L)(∂/∂x_l ∑_{i=l}^{L-1} F) that propagates through the weight layers. The additive term of ∂E/∂x_L ensures that information is directly propagated back to any shallower unit l. Eqn.(5) also suggests that it
¹ It is noteworthy that there are Residual Units for increasing dimensions and reducing feature map sizes [1] in which h is not identity. In this case the following derivations do not hold strictly. But as there are only very few such units (two on CIFAR and three on ImageNet, depending on image sizes [1]), we expect that they do not have the exponential impact that we present in Sec. 3. One may also think of our derivations as applied to all Residual Units within the same feature map size.