sigmoid) and sequential techniques [11, 37]. Recent work
has shown its applicability to tasks such as image captioning
[4, 44] and lip reading [7], in which it is exploited to effi-
ciently aggregate multi-modal data. In these applications,
it is typically used on top of one or more layers represent-
ing higher-level abstractions for adaptation between modal-
ities. Highway networks [36] employ a gating mechanism
to regulate the shortcut connection, enabling the learning
of very deep architectures. Wang et al. [42] introduce
a powerful trunk-and-mask attention mechanism using an
hourglass module [27], inspired by its success in semantic
segmentation. This high capacity unit is inserted into deep
residual networks between intermediate stages. In contrast,
our proposed SE block is a lightweight gating mechanism,
specialised to model channel-wise relationships in a com-
putationally efficient manner and designed to enhance the
representational power of modules throughout the network.
3. Squeeze-and-Excitation Blocks
The Squeeze-and-Excitation block is a computational unit which can be constructed for any given transformation $\mathbf{F}_{tr}: \mathbf{X} \rightarrow \mathbf{U}$, $\mathbf{X} \in \mathbb{R}^{W' \times H' \times C'}$, $\mathbf{U} \in \mathbb{R}^{W \times H \times C}$.
For simplicity of exposition, in the notation that follows we take $\mathbf{F}_{tr}$ to be a standard convolutional operator. Let $\mathbf{V} = [\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_C]$ denote the learned set of filter kernels, where $\mathbf{v}_c$ refers to the parameters of the $c$-th filter. We can then write the outputs of $\mathbf{F}_{tr}$ as $\mathbf{U} = [\mathbf{u}_1, \mathbf{u}_2, \ldots, \mathbf{u}_C]$, where
$$\mathbf{u}_c = \mathbf{v}_c * \mathbf{X} = \sum_{s=1}^{C'} \mathbf{v}_c^s * \mathbf{x}^s. \tag{1}$$
Here $*$ denotes convolution, $\mathbf{v}_c = [\mathbf{v}_c^1, \mathbf{v}_c^2, \ldots, \mathbf{v}_c^{C'}]$ and $\mathbf{X} = [\mathbf{x}^1, \mathbf{x}^2, \ldots, \mathbf{x}^{C'}]$ (to simplify the notation, bias terms are omitted). Here $\mathbf{v}_c^s$ is a 2D spatial kernel, and therefore represents a single channel of $\mathbf{v}_c$ which acts on the corresponding channel of $\mathbf{X}$. Since the output is produced by a summation through all channels, the channel dependencies are implicitly embedded in $\mathbf{v}_c$, but these dependencies
are entangled with the spatial correlation captured by the
filters. Our goal is to ensure that the network is able to in-
crease its sensitivity to informative features so that they can
be exploited by subsequent transformations, and to suppress
less useful ones. We propose to achieve this by explicitly
modelling channel interdependencies to recalibrate filter re-
sponses in two steps, squeeze and excitation, before they are
fed into the next transformation. A diagram of an SE building
block is shown in Fig. 1.
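To make the point about entangled channel dependencies concrete, the following sketch (ours, in PyTorch; the shapes and the 3×3 kernel are illustrative assumptions, not part of the paper) verifies Eq. (1) numerically: each $\mathbf{u}_c$ sums 2D convolutions over every input channel.

```python
# Minimal numerical check of Eq. (1), assuming F_tr is a plain 3x3 convolution.
# Shapes, kernel size and the use of PyTorch are illustrative choices;
# bias terms are omitted as in the text.
import torch
import torch.nn.functional as F

C_in, C_out, H, W = 4, 8, 16, 16            # C', C and spatial dimensions
X = torch.randn(1, C_in, H, W)              # input X (NCHW layout)
V = torch.randn(C_out, C_in, 3, 3)          # filter bank V = [v_1, ..., v_C]

U = F.conv2d(X, V, padding=1)               # U = F_tr(X)

# Eq. (1): u_c is a sum over input channels of 2D convolutions v_c^s * x^s,
# so every output channel implicitly mixes information from all input channels.
u_0 = sum(F.conv2d(X[:, s:s + 1], V[0:1, s:s + 1], padding=1) for s in range(C_in))
assert torch.allclose(U[:, 0:1], u_0, atol=1e-5)
```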
3.1. Squeeze: Global Information Embedding
In order to tackle the issue of exploiting channel depen-
dencies, we first consider the signal to each channel in the
output features. Each of the learned filters operates with a
local receptive field and consequently each unit of the trans-
formation output U is unable to exploit contextual informa-
tion outside of this region. This is an issue that becomes
more severe in the lower layers of the network whose re-
ceptive field sizes are small.
To mitigate this problem, we propose to squeeze global
spatial information into a channel descriptor. This is
achieved by using global average pooling to generate
channel-wise statistics. Formally, a statistic $\mathbf{z} \in \mathbb{R}^C$ is generated by shrinking $\mathbf{U}$ through spatial dimensions $W \times H$, where the $c$-th element of $\mathbf{z}$ is calculated by:
$$z_c = \mathbf{F}_{sq}(\mathbf{u}_c) = \frac{1}{W \times H} \sum_{i=1}^{W} \sum_{j=1}^{H} u_c(i, j). \tag{2}$$
Discussion. The transformation output $\mathbf{U}$ can be interpreted as a collection of local descriptors whose statistics are expressive for the whole image. Exploiting such information is prevalent in feature engineering work [31, 34, 45]. We opt for the simplest aggregation strategy, global average pooling, although more sophisticated strategies could be employed here as well.
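As a brief illustration (ours; the tensor shapes are arbitrary), the squeeze operation of Eq. (2) amounts to a spatial mean per channel:

```python
# Sketch of the squeeze step, Eq. (2): global average pooling over the spatial
# dimensions of U yields one scalar descriptor per channel.
import torch

N, C, H, W = 1, 8, 16, 16          # illustrative shapes
U = torch.randn(N, C, H, W)
z = U.mean(dim=(2, 3))             # z_c = (1 / (W*H)) * sum_{i,j} u_c(i, j)
# Equivalent: torch.nn.functional.adaptive_avg_pool2d(U, 1).flatten(1)
```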
3.2. Excitation: Adaptive Recalibration
To make use of the information aggregated in the squeeze
operation, we follow it with a second operation which aims
to fully capture channel-wise dependencies. To fulfil this
objective, the function must meet two criteria: first, it must
be flexible (in particular, it must be capable of learning
a nonlinear interaction between channels) and second, it
must learn a non-mutually-exclusive relationship, since multiple channels are allowed to be emphasised as opposed to a one-hot activation. To meet these criteria, we opt to employ a
simple gating mechanism with a sigmoid activation:
$$\mathbf{s} = \mathbf{F}_{ex}(\mathbf{z}, \mathbf{W}) = \sigma(g(\mathbf{z}, \mathbf{W})) = \sigma(\mathbf{W}_2 \delta(\mathbf{W}_1 \mathbf{z})), \tag{3}$$
where $\delta$ refers to the ReLU [26] function, $\mathbf{W}_1 \in \mathbb{R}^{\frac{C}{r} \times C}$ and $\mathbf{W}_2 \in \mathbb{R}^{C \times \frac{C}{r}}$. To limit model complexity and aid generalisation, we parameterise the gating mechanism by forming a bottleneck with two fully-connected (FC) layers around the non-linearity, i.e. a dimensionality-reduction layer with parameters $\mathbf{W}_1$ with reduction ratio $r$ (we set it to be 16, and this parameter choice is discussed in Sec. 6.3), a ReLU and then a dimensionality-increasing layer with parameters $\mathbf{W}_2$. The final output of the block is obtained by rescaling the transformation output $\mathbf{U}$ with the activations:
$$\tilde{\mathbf{x}}_c = \mathbf{F}_{scale}(\mathbf{u}_c, s_c) = s_c \cdot \mathbf{u}_c, \tag{4}$$
where $\tilde{\mathbf{X}} = [\tilde{\mathbf{x}}_1, \tilde{\mathbf{x}}_2, \ldots, \tilde{\mathbf{x}}_C]$ and $\mathbf{F}_{scale}(\mathbf{u}_c, s_c)$ refers to channel-wise multiplication between the feature map $\mathbf{u}_c \in \mathbb{R}^{W \times H}$ and the scalar $s_c$.
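Putting the pieces together, the sketch below (ours, in PyTorch; the module structure, layer names and the use of nn.Linear are implementation choices consistent with, but not prescribed by, the text) implements Eqs. (2)–(4) with reduction ratio $r = 16$:

```python
# Sketch of an SE block following Eqs. (2)-(4): squeeze by global average
# pooling, excitation through an FC bottleneck (W_1, ReLU, W_2, sigmoid),
# then channel-wise rescaling of U. Biases are kept for simplicity even
# though the paper's notation omits them.
import torch
import torch.nn as nn

class SEBlock(nn.Module):
    def __init__(self, channels: int, r: int = 16):
        super().__init__()
        self.fc1 = nn.Linear(channels, channels // r)        # W_1: C -> C/r
        self.fc2 = nn.Linear(channels // r, channels)        # W_2: C/r -> C
        self.relu = nn.ReLU(inplace=True)
        self.sigmoid = nn.Sigmoid()

    def forward(self, u: torch.Tensor) -> torch.Tensor:
        n, c, _, _ = u.shape
        z = u.mean(dim=(2, 3))                               # squeeze, Eq. (2)
        s = self.sigmoid(self.fc2(self.relu(self.fc1(z))))   # excitation, Eq. (3)
        return u * s.view(n, c, 1, 1)                        # scale, Eq. (4)

x_tilde = SEBlock(channels=64)(torch.randn(2, 64, 32, 32))   # example usage
```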