Mixed Attention Mechanism for Small-Sample
Fine-grained Image Classification
Xiaoxu Li∗, Jijie Wu∗, Dongliang Chang†, Weifeng Huang‡, Zhanyu Ma† and Jie Cao∗
∗Lanzhou University of Technology, Lanzhou, China
E-mail: xiaoxulilut@gmail.com
†Beijing University of Posts and Telecommunications, Beijing, China
E-mail: mazhanyu@bupt.edu.cn
‡South-to-North Water Diversion Middle Route Information Technology Co., Ltd., China
E-mail: huangweifeng@nsbd.cn
Abstract—Fine-grained image classification is an important task in computer vision. The main challenges of the task are that inter-class similarity is high and that the training data points in each class are insufficient for training a deep neural network. Intuitively, if we can learn more discriminative and more detailed features from fine-grained images, the classification performance can be improved. Considering that channel attention can learn more discriminative features and that spatial attention can learn more detailed features, this paper proposes a new spatial attention mechanism by modifying the Squeeze-and-Excitation block, and a new mixed attention mechanism by combining the channel attention with the proposed spatial attention. Experimental results on two small-sample fine-grained image classification datasets demonstrate that, on both the VGG16 and ResNet-50 networks, the two proposed attention mechanisms achieve good performance and outperform the other fine-grained image classification methods compared.
I. INTRODUCTION
With the rapid development of deep learning, Convolutional Neural Networks (CNNs) have been widely used for fine-grained image classification, the task of distinguishing one subordinate category from others within the same superordinate category [1]. Fine-grained image classification based on CNNs has obtained impressive performance, either by replacing hand-crafted features with CNN features or by adopting an end-to-end fashion. However, big challenges remain, since inter-class similarity is high and the training data points in each class are insufficient in fine-grained images [2], [3], [4].
Works on fine-grained image classification based on CNNs mainly focus on learning more subtle and more discriminative features. Some works improved the network structure [5], [6], [4], some proposed a new loss [3], and some improved fine-grained classification by introducing an attention mechanism [7], [8], [9]. Attention mechanisms in neural networks are derived from the visual attention mechanism found in humans. Human visual attention focuses on a certain region of an image with “high resolution” while perceiving the surrounding image in “low resolution”, and then adjusts the focal point over time [10]. Fine-grained classification with an attention mechanism can learn more delicate differences than other methods [11].
There are three types of attention mechanisms: channel attention, such as the SE (Squeeze-and-Excitation) block [12]; spatial attention, such as the Spatial Transformer [13]; and mixed attention, such as the two-level attention model [8] and the recurrent attention model [10]. Channel attention aims to learn more discriminative features. The SE block [12] is a classical channel attention method, which focuses on the channel relationship and adaptively recalibrates channel-wise feature responses by explicitly modeling interdependencies between channels. Spatial attention aims to learn more detailed features. The Spatial Transformer [13] is a learnable and differentiable module that explicitly allows spatial manipulation of data and can be inserted into existing convolutional architectures. It is conditioned on the feature map itself and can learn invariance to scale, rotation, and so on. The Residual Attention Network [14] is built by stacking attention modules which generate attention-aware features.
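As a concrete illustration of the channel attention described above, the following is a minimal PyTorch sketch of an SE block. The class name is ours, and the reduction ratio of 16 is a commonly used default; only the squeeze (global average pooling) and excitation (bottleneck gating) structure follows [12].

```python
import torch
import torch.nn as nn

class SEBlock(nn.Module):
    """Squeeze-and-Excitation channel attention: recalibrates channel responses."""
    def __init__(self, channels, reduction=16):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(1)              # squeeze: global spatial average
        self.fc = nn.Sequential(                         # excitation: channel-wise gating
            nn.Linear(channels, channels // reduction, bias=False),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels, bias=False),
            nn.Sigmoid(),
        )

    def forward(self, x):
        b, c, _, _ = x.shape
        w = self.pool(x).view(b, c)                      # (B, C): one descriptor per channel
        w = self.fc(w).view(b, c, 1, 1)                  # (B, C, 1, 1): weights in [0, 1]
        return x * w                                     # reweight the channel responses
```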
Mixed attention aims to learn more discriminative and more detailed features simultaneously. The two-level attention model [8] combines three types of attention: bottom-up attention, object-level top-down attention, and part-level top-down attention, which are responsible for proposing candidate patches, selecting patches relevant to a certain object, and localizing discriminative parts, respectively, in order to find object parts and extract discriminative features. The recurrent attention model [10] is a recurrent neural network that extracts information from an image by adaptively selecting a sequence of regions or locations and only processing the selected regions at high resolution. Compared with convolutional neural networks, it greatly reduces the amount of computation.
Intuitively, if we can learn more discriminative and more detailed features from fine-grained images, the classification performance can be improved. Therefore, building on existing mixed attention works, this paper proposes a new spatial attention mask by modifying the Squeeze-and-Excitation block, and a new mixed attention method by combining the channel attention with the proposed spatial attention.
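The precise definition of the proposed spatial mask and of its combination with channel attention is deferred to the method section. Purely as an illustration of the idea, the sketch below derives a spatial mask in the spirit of the SE block by squeezing the channel axis with 1×1 convolutions and gating every location, and then stacks it after the SEBlock from the previous sketch. The module names, the 1×1-convolution squeeze, and the sequential combination are our assumptions, not the authors' specification.

```python
import torch
import torch.nn as nn

class SpatialAttention(nn.Module):
    """Spatial mask in the spirit of an SE block: squeeze the channel axis with
    1x1 convolutions and gate every spatial location.
    Illustrative only -- not necessarily the construction used in this paper."""
    def __init__(self, channels, reduction=16):
        super().__init__()
        self.gate = nn.Sequential(
            nn.Conv2d(channels, channels // reduction, kernel_size=1, bias=False),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, 1, kernel_size=1, bias=False),
            nn.Sigmoid(),
        )

    def forward(self, x):
        return x * self.gate(x)          # (B, 1, H, W) mask broadcast over channels


class MixedAttention(nn.Module):
    """One possible combination: channel attention followed by the spatial mask."""
    def __init__(self, channels, reduction=16):
        super().__init__()
        self.channel = SEBlock(channels, reduction)      # SEBlock from the earlier sketch
        self.spatial = SpatialAttention(channels, reduction)

    def forward(self, x):
        return self.spatial(self.channel(x))
```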
In order to evaluate the two proposed attention methods,
we use two widely used networks, VGG16 and ResNet-50,
and select two small-sample fine-grained image classification
datasets, the Stanford Cars-196 dataset and the FGVC-Aircraft