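Fine-Grained Classification: Applying Hierarchical Bilinear Pooling with an Aggregated Slack Mask

PDF (1.86 MB), updated 2024-08-28

This research paper presents a new approach to fine-grained classification: hierarchical bilinear pooling with an aggregated slack mask. The method targets image recognition with deep learning. It is aimed especially at scenarios that require precise distinctions between objects, such as recognizing fine-grained categories of birds or car models.

Fine-grained classification is an important challenge in computer vision. The model must recognize subtle differences between subcategories, not just separate broad categories. Conventional convolutional neural networks (CNNs) can struggle here because they tend to rely on global features and overlook local details.

The hierarchical bilinear pooling proposed in the paper is an improved feature-extraction strategy that combines bilinear pooling with a hierarchical structure. Bilinear pooling captures second-order (pairwise) interactions between feature channels, which yields a richer representation. The hierarchical structure lets the model combine information from different levels of abstraction, so it can better capture complex patterns.

The aggregated slack mask refines bilinear pooling. According to the paper's abstract, most bilinear pooling models pool over all convolutional region features, including noisy or even harmful ones. HBPASM instead produces a representation focused on regions of interest (RoI-aware).

The paper likely covers the following aspects:

1. **Bilinear pooling theory**: the mathematics of bilinear pooling, and how it enriches the feature representation and makes the model more sensitive to subtle differences.
2. **Hierarchical design**: how the hierarchical pooling layers are constructed, and how this structure helps the model interpret image information at multiple levels.
3. **Aggregated slack mask**: how the mask is generated and applied, and how filtering out noisy feature elements improves performance.
4. **Experiments and results**: experiments on three fine-grained classification datasets (CUB-200-2011, Stanford Cars, and FGVC-Aircraft), comparing the new method with conventional CNNs and other deep learning methods. The abstract reports performance that is competitive with, or matches, the state of the art.
5. **Application prospects**: the technique's potential in practical applications such as medical image analysis and autonomous driving, and possible directions for future research.
6. **Funding**: the work was supported in part by the Zhejiang Provincial Natural Science Foundation of China under Grant LY19F020038. It was also supported by the National Natural Science Foundation of China and the Zhejiang Provincial Key Science and Technology Project Foundation.

With this approach, the researchers aim to improve the performance of deep learning models on fine-grained classification tasks and to advance computer vision. Beyond improving accuracy and practicality, the work suggests new ways to tackle other problems that depend on recognizing fine-grained features.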

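Bilinear pooling, as summarized above, forms the outer product of the feature vectors at each spatial location and pools it over the image. Every pair of channels therefore contributes one second-order feature. Below is a minimal NumPy sketch of standard bilinear-CNN (BCNN) pooling. It assumes two feature maps of shape (H, W, C) on the same spatial grid. The function name `bilinear_pool` is hypothetical, and the block illustrates the general technique, not the HBPASM implementation.

```python
import numpy as np

def bilinear_pool(fa: np.ndarray, fb: np.ndarray) -> np.ndarray:
    """BCNN-style bilinear pooling of two feature maps on the same grid.

    fa: (H, W, Ca) and fb: (H, W, Cb) convolutional feature maps.
    Returns a (Ca * Cb,) vector of pooled second-order features.
    """
    h, w, ca = fa.shape
    cb = fb.shape[2]
    a = fa.reshape(h * w, ca)  # one row per spatial location
    b = fb.reshape(h * w, cb)
    # a.T @ b sums the outer products a_i b_i^T over all locations i;
    # dividing by h * w turns the sum into an average.
    z = (a.T @ b / (h * w)).reshape(-1)
    # Signed square root, then L2 normalization.
    z = np.sign(z) * np.sqrt(np.abs(z))
    norm = np.linalg.norm(z)
    return z / norm if norm > 0 else z

# Usage: pool a random 7x7 map with 8 channels against itself.
rng = np.random.default_rng(0)
f = rng.random((7, 7, 8))
v = bilinear_pool(f, f)
print(v.shape)  # (64,)
```

The output has Ca × Cb entries. With two 512-channel maps (the channel count of VGG-16's last convolutional layer), the vector already has 262,144 entries, which is one reason low-rank variants of bilinear pooling are popular.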
SPECIAL SECTION ON INNOVATION AND APPLICATION OF INTELLIGENT PROCESSING OF DATA, INFORMATION AND KNOWLEDGE AS RESOURCES IN EDGE COMPUTING

Received August 5, 2019, accepted August 13, 2019, date of publication August 19, 2019, date of current version September 5, 2019.

Digital Object Identifier 10.1109/ACCESS.2019.2936118

Fine-Grained Classification via Hierarchical Bilinear Pooling With Aggregated Slack Mask

MIN TAN (Member, IEEE), GUIJUN WANG, JIAN ZHOU, ZHIYOU PENG, AND MEILIAN ZHENG³,⁴

¹Key Laboratory of Complex Systems Modeling and Simulation, School of Computer Science and Technology, Hangzhou Dianzi University, Hangzhou 310018, China

²Department of Pain Medicine, The First Affiliated Hospital, Zhejiang University School of Medicine, Hangzhou 310003, China

³School of Management, Zhejiang University of Technology, Hangzhou 310023, China

⁴Zhejiang Hithink RoyalFlush Artificial Intelligence Research Institute, Hangzhou 310023, China

Corresponding author: Meilian Zheng (zmldlk@zjut.edu.cn)

This work was supported in part by the Zhejiang Provincial Natural Science Foundation of China under Grant LY19F020038, in part by the National Natural Science Foundation of China under Grant 61602136, Grant 81603198, and Grant 61622205, and in part by the Zhejiang Provincial Key Science and Technology Project Foundation under Grant 2018C01012.

ABSTRACT Extracting discriminative fine-grained features is essential for fine-grained image recognition tasks. Many researchers utilize expensive human annotations to learn discriminative part models, which may be infeasible for real-world applications. Recently, bilinear pooling has been frequently adopted and has proven effective owing to its ability to learn discriminative regions automatically. However, most bilinear pooling models still utilize all the convolutional part/region features for recognition, including noisy or even harmful feature elements. In this paper, we devise a novel fine-grained image classification approach, the Hierarchical Bilinear Pooling with Aggregated Slack Mask (HBPASM) model. The proposed model generates a RoI-aware image feature representation for better performance. We conduct experiments on three frequently used fine-grained image classification datasets. The experimental results demonstrate that HBPASM achieves competitive performance, or even matches the state-of-the-art methods, on CUB-200-2011, Stanford Cars, and FGVC-Aircraft.
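The abstract gives no equations for the hierarchical part of HBPASM. One common way to build hierarchical (cross-layer) bilinear pooling is the factorized form used by hierarchical bilinear pooling (HBP, Yu et al., ECCV 2018). Each layer's features are projected into a shared d-dimensional space, and the projections of two layers are multiplied element-wise (a Hadamard product), which serves as a low-rank stand-in for a full outer product. The sketch below assumes that HBP-style formulation. The projection matrices are random here but would be learned in practice. The name `hierarchical_bilinear_pool` is hypothetical, and this is not code from the paper.

```python
import itertools
import numpy as np

def hierarchical_bilinear_pool(feats, projections):
    """Cross-layer bilinear pooling in the factorized HBP style (an assumption).

    feats: list of (H, W, C_k) feature maps from different conv layers,
           all on the same spatial grid.
    projections: list of (C_k, d) matrices mapping each layer into a shared
           d-dimensional space (learned in practice).
    Returns a vector of length d * (number of layer pairs).
    """
    # Project every spatial location of every layer: (H*W, C_k) @ (C_k, d).
    proj = [f.reshape(-1, f.shape[-1]) @ p for f, p in zip(feats, projections)]
    pooled = []
    for i, j in itertools.combinations(range(len(proj)), 2):
        # Output dim k is (u_k . x)(v_k . y): a rank-1 bilinear form of layers i, j.
        inter = proj[i] * proj[j]          # (H*W, d)
        pooled.append(inter.sum(axis=0))   # sum-pool over locations
    z = np.concatenate(pooled)
    z = np.sign(z) * np.sqrt(np.abs(z))    # signed square root
    norm = np.linalg.norm(z)
    return z / norm if norm > 0 else z

# Usage: three 14x14 layers with 16 channels each, projected to d = 32.
rng = np.random.default_rng(0)
layers = [rng.random((14, 14, 16)) for _ in range(3)]
U = [rng.standard_normal((16, 32)) for _ in range(3)]
z = hierarchical_bilinear_pool(layers, U)
print(z.shape)  # (96,): 3 layer pairs x 32 dims
```

Because every pair of layers interacts, features from a shallower layer can modulate those of a deeper one. This is the intuition behind processing information at several levels of abstraction.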

INDEX TERMS Fine-grained classification, image mask, multi-scale, RoI feature, deep learning.

I. INTRODUCTION

Owing to the development of deep learning, much progress has been made in many computer vision tasks. Despite this progress, fine-grained classification still poses many challenges. Unlike traditional image classification tasks, fine-grained classification aims at identifying subcategories with subtle visual differences. These differences can easily be obscured by complex image backgrounds. Therefore, it is necessary to reduce the impact of background information and extract discriminative RoI features for fine-grained classification tasks.
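The paragraph above argues for suppressing the background and keeping RoI features. The abstract does not define the aggregated slack mask, so the following is only one plausible reading, assumed here. Activations are aggregated over channels (and over several layers). Spatial locations are then kept only if their aggregated response exceeds a relaxed ("slack") threshold, `slack` × mean. The names `aggregated_mask` and `apply_mask` and the `slack` parameter are all hypothetical.

```python
import numpy as np

def aggregated_mask(feats, slack=0.8):
    """Hypothetical aggregated slack mask (an assumption, not the paper's definition).

    feats: list of (H, W, C_k) non-negative feature maps on the same grid,
           e.g. ReLU outputs of several conv layers.
    slack: hypothetical relaxation factor; values below 1 keep more
           locations than a plain mean threshold would.
    Returns an (H, W) boolean mask of locations treated as RoI.
    """
    # Aggregate over channels, then over layers, into one (H, W) response map.
    agg = sum(f.sum(axis=-1) for f in feats)
    return agg > slack * agg.mean()

def apply_mask(feat, mask):
    """Zero the feature vectors at locations outside the mask."""
    return feat * mask[..., None]

# Usage: a bright 2x2 "object" on an empty 4x4 background.
f = np.zeros((4, 4, 3))
f[1:3, 1:3, :] = 1.0
m = aggregated_mask([f])
masked = apply_mask(f, m)
print(int(m.sum()))  # 4
```

Zeroed locations contribute nothing to a subsequent sum- or average-based bilinear pooling. This matches the stated goal of a RoI-aware representation that ignores noisy background elements.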

Fine-grained image classification aims at distinguishing different subordinate classes with subtle visual differences [1]–[3]. It is a core problem in many multimedia applications [4], e.g., image understanding and cross-modal retrieval. Though many efforts have been made to improve performance [5], high visual similarity among different categories still makes this task challenging [6], especially when images have cluttered backgrounds. To deal with these subtle visual differences, researchers focus on localizing distinctive regions or extracting discriminative features for improved performance.

The associate editor coordinating the review of this article and approving it for publication was Ying Li.

Many efforts have been made to design part-based models that localize object parts as the distinctive regions [7]–[12]. These models are obtained either by analyzing the convolutional activations of a neural network in an unsupervised manner or by discriminatively training part detectors with supervised bounding-box/part annotations. Among these models, the bilinear convolutional neural network (BCNN) model [13] and its variants [14], [15] have achieved satisfactory results on many fine-grained image datasets. They help learn distinctive regions without expensive part annotations, and the distinctive regions are discovered by …

117944

This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see http://creativecommons.org/licenses/by/4.0/

VOLUME 7, 2019
