创新软分配位置-方向池化提升图像分类性能

129 浏览量更新于2024-08-27 收藏 173KB PDF 举报

本文探讨了在图像分类领域中一个创新的池化方法——Soft-Assignment Location-Orientation Pooling（SALOP）。传统的 Bag-of-Visual Words (BoW) 模型是图像分类中的关键组成部分，它通过将图像分割成局部特征（如SIFT、SURF等），然后将其转换为视觉词袋来表征图像内容。BoW模型的核心是池化步骤，即如何有效地整合这些局部特征。 SALOP在此基础上提出了一个新的策略，它受到Bag-of-Statistical Sampling Analysis (Bossa) 的启发。与BoW模型相比，SALOP引入了更多元化的考虑，不仅依赖于字典中的视觉词，还利用了局部特征在空间位置（location）和方向（orientation）上的信息。这种位置-方向结合的池化方式有助于提高特征编码的精确性和鲁棒性，因为它们能够更好地捕捉到图像中的局部结构和纹理细节。 SALOP的主要创新在于采用了软分配（soft-assignment）的池化策略。传统的硬分配可能会在遇到模糊边界或噪声时导致错误的匹配，而软分配则允许局部特征与多个字典原子进行部分匹配，这有助于处理池化过程中的不确定性。通过这种方式，SALOP能够更好地适应图像中的各种复杂情况，减少了信息丢失，并提高了对图像特征的多样性敏感度。实验部分在Scene15和PASCAL VOC 2007两个广泛使用的图像基准数据集上进行了评估。结果显示，相比于现有的池化方法，SALOP展示了显著的性能提升，证明了其在图像分类任务中的有效性。因此，SALOP为改进图像特征的表示学习提供了一种有前景的新方法，尤其是在处理含有大量位置和方向信息的场景中，它的表现尤为突出。总结来说，本文的重要贡献在于提出了一种基于位置和方向信息的软分配池化技术，该技术通过改进BoW模型，提升了图像分类任务的性能，并且在处理模糊边界和不确定性时展现出优势。对于那些关注图像特征提取和分类精度的学者和工程师来说，SALOP提供了有价值的研究方向和技术参考。

展开

ASK THE DICTIONARY: SOFT-ASSIGNMENT LOCATION-ORIENTATION POOLING FOR

IMAGE CLASSIFICATION

Qilong Wang

, Xiaona Deng

, Peihua Li

, Lei Zhang

Dalian University of Technology,

The Hong Kong Polytechnic University

ABSTRACT

The pooling step is one of the key components of the well-

known Bag-of-visual words (BoW) model widely used in

image classiﬁcation. In this paper, we propose a novel

pooling method, which is called Soft-Assignment Location-

Orientation Pooling (SALOP). Inspired by the bag of sta-

tistical sampling analysis (Bossa), SALOP also explores the

effect of dictionary for pooling method, but leverages both lo-

cation and orientation information between the local descrip-

tors and the atoms of dictionary to aggregate feature codes.

Moreover, different from existing pooling methods, SALOP

employs a soft-assignment pooling scheme to handle ambi-

guity and uncertainty existing in the pooling process. The

evaluation is conducted on two image benchmarks: Scene15

and PASCAL VOC 2007. The experimental results show our

SALOP can achieve promising performances.

Index Terms— Image classiﬁcation, Bag-of-visual words,

soft-assignment, location-orientation pooling, dictionary

1. INTRODUCTION

Bag-of-visual words ( BoW) model [20] has been successfully

applied to various image and vision tasks, especially in image

classiﬁcation. As illustrated in Fig. 1, BoW model involves

local feature extraction, dictionary learning (e.g. k-means),

feature encoding, pooling, and classiﬁcation. Among them,

the pooling process aggregates feature codes into a ﬁnal im-

age representation, which has great inﬂuence on classiﬁcation

performance [9], and has attracted a lot of attentions in recent

years. The lines of research on pooling algorithms can be

roughly divided into four groups based on the aforementioned

components of BoW model.

Feature level based methods design pooling operations

from the point of view of the local feature space. Boureau

et al. [4] proposed a pooling method to aggregate feature

codes based on feature space determined by clustering algo-

rithm, which puts codes from similar local features together,

and performs pooling operation on codes in each cluster of

local features. Following [4], Fanello et al. [19] divided fea-

ture space into 𝐾 bins (𝐾 is the number of classes), where

The work was supported by the National Natural Science Foundation of

China (61471082, 61405022).

Localized soft-

assignment

coding

Image

Classification

(b) BOSSANOVA

Feature

encoding

Pooling step

Local feature

extraction

Dictionary learning

e.g. k-means

e.g. SVM

(a) SALOP

9090909090909090

...

Fig. 1. The classiﬁcation paradigm used in this paper. Our

main contribution is to propose a novel pooling method SA-

LOP which is shown in (a). For an input descriptor, we com-

pute its location and orientation to neighboring atoms of dic-

tionary, and perform pooling operations based on joint distri-

bution of location and orientation. (b) shows the methodology

of BossaNova[1] - only location information is employed.

a weighted max pooling was performed. The weight in each

bin is provided by score of SVM classiﬁer learned on training

features.

Code level based methods contain two basic and widely

used pooling operations: average or max pooling, and a lot of

works are proposed to improve them. [3, 7] employ a ℓ

𝑝

norm

to make a trade-off between average and max pooling, and pa-

rameter 𝑝 can be learned from the training data [7]. Alterna-

tive effective improved (@n) pooling [12] can also be seen as

a trade-off between average and max pooling, but more ﬂexi-

ble and robust. Refer to [12] for more details and comparison

of code-level pooling methods.

Image level pooling methods can be traced back to spatial

pyramid matching (SPM) [14], which introduces spatial infor-

mation into BoW model by dividing image into some regular

parts. [7, 23] learned weighted pooling t o exploit geometric

structure of image. Along this line, Cao et al. [5] proposed

下载后可阅读完整内容，剩余4页未读，立即下载

身份认证购VIP最低享 7 折!

30元优惠券

weixin_38685882

粉丝: 6

创新软分配位置-方向池化提升图像分类性能

matlab宋体代码-SB-assignment-3:SB-assignment-3

ghc---external-flow-assignment-EvGamer:ghc --- external-flow-assignment-EvGamer由GitHub Classroom创建

react-assignment-1-elizabethompson：react-assignment-1-elizabethompson由GitHub Classroom创建

api-assignment-4Danish:api-assignment-4Danish由GitHub Classroom创建

HTML编程实践：Class-Assignment-33总结

JavaScript 实践项目：JS-Assignment-1 探究

JavaScript项目实践：MOD-Assignment-2-Group-4分析

深入探索HTML技术：PH-Assignment-2解析

Python编程作业截止提醒：bot-assignment-due

HTML编程实战：144-assignment-3项目解析

最新资源