RoI-BoW Model: A New Method for Image Content Representation


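"This research paper examines an image content representation method called RoI-BoW (Region of Interest - Bag of Words). It aims to improve on the traditional BoW model by focusing on the key regions of an image, raising the efficiency and accuracy of image retrieval. The authors are from the Department of Computer Science and Engineering at Donghua University, China, and the State Key Laboratory for Novel Software Technology at Nanjing University. The article was submitted in 2014, and was accepted and published online in October of the same year. Keywords: RoI-BoW, image content representation, segmentation at different sizes, image retrieval, bag-of-words model, region of interest, feature extraction, Gabor filtering."

Main text: In computer vision, image content representation is a core part of image annotation and retrieval and has long been an active research topic. The bag-of-words (BoW) model has gradually attracted wide attention because it is efficient and accurate. Its basic idea is to treat an image as an unordered collection of local feature vectors and to build a global representation of the image from that collection.

The traditional BoW model, however, treats all regions of an image as equivalent. It ignores the fact that in image retrieval some regions, such as salient objects or regions of interest, matter more than others, since they often determine the subject and context of the image. The paper therefore proposes the RoI-BoW model, which represents image content in a finer-grained way based on the regions of interest (RoI) of the image.

The RoI-BoW model first uses segmentation to identify and separate the key regions of the image, which may contain the main objects or the parts with the most semantic information. A feature extraction algorithm, such as a Gabor filter, is then applied to these regions. Gabor filters handle texture, edge, and orientation information well and capture the local structure and detail of an image. The extracted features are encoded against a vocabulary to form a "bag of words", in which each "word" stands for one feature pattern.

To further improve representation efficiency and retrieval performance, the paper may also involve a segmentation strategy at different sizes, which adapts to objects and scenes at different scales. This helps capture objects of different sizes and makes the model more robust to scale changes. Finally, the RoI-BoW model builds the image representation by counting the frequency of each "word"; these frequencies reflect the key content and structure of the image. In an image retrieval task, similar images are found by comparing the RoI-BoW representation of the query image with those of the images in the database, which improves retrieval precision.

In summary, the paper proposes RoI-BoW, an improved method for image content representation that emphasizes the regions of interest in an image and combines Gabor filtering with segmentation at different sizes to raise the efficiency and accuracy of image retrieval. The method offers a new approach to image retrieval and a useful reference for related research such as object detection and recognition.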
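The word-frequency step described above can be illustrated with a minimal sketch. It assumes a vocabulary of visual words is already available and assigns each local descriptor to its nearest word by Euclidean distance; the function names, the toy 2-D descriptors, and the normalization by descriptor count are illustrative assumptions, not details taken from the paper.

```python
from collections import Counter
from math import dist


def nearest_word(descriptor, vocabulary):
    """Return the index of the visual word closest to the descriptor."""
    return min(range(len(vocabulary)), key=lambda w: dist(descriptor, vocabulary[w]))


def bow_histogram(descriptors, vocabulary):
    """Normalized visual-word frequencies; the order of descriptors is ignored."""
    counts = Counter(nearest_word(d, vocabulary) for d in descriptors)
    total = len(descriptors)
    return [counts[w] / total if total else 0.0 for w in range(len(vocabulary))]


# Toy 2-D data (hypothetical); real descriptors would be higher-dimensional.
vocabulary = [(0.0, 0.0), (1.0, 1.0), (5.0, 5.0)]
descriptors = [(0.1, 0.0), (0.9, 1.2), (1.1, 0.8), (4.8, 5.1)]
histogram = bow_histogram(descriptors, vocabulary)
```

Two images can then be compared through their histograms, which is what makes the representation usable for retrieval.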

by DoG, and filter them by a key point filtering algorithm. Next, the RoI is generated by these key points and represented by the BoW model. At the same time, the Non-RoI regions are also represented by the BoW model. Finally, the visual words of RoI and Non-RoI are connected to one visual word, which is used to represent the visual features of the whole image.
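One way to read the final "connecting" step is as a concatenation of the two region histograms. The sketch below assumes that both regions share one vocabulary and that the parts are joined without weighting; neither assumption is confirmed by this excerpt, and the function names are hypothetical.

```python
def count_words(word_ids, vocab_size):
    """Count how often each visual word index occurs."""
    hist = [0] * vocab_size
    for w in word_ids:
        hist[w] += 1
    return hist


def roi_bow_vector(roi_word_ids, non_roi_word_ids, vocab_size):
    """Concatenate the RoI histogram and the Non-RoI histogram into one vector."""
    return count_words(roi_word_ids, vocab_size) + count_words(non_roi_word_ids, vocab_size)


# Visual word indices of descriptors inside and outside the RoI (toy values).
vector = roi_bow_vector([0, 2, 2], [1, 1, 3, 0], vocab_size=4)
```

Keeping the two halves separate in the final vector lets a retrieval step distinguish words that occur inside the RoI from the same words occurring in the background.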

Next we will introduce the RoI-BoW model in detail.

Let the i-th image in the dataset be $I_i$ of size $N_1 \times N_2$, $i = 1, 2, \ldots, N$; the original image dataset is denoted as $D$ as follows

$$D = \{I_1, I_2, \ldots, I_N\},$$

where $N$ is the number of images, and $N_1$ and $N_2$ represent the size of the images. An input image can be seen as a function of two variables on the rectangle

$$I_i(x, y), \quad (x, y) \in [1, 2, \ldots, N_1] \times [1, 2, \ldots, N_2], \quad i = 1, 2, \ldots, N.$$

2.1. Key points filtering

For each image, initial key points are first detected by the difference-of-Gaussian (DoG) algorithm [44]. The difference-of-Gaussian [44] with the scale $\sigma$ and constant multiplicative factor $k$ can be computed by

$$D(x, y, \sigma, k) = L(x, y, k\sigma) - L(x, y, \sigma), \tag{1}$$

in which $L(x, y, \sigma)$ is the scale space of an input image $I(x, y)$. It can be obtained by (see page 94 in [44])

$$L(x, y, \sigma) = \frac{1}{2\pi\sigma^2} e^{-(x^2 + y^2)/(2\sigma^2)} * I(x, y), \tag{2}$$

where $*$ denotes convolution. In practice, the scale is usually chosen as $\sigma = 1.6$ and the constant multiplicative factor is chosen as $k = \sqrt{2}$.
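Formulas (1) and (2) can be sketched in plain Python as two Gaussian blurs followed by a subtraction. This is a dependency-free illustration, not the authors' implementation: truncating the kernel at three standard deviations and clamping indices at the border are assumptions of the sketch, and the helper names are hypothetical.

```python
from math import ceil, exp


def gaussian_kernel(sigma):
    """1-D Gaussian kernel, truncated at 3*sigma and normalized to sum to 1."""
    radius = ceil(3 * sigma)
    weights = [exp(-(t * t) / (2 * sigma * sigma)) for t in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def blur_rows(image, kernel):
    """Convolve each row with the symmetric kernel, clamping indices at the border."""
    radius = len(kernel) // 2
    width = len(image[0])
    return [
        [sum(k * row[min(max(x + t - radius, 0), width - 1)] for t, k in enumerate(kernel))
         for x in range(width)]
        for row in image
    ]


def transpose(image):
    return [list(col) for col in zip(*image)]


def gaussian_blur(image, sigma):
    """L(x, y, sigma): separable Gaussian blur (rows first, then columns)."""
    kernel = gaussian_kernel(sigma)
    return transpose(blur_rows(transpose(blur_rows(image, kernel)), kernel))


def difference_of_gaussian(image, sigma=1.6, k=2 ** 0.5):
    """D(x, y, sigma, k) = L(x, y, k*sigma) - L(x, y, sigma)."""
    wide = gaussian_blur(image, k * sigma)
    narrow = gaussian_blur(image, sigma)
    return [[a - b for a, b in zip(rw, rn)] for rw, rn in zip(wide, narrow)]


# A single bright pixel on a dark 15x15 background.
image = [[0.0] * 15 for _ in range(15)]
image[7][7] = 1.0
dog = difference_of_gaussian(image)
```

For this impulse image the response is most negative at the bright pixel and turns positive a few pixels away, which is the kind of local extremum the key point test looks for.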

A sampling pixel (except for border pixels) is selected as a key point only if the value of $D(x, y, \sigma, k)$ is larger than all of its neighbors or smaller than all of them. The neighbors are taken from the $3 \times 3$ region in which the sampling pixel is the central one. For example, if the sampling pixel has the coordinate $(i, j)$, the $3 \times 3$ region includes nine points: $(i-1, j-1)$, $(i-1, j)$, $(i-1, j+1)$, $(i, j-1)$, $(i, j)$, $(i, j+1)$, $(i+1, j-1)$, $(i+1, j)$, $(i+1, j+1)$.
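The 3 × 3 extremum test described above can be sketched as follows. The sketch assumes strict comparisons (a tie with a neighbor disqualifies the pixel), which matches "larger than all" and "smaller than all" as worded; the function names and the toy DoG values are hypothetical.

```python
def is_key_point(dog, i, j):
    """True if dog[i][j] is strictly larger, or strictly smaller, than its eight neighbors."""
    center = dog[i][j]
    neighbors = [
        dog[i + di][j + dj]
        for di in (-1, 0, 1)
        for dj in (-1, 0, 1)
        if (di, dj) != (0, 0)
    ]
    return all(center > n for n in neighbors) or all(center < n for n in neighbors)


def detect_key_points(dog):
    """Scan every non-border pixel and collect the extrema as (i, j) pairs."""
    rows, cols = len(dog), len(dog[0])
    return [
        (i, j)
        for i in range(1, rows - 1)
        for j in range(1, cols - 1)
        if is_key_point(dog, i, j)
    ]


# Toy DoG response with one local maximum and one local minimum.
dog = [
    [0.0, 0.1, 0.0, 0.2, 0.1],
    [0.1, 0.9, 0.1, 0.3, 0.2],
    [0.0, 0.1, 0.2, 0.3, 0.1],
    [0.2, 0.3, 0.3, -0.7, 0.2],
    [0.1, 0.2, 0.1, 0.2, 0.1],
]
key_points = detect_key_points(dog)
```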

After the DoG algorithm, the set of the initial key points of image $I_i$ is obtained and denoted as

$$P_i = \{P_{i1}, P_{i2}, \ldots, P_{iS_i}\},$$

where $S_i$ is the number of key points of image $I_i$.

Since the initial key points are too many, a filtering algorithm is used to remove the sparse points and retain the densely distributed ones. An example is illustrated in Fig. 3 and the filtering algorithm is introduced as follows.

The filtering operator is denoted as $h$,

$$h : P_i = \{P_{i1}, P_{i2}, \ldots, P_{iS_i}\} \to Q_i = \{Q_{i1}, Q_{i2}, \ldots, Q_{iT_i}\}, \tag{3}$$

where $T_i$ is the number of key points of image $I_i$ after filtering.

Each initial key point is judged by a boolean function as formula (4),

$$b = \begin{cases} 1, & l(P_{ij}) \ge L, \\ 0, & l(P_{ij}) < L, \end{cases} \tag{4}$$

where $b = 1$ means retaining the point, and $b = 0$ means removing. $l$ is a statistic function to calculate the number of key points around

Fig. 1. The upper two pictures show the similarity between salient region detection and the region of interest with difference-of-Gaussian. The lower two pictures show that there is a significant difference. The first column is the original images. The middle column is the corresponding salient regions. The last column is the region of interest with difference-of-Gaussian (the key points are labeled with red points). (For interpretation of the references to colour in this figure legend, the reader is referred to the web version of this article.)
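The retain-or-remove rule of formula (4) can be sketched as below. The excerpt breaks off before it defines the neighborhood used by $l$, so this sketch assumes that $l$ counts the other key points within a Euclidean radius of the point; the `radius` parameter, the choice not to count the point itself, and the example values of the threshold are all assumptions made for illustration.

```python
from math import dist


def filter_key_points(points, radius, threshold):
    """Keep the key points that have at least `threshold` other key points within `radius`."""
    kept = []
    for idx, p in enumerate(points):
        # l(P): number of other key points around p (the radius is an assumed definition).
        count = sum(1 for j, q in enumerate(points) if j != idx and dist(p, q) <= radius)
        if count >= threshold:  # b = 1: retain; otherwise b = 0: remove
            kept.append(p)
    return kept


# Four densely packed key points and one isolated point (toy coordinates).
points = [(10, 10), (11, 10), (10, 12), (12, 11), (40, 40)]
kept = filter_key_points(points, radius=3.0, threshold=2)
```

With these toy values the isolated point at (40, 40) has no neighbors and is removed, while the four clustered points are retained.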

J. Zhang et al. / J. Vis. Commun. Image R. 26 (2015) 37–49
