Permission to make digital or hard copies of part or all of this work for personal or
classroom use is granted without fee provided that copies are not made or distributed
for commercial advantage and that copies bear this notice and the full citation on the
first page. Copyrights for components of this work owned by others than ACM must be
honored. Abstracting with credit is permitted. To copy otherwise, to republish, to post on
servers, or to redistribute to lists, requires prior specific permission and/or a fee.
Request permissions from permissions@acm.org.
VRCAI 2014, November 30 – December 02, 2014, Shenzhen, China.
Copyright © ACM 978-1-4503-3254-5/14/11 $15.00
http://dx.doi.org/10.1145/2670473.2670510
Visual Saliency Based Bag of Phrases for Image Retrieval
Lijuan Duan∗
College of Computer Science, Beijing University of Technology
Wei Ma†
College of Computer Science, Beijing University of Technology
Jun Miao‡
Key Laboratory of Intelligent Information Processing of CAS, Institute of Computing Technology, CAS, Beijing 100190, China
Xuan Zhang§
College of Computer Science, Beijing University of Technology
Abstract
This paper presents a saliency-based bag-of-phrases (Saliency-BoP for short) method for image retrieval. It combines saliency detection with visual phrase construction to extract bag-of-phrase features. To achieve this, the method first detects salient regions in images. Then, it constructs visual phrases from word pairs that occur within the same salient region. Finally, it extracts the bag of visual phrases from the top K salient regions to describe images. Experimental results on the Corel 1K and Microsoft Research Cambridge image databases demonstrate that the Saliency-BoP method outperforms related methods such as Bag-of-Words (BoW) and Saliency-BoW.
CR Categories: I.4.7 [Image Processing and Computer Vision]:
Feature Measurement—Feature representation;
Keywords: Image retrieval, saliency, bag-of-phrases
1 Introduction
Image retrieval has improved greatly in recent years. Many features, from low-level to high semantic-level ones, have been applied to image retrieval. Precision and speed have advanced considerably, yet no critical breakthrough has been achieved. The key obstacle is that there is still no way to recognize the real meaning of an image. In other words, image retrieval that works the way a human searches for objects in a picture will only become possible once the true meaning of the image can be recognized. This paper proposes a new image descriptor based on human visual saliency to represent images effectively.
The bag-of-words (BoW) model [Csurka et al. 2004] is currently very popular in image retrieval and image classification. It represents images by sets of features. The main idea of BoW comprises three major steps: 1) extract image features; 2) construct the codebook; 3) obtain an image descriptor by mapping features onto the codebook. Images are then ranked by calculating similarities between the query and the images in the database. Philbin
∗e-mail: ljduan@bjut.edu.cn
†e-mail: mawei@bjut.edu.cn
‡e-mail: jmiao@ict.ac.cn
§e-mail: zhangxuan2011@emails.bjut.edu.cn
[Philbin et al. 2007] was the first to apply the BoW model to large-scale image retrieval, and the method performed well. The BoW model carries no context information; it merely counts word frequencies. Zhang [Zhang et al. 2011b] proposed a new method of constructing visual phrases [Jiang et al. 2012; Zhang et al. 2011a] that involve the relative positions of visual words for image retrieval. The method outperforms BoW because context is incorporated. Shabany [Shabany et al. 2013] proposed a global similarity method that combined a manifold-based approach with the BoW model. These methods perform well in image retrieval and image classification.
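The three BoW steps and the subsequent similarity ranking can be sketched as follows. This is a minimal illustration, not the paper's implementation: random vectors stand in for real local descriptors (e.g. SIFT), a plain k-means builds the codebook, and the function names (build_codebook, bow_histogram) are made up for this sketch.

```python
import numpy as np

def build_codebook(descriptors, k, iters=20, seed=0):
    """Step 2: cluster descriptors into k visual words (plain k-means)."""
    rng = np.random.default_rng(seed)
    centers = descriptors[rng.choice(len(descriptors), k, replace=False)]
    for _ in range(iters):
        # assign each descriptor to its nearest center
        d = np.linalg.norm(descriptors[:, None] - centers[None], axis=2)
        labels = d.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = descriptors[labels == j].mean(axis=0)
    return centers

def bow_histogram(descriptors, centers):
    """Step 3: map descriptors to their nearest words, count frequencies."""
    d = np.linalg.norm(descriptors[:, None] - centers[None], axis=2)
    words = d.argmin(axis=1)
    hist = np.bincount(words, minlength=len(centers)).astype(float)
    return hist / hist.sum()  # L1-normalised word frequencies

def rank(query_hist, db_hists):
    """Rank database images by cosine similarity to the query."""
    sims = [float(np.dot(query_hist, h) /
                  (np.linalg.norm(query_hist) * np.linalg.norm(h)))
            for h in db_hists]
    return sorted(range(len(sims)), key=lambda i: -sims[i])

rng = np.random.default_rng(1)
feats = [rng.normal(size=(200, 8)) for _ in range(4)]  # step 1 stand-in
codebook = build_codebook(np.vstack(feats), k=16)
hists = [bow_histogram(f, codebook) for f in feats]
order = rank(hists[0], hists)  # the query image ranks itself first
```

Saliency-BoP keeps this pipeline but replaces the word histogram with a phrase histogram computed over salient regions.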
We consider the problem from another perspective: we construct visual phrases that capture contextual information within salient regions. The effectiveness of this idea is validated by the experiments presented later.
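A simple way to read "word pairs from the same salient region" is sketched below. This is a hypothetical illustration under the assumption that a phrase is an unordered pair of distinct visual words co-occurring in one region; the paper's exact pairing rule is given in Section 2.

```python
from itertools import combinations
from collections import Counter

def phrases_in_region(word_ids):
    """All unordered pairs of distinct visual words in one salient region."""
    return list(combinations(sorted(set(word_ids)), 2))

def bag_of_phrases(regions):
    """Phrase histogram over the top-K salient regions of an image."""
    bag = Counter()
    for words in regions:
        bag.update(phrases_in_region(words))
    return bag

# two salient regions, each holding quantised visual-word ids
regions = [[3, 7, 7, 12], [3, 12]]
bag = bag_of_phrases(regions)  # the pair (3, 12) occurs in both regions
```

The resulting Counter plays the role of the BoW histogram, but each bin now encodes a co-occurrence rather than a single word.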
In general, when features are extracted over an entire image, noise is usually introduced. Background information noticeably affects the representation of images and can therefore harm retrieval. Researchers introduced the visual attention model to alleviate this problem. It locates salient regions of scenes by simulating the automatic selective attention mechanism of humans. This paper demonstrates that constructing visual phrases with contextual information in salient regions can reduce such noise. Itti [Itti et al. 1998] first proposed the visual attention model. According to this theory, when humans observe a picture, their fixations dwell on different regions for different time intervals and in no fixed order. These phenomena are considered to reflect differences in visual attention across the image. A saliency map, which simulates true human fixations, can be obtained from a visual attention model. Many researchers have proposed methods to calculate saliency maps from different aspects, with good performance. For example, Duan et al. [Duan et al. 2011] proposed visual saliency detection by spatially weighted dissimilarity, in which the saliency map is calculated using dissimilarities between blocks weighted by their positions. So far, saliency models have been applied in many fields, such as sparse coding [Kanan and Cottrell 2010] and image segmentation [Achanta et al. 2008].
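The block-dissimilarity idea can be illustrated with a toy computation. This is only loosely inspired by the spatially weighted dissimilarity principle and is not the algorithm of Duan et al.: a block is scored as salient when its appearance differs strongly from other blocks, with spatially nearer blocks weighted more heavily.

```python
import numpy as np

def block_saliency(img, bs=4, sigma=8.0):
    """Toy saliency per block: mean feature dissimilarity to all other
    blocks, weighted to emphasise spatially close competitors."""
    h, w = img.shape
    feats, pos = [], []
    for i in range(0, h - bs + 1, bs):
        for j in range(0, w - bs + 1, bs):
            feats.append(img[i:i + bs, j:j + bs].ravel())
            pos.append((i, j))
    feats = np.array(feats, dtype=float)
    pos = np.array(pos, dtype=float)
    fd = np.linalg.norm(feats[:, None] - feats[None], axis=2)  # feature dist
    sd = np.linalg.norm(pos[:, None] - pos[None], axis=2)      # spatial dist
    wgt = np.exp(-sd / sigma)              # nearby blocks weigh more
    sal = (fd * wgt).sum(axis=1) / wgt.sum(axis=1)
    return sal.reshape((h // bs, w // bs))

img = np.zeros((16, 16))
img[4:8, 4:8] = 1.0          # a bright patch on a dark background
sal = block_saliency(img)    # the bright block scores highest
```

Thresholding such a map, or taking its strongest peaks, yields the top-K salient regions inside which Saliency-BoP forms its phrases.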
The main innovation of this paper is to introduce visual attention to image retrieval and, on this basis, to construct visual phrases in salient regions according to certain rules. We call this method bag of phrases based on visual saliency (Saliency-BoP). Experiments on the Corel 1K and Microsoft Research Cambridge image databases [Ulusoy and Bishop 2005] show that our method outperforms the BoW model.
The rest of the paper is organized as follows. Section 2 introduces our method in detail. Experimental results and discussion are presented in Section 3. Finally, conclusions are drawn in Section 4.