Vision and Semantics Together: Two-Step Similarity Ranking for Image Retrieval

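This research paper proposes a two-step similarity ranking scheme to improve the accuracy and effectiveness of image retrieval. The method combines visual features with semantic structure so that the ranking better reflects the intrinsic similarity within an image database. In the first step, a self-tuning manifold ranking (MR) method produces an initial, visual-based similarity ranking; the Gaussian kernel used at this stage is refined with a point-wise bandwidth. In the second step, semantic information mined from the query log is used to adjust and refine the ranking further. The novelty of the scheme lies in preserving both visual and semantic similarity, which improves the performance of content-based image retrieval (CBIR) systems.

Key points covered:

1. **Content-Based Image Retrieval (CBIR)**: a technique in which the user searches for similar images by supplying a query image or specific image features. The key to a CBIR system is how accurately it measures and ranks the similarity between images.
2. **Manifold Ranking (MR)**: a widely used method, especially for relevance feedback in CBIR, that improves retrieval by learning the nonlinear relationships among images. Traditional MR methods rely mainly on visual features and may not reflect the semantic structure of the images accurately.
3. **Two-step similarity ranking scheme**: the solution proposed in the paper has two stages. The first uses a self-tuning MR method with a Gaussian kernel function to generate an initial visual similarity ranking. The second adjusts the rank of each database image for semantic consistency by mining the query log.
4. **Gaussian kernel**: a kernel function commonly used in machine learning. It implicitly maps data from a low-dimensional space into a high-dimensional one in which the data are easier to separate. In this paper it is used in computing the similarity of visual features.
5. **Semantic similarity**: besides visual features, the scheme also considers the semantic information of images, which most MR methods ignore. Semantic similarity refers to how close images are in meaning and context, and it can raise retrieval precision.
6. **Self-tuning**: the algorithm adjusts its parameters automatically according to the characteristics of the data. In this paper, this corresponds to giving the Gaussian kernel a point-wise bandwidth instead of a single global one.
7. **Improved retrieval performance**: by combining visual and semantic information, the two-step scheme is expected to improve the retrieval precision of CBIR systems and user satisfaction, particularly on large and complex image databases.

In short, the paper offers a novel and promising approach to image retrieval: a two-step ranking that combines visual and semantic information.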

A Two-Step Similarity Ranking Scheme for Image Retrieval

Di Wu (1, 2), Jun Wu (3, *), Ming-Yu Lu, Chun-Li Wang

School of Information Science and Technology, Dalian Maritime University, China

Software Technology Institute, Dalian Jiaotong University, China

School of Computer and Information Technology, Beijing Jiaotong University, China

wudi@qq.com, wuj@bjtu.edu.cn, lumingyu@dlmu.edu.cn, wangcl@dlmu.edu.cn

Abstract - Similarity ranking is one of the keys to a content-based image retrieval (CBIR) system. Among various methods, manifold ranking (MR) is popular for its application to relevance feedback in CBIR. However, most existing MR methods take only visual features into account in the similarity ranking, which is not accurate enough to reflect the intrinsic semantic structure of a given image database. In this paper, we propose a two-step similarity ranking scheme that aims to preserve both visual and semantic resemblance in the similarity ranking. Concretely, the first step derives an initial visual-based similarity rank through a self-tuning MR solution; in particular, the Gaussian kernel used in our scheme is refined by using a point-wise bandwidth. In the second step, the rank of each database image is further adjusted to achieve semantic consistency by mining the query log. An empirical study shows that using two-step similarity ranking in CBIR is beneficial, and that the proposed scheme is more effective than some existing MR approaches.

Keywords - image retrieval; relevance feedback; similarity ranking; manifold ranking
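The abstract mentions a Gaussian kernel refined with a point-wise bandwidth but, in this excerpt, does not give the formula. The following is a minimal sketch of one common way to build such a kernel, assuming a local-scaling form in which each point's bandwidth is the distance to its k-th nearest neighbour. The function name `self_tuning_affinity`, the parameter `k`, and the choice of bandwidth are illustrative assumptions, not the authors' definition.

```python
import numpy as np

def self_tuning_affinity(X, k=7):
    """Gaussian affinity with one bandwidth per point (assumed local-scaling form).

    Assumption: sigma_i is the distance from x_i to its k-th nearest
    neighbour, and W_ij = exp(-d_ij^2 / (sigma_i * sigma_j)).
    """
    X = np.asarray(X, dtype=float)
    n = X.shape[0]
    # Pairwise squared Euclidean distances between feature vectors.
    diff = X[:, None, :] - X[None, :, :]
    d2 = np.sum(diff ** 2, axis=-1)
    d = np.sqrt(d2)
    # In each sorted row, column 0 is the point itself (distance 0),
    # so column k holds the distance to the k-th nearest neighbour.
    k = min(k, n - 1)
    sigma = np.sort(d, axis=1)[:, k]
    sigma = np.maximum(sigma, 1e-12)  # guard against duplicate points
    W = np.exp(-d2 / np.outer(sigma, sigma))
    np.fill_diagonal(W, 0.0)  # no self-loops in the graph
    return W

# Toy data: one dense cluster and one sparse cluster of 2-D "features".
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 0.1, (20, 2)), rng.normal(5.0, 2.0, (20, 2))])
W = self_tuning_affinity(X, k=5)
```

With a single global bandwidth, points in the sparse cluster would be nearly disconnected or points in the dense cluster would be over-connected. A per-point bandwidth lets each neighbourhood set its own scale, which is the usual motivation for this kind of self-tuning.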

I. INTRODUCTION

Content-Based Image Retrieval (CBIR), as an effective supplement to traditional Web search, has drawn substantial research attention in recent years [1]. Rather than using text, a search task in a CBIR system may be initiated with a query image that the user poses to convey a certain query concept. The system then ranks the database images according to their similarity to the query concept, based on visual features automatically extracted from the images. However, low-level visual features are insufficient to characterize high-level semantics, i.e. the so-called semantic gap. Relevance feedback has been shown to be a powerful tool for bridging this gap by exploiting the user's interaction with the CBIR system [19]. During the interaction, the user is encouraged to label a few of the returned images as either positive or negative, according to whether they are relevant to the query concept, and the labeled images are then given to the system as additional query examples so that the ranking results can be refined. In essence, searching for the query concept through relevance feedback can be regarded as a similarity learning problem, i.e. the system tries to learn an appropriate similarity measure through relevance feedback in order to understand the user's information needs. Some recent studies along this direction [3, 18, 22, 2, 5, 10 and 11] focus on learning similarity measures based on the online feedback information, which is called short-term learning, while others [15, 16, 4, 8, 12 and 13] aim to achieve it by using both online and historical feedback sessions, which is called long-term learning.

* Corresponding author

In the meantime, a surge of effort has gone into the theory of graph-based learning, especially manifold ranking (MR) [20, 21]. By taking the intrinsic geometrical structure into account, MR assigns each data point a relative ranking score instead of an absolute pairwise similarity, as traditional approaches do. The score is treated as a distance metric defined on the data manifold, which captures the similarities among data points more meaningfully. Previous studies have shown that MR is one of the most promising and successful approaches for image retrieval with relevance feedback [3, 9, 11, 13 and 14]. Specifically, in [13], both short-term and long-term feedback experiences are integrated into a unified MR framework.
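For reference, the ranking score described above can be written out using the formulation commonly given for MR. The symbols below (W, D, S, α, σ, y) are assumed here for illustration; the paper's own notation is not visible in this excerpt, and the fixed global bandwidth σ is the plain baseline rather than the self-tuning variant.

```latex
\begin{align}
W_{ij} &= \exp\!\left(-\frac{d(x_i, x_j)^2}{2\sigma^2}\right), \quad W_{ii} = 0, \\
S &= D^{-1/2} W D^{-1/2}, \quad D_{ii} = \sum_j W_{ij}, \\
f^{(t+1)} &= \alpha S f^{(t)} + y, \quad 0 < \alpha < 1, \\
f^{*} &= (I - \alpha S)^{-1} y .
\end{align}
```

Here y holds the initial labels (nonzero for the query and the labeled feedback images, zero elsewhere), and f* holds the final ranking scores. The iteration converges to the closed form because the eigenvalues of S lie in [-1, 1], so the spectral radius of αS is below 1.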

Despite this success, the performance of existing MR methods is sensitive to “strange points”. MR is based on the assumption that data points lying in the same local neighborhood share some common property and thus may have the same class label. As a result, an MR-like algorithm tries to propagate the ranking score from each labeled data point to its (mostly unlabeled) neighbor points. However, due to the semantic gap, this assumption cannot always hold in the CBIR context. As illustrated in Figure 1, given the query shown in 1a and the two groups of images shown in 1b and 1c, we assume that the central points in group-1 and group-2 are labeled as positive and negative respectively. Most neighbor points in group-1 are relevant to the query, except for the strange point, which is irrelevant; most neighbor points in group-2 are irrelevant to the query, except for the strange point, which is relevant. Under the operating principle of MR, the strange point in group-1 (irrelevant to the query) is wrongly ranked ahead of the strange point in group-2 (relevant to the query). We refer to this scenario as the “dislocation” problem.
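The dislocation scenario can be reproduced on toy data. The sketch below is illustrative only: it uses the closed-form MR score f = (I - αS)^{-1} y with a fixed-bandwidth Gaussian kernel (not the paper's self-tuning variant), and the coordinates, the value of `alpha`, and the choice of which points are "strange" are all assumptions made for the demonstration.

```python
import numpy as np

def manifold_ranking(W, y, alpha=0.99):
    """Closed-form MR scores f = (I - alpha * S)^{-1} y."""
    deg = W.sum(axis=1)
    inv_sqrt = 1.0 / np.sqrt(np.maximum(deg, 1e-12))
    # Symmetric normalisation S = D^{-1/2} W D^{-1/2}.
    S = W * inv_sqrt[:, None] * inv_sqrt[None, :]
    n = W.shape[0]
    return np.linalg.solve(np.eye(n) - alpha * S, y)

# Toy 2-D "visual features": two tight, well-separated groups.
group1 = np.array([[0.0, 0.0], [0.3, 0.0], [0.0, 0.3], [-0.3, 0.0], [0.0, -0.3]])
group2 = group1 + np.array([10.0, 0.0])
X = np.vstack([group1, group2])

# Fixed-bandwidth Gaussian affinity (sigma = 1), no self-loops.
d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
W = np.exp(-d2 / (2 * 1.0 ** 2))
np.fill_diagonal(W, 0.0)

y = np.zeros(len(X))
y[0] = 1.0   # centre of group-1 labelled positive
y[5] = -1.0  # centre of group-2 labelled negative

scores = manifold_ranking(W, y)

# Hypothetical ground truth, invisible to the visual features:
strange_in_group1 = 4  # looks like group-1 but is irrelevant to the query
strange_in_group2 = 9  # looks like group-2 but is relevant to the query

# Higher score means ranked earlier.
dislocated = scores[strange_in_group1] > scores[strange_in_group2]
```

Because scores propagate only along visual neighbourhoods, every point in group-1 inherits a positive score and every point in group-2 a negative one, so the irrelevant strange point outranks the relevant one. Visual features alone cannot correct this, which is the gap a semantic second step is meant to close.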

In this paper, we propose a Two-Step Similarity Ranking scheme, TS2R for short, to address the challenge mentioned above. Compared with short-term MR methods, such as [3], TS2R exploits both online and historical feedback experiences to improve retrieval effectiveness. In contrast to the long-term MR methods, such

In this paper, a “strange” point refers to a data point whose semantics differ from those of its neighbor points, for example, the points labeled “o” in Figures 1b and 1c.

2014 Sixth International Symposium on Parallel Architectures, Algorithms and Programming

DOI 10.1109/PAAP.2014.26


