Multi-Index Fusion via Similarity Matrix Pooling
for Image Retrieval
Xin Chen*, Jun Wu*, Shaoyan Sun†, Qi Tian‡
*Tongji University, Shanghai, China
†University of Science and Technology of China, Hefei, China
‡University of Texas at San Antonio, San Antonio, TX 78249
{1410452, wujun}@tongji.edu.cn, sunshy@mail.ustc.edu.cn, qi.tian@utsa.edu
Abstract—Different kinds of features hold distinct merits, making them complementary to each other. Inspired by this idea, an index-level multiple-feature fusion scheme via similarity matrix pooling is proposed in this paper. We first compute the similarity matrix of each index, and then a novel pooling scheme is applied to these similarity matrices to update the original indices. Compared with existing fusion schemes, the proposed scheme performs feature fusion at the index level, which saves memory and reduces computational complexity. Moreover, the proposed scheme treats different kinds of features adaptively according to their importance, thus improving retrieval accuracy. The proposed approach is evaluated on two public datasets, where it significantly outperforms the baseline methods in retrieval accuracy with low memory consumption and computational complexity.
I. INTRODUCTION
With the explosive growth of visual data in recent years, finding useful information in massive visual collections has become an urgent need, and content-based image retrieval (CBIR) is an effective way to meet it. Typically, a CBIR system represents an image as a fixed-dimension vector and measures the similarity between two images by computing the Euclidean or cosine distance between their vectors. The vector may be a holistic global feature vector or a sparse histogram vector constructed from local features. Different types of features have different representative power, resulting in different performance of CBIR systems.
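The two similarity measures mentioned above can be sketched as follows; the feature vectors here are toy illustrations, not actual image descriptors:

```python
import math

def euclidean(a, b):
    # Euclidean distance between two feature vectors
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def cosine_similarity(a, b):
    # Cosine similarity: dot product over the product of the norms
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

q = [1.0, 0.0, 2.0]   # query feature vector (toy example)
d = [1.0, 1.0, 2.0]   # database feature vector (toy example)
print(euclidean(q, d))          # 1.0
print(round(cosine_similarity(q, d), 4))
```

A smaller Euclidean distance or a larger cosine similarity indicates a closer match between the two images.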
Early CBIR systems usually used global holistic low-dimensional feature vectors and dealt with small-scale datasets, requiring relatively low memory consumption and computational complexity, so developing efficient index schemes was not necessary. With the invention of the Scale-Invariant Feature Transform (SIFT) [1], image representations became much more complex and the scale of image datasets also grew, so traditional methods for similarity measurement began to show their limitations.
Inspired by successful text retrieval systems, the inverted index structure and the Bag-of-Visual-Words (BoVW) model were introduced into CBIR systems for efficient image retrieval [2]. In this framework, local features extracted from an image are quantized into different visual words of a pre-trained codebook (bag). The quantized features are then weighted with Term Frequency-Inverse Document Frequency (TF-IDF) [3], generating a histogram vector used for image representation. The similarity between a query image and a dataset image is measured by counting the co-occurrences of the same visual words in the pair. The database is organized by visual words: each visual word points to a list of image entries, and each entry records the identity (ID) of an image and its TF-IDF weight. In the online query stage, we only need to traverse the lists of those visual words that appear in the query image. In this way, both the memory consumption for storing the index and the computational complexity of online retrieval are greatly reduced. These advantages made this framework the mainstream of content-based image retrieval for a decade. A considerable number of works focus on further improving the retrieval accuracy and efficiency of this framework [19], [20], [21], [22], [23], [24], [25].
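The inverted-index scoring just described can be sketched as follows; the quantized database, image IDs, and weighting details are toy illustrations, not the implementation used in this paper:

```python
import math
from collections import defaultdict

# Toy database: image ID -> list of visual-word IDs (already quantized)
database = {
    "img_a": [3, 3, 7, 9],
    "img_b": [3, 5],
    "img_c": [7, 9, 9],
}
n_images = len(database)

# Build the inverted index: visual word -> list of (image ID, TF-IDF weight)
index = defaultdict(list)
for img_id, words in database.items():
    for w in set(words):
        tf = words.count(w) / len(words)                    # term frequency
        df = sum(1 for ws in database.values() if w in ws)  # document frequency
        idf = math.log(n_images / df)                       # inverse document frequency
        index[w].append((img_id, tf * idf))

def query(words):
    # Traverse only the posting lists of visual words present in the query
    scores = defaultdict(float)
    for w in set(words):
        for img_id, weight in index.get(w, []):
            scores[img_id] += weight   # accumulate weighted co-occurrence evidence
    return sorted(scores.items(), key=lambda kv: -kv[1])

print(query([3, 9]))   # ranked list; img_a shares both words with the query
```

Because only the posting lists of the query's visual words are visited, query cost scales with the number of matching entries rather than with the full database size.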
The computer vision field has witnessed revolutionary changes brought by deep neural networks. In particular, deep convolutional neural networks (CNNs) have pushed a considerable number of vision tasks to new state-of-the-art performance [11], [26], [27], [28]. The powerful discriminative ability of CNN features has been widely explored in the task of image retrieval. Babenko et al. [4] proposed to use the activations of one layer of a CNN as the image representation for image retrieval. Hariharan et al. [10] exploited spatial information by aggregating the feature maps at the same location of a specific convolutional layer into a hypercolumn vector, which was used in the task of object segmentation. Furthermore, Ng et al. [6] proposed to aggregate these hypercolumn vectors into one vector with the Vector of Locally Aggregated Descriptors (VLAD) [17] as the image representation for image retrieval. A multi-scale orderless pooling scheme [5] was proposed to aggregate CNN features from images at multiple scales with VLAD. These schemes significantly boost image retrieval accuracy.
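A minimal sketch of the VLAD aggregation used in these schemes is shown below; the descriptors and the two-word codebook are random toy data, not the authors' setup:

```python
import math
import random

def vlad(descriptors, centroids):
    # VLAD: for each codebook centroid, sum the residuals (descriptor - centroid)
    # of the descriptors assigned to it, then concatenate and L2-normalize.
    d = len(centroids[0])
    residual_sums = [[0.0] * d for _ in centroids]
    for x in descriptors:
        # Assign the descriptor to its nearest centroid (squared Euclidean distance)
        k = min(range(len(centroids)),
                key=lambda c: sum((x[i] - centroids[c][i]) ** 2 for i in range(d)))
        for i in range(d):
            residual_sums[k][i] += x[i] - centroids[k][i]
    v = [val for row in residual_sums for val in row]   # concatenate residual sums
    norm = math.sqrt(sum(val * val for val in v)) or 1.0
    return [val / norm for val in v]                    # L2 normalization

random.seed(0)
# Toy local descriptors (e.g. hypercolumn vectors) and a 2-word codebook
descs = [[random.random() for _ in range(4)] for _ in range(10)]
codebook = [[0.2] * 4, [0.8] * 4]
rep = vlad(descs, codebook)
print(len(rep))   # 2 centroids x 4 dims = 8-dimensional representation
```

The resulting fixed-length vector can then be compared with Euclidean or cosine distance like any global feature.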
Local features such as SIFT have powerful representative ability for the detailed information of images but lack holistic ability. In the BoVW model, sufficient details are kept in the index, but the global spatial information is not preserved. Many schemes have been proposed to aggregate spatial information into the index [19], [20], [21]. Although these schemes do help to boost the retrieval performance, they only integrate partial low-order spatial information, which is insufficient. In contrast, global features such as fully-connected-layer CNN features are good at collecting global information of an image with
IEEE ICC 2017 SAC Symposium Big Data Networking Track
978-1-4673-8999-0/17/$31.00 ©2017 IEEE