Image Retrieval Based on Sparse Representation: A Separable Vocabulary and Feature Fusion Method


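This page summarizes the paper "Separable Vocabulary and Feature Fusion for Image Retrieval based on Sparse Representation", published in the journal Neurocomputing (homepage: www.elsevier.com/locate/neucom). The paper addresses a central problem in image retrieval: how to build and use visual vocabularies and fused features on top of sparse representation.

In traditional retrieval models such as Bag-of-Visual-Words (BOW), the visual vocabulary is the core component. BOW extracts local features from each image, clusters those features into a vocabulary, and then describes each image by its visual words. To keep retrieval accurate, traditional methods tend to use a large vocabulary. The drawback stated in the paper is that a large vocabulary lowers recall, while medium-sized vocabularies improve recall at the cost of accuracy.

The paper proposes combining a separable vocabulary with feature fusion. A large vocabulary is trained first and then separated into several medium-sized vocabularies, and sparse representation selects which of them to use for a given query image. The large vocabulary is meant to preserve accuracy, and the medium-sized vocabularies are meant to provide recall. Sparse representation is also used to quantize visual words, which reduces quantization error, and local and global features are fused to improve recall further.

The authors are affiliated with the Institute of Information Science at Beijing Jiaotong University, the Key Laboratory of Advanced Information Science and Network Technology of Beijing, the School of Information Engineering at Minzu University of China, the Department of Radio-electronic Systems at Don State Technical University (Shakhty, Russia), and the School of Science at Beijing University of Civil Engineering and Architecture. Key questions in the work include how to build and manage the separated vocabularies, how to select and fuse features, and how to balance vocabulary size, computational cost, and retrieval quality in practice.

The method is evaluated on two benchmark datasets, Coil20 and Holidays. The paper is a useful reference for researchers who want to improve image retrieval, particularly those working with sparse representation, because it shows how vocabulary management and feature fusion can be combined to raise recall while keeping accuracy high.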

Contents lists available at ScienceDirect

Neurocomputing

journal homepage: www.elsevier.com/locate/neucom

Separable vocabulary and feature fusion for image retrieval based on sparse representation

Yanhong Wang a,b, Yigang Cen a,b,⁎, Ruizhen Zhao a,b, Yi Cen, Shaohai Hu a,b, Viacheslav Voronin, Hengyou Wang

Institute of Information Science, Beijing Jiaotong University, Beijing 100044, China

Key Laboratory of Advanced Information Science and Network Technology of Beijing, Beijing 100044, China

School of Information Engineering, Minzu University of China, Beijing 100081, China

Department of Radio-electronic Systems, Don State Technical University, Shakhty 346500, Russia

School of Science, Beijing University of Civil Engineering and Architecture, Beijing 100044, China

ARTICLE INFO

Keywords:

Separable vocabulary

Sparse representation

Feature fusion

Image retrieval

ABSTRACT

Visual vocabulary is the core of the Bag-of-visual-words (BOW) model in image retrieval. To ensure retrieval accuracy, traditional methods typically use a large vocabulary. However, a large vocabulary leads to low recall. Medium-sized vocabularies have been proposed to improve recall, but they lead to low accuracy. To address these two problems, we propose a new method for image retrieval based on feature fusion and sparse representation over a separable vocabulary. Firstly, a large vocabulary is generated on the training dataset. Secondly, the vocabulary is separated into a number of medium-sized vocabularies. Thirdly, for a given query image, we adopt sparse representation to select a vocabulary for retrieval. In the proposed method, the large vocabulary guarantees a relatively high accuracy, while the medium-sized vocabularies are responsible for high recall. In addition, to reduce quantization error and improve recall, a sparse representation scheme is used for visual word quantization. Moreover, local and global features are fused to improve recall. The proposed method is evaluated on two benchmark datasets, Coil20 and Holidays. Experiments show that it achieves good performance.
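The third step of the abstract, selecting one medium-sized vocabulary per query with sparse representation, can be pictured with a small sketch. The text above does not state how the large vocabulary is separated or which selection criterion is used, so everything here is an assumption for illustration: the partition array `groups`, the helper names `omp` and `select_vocabulary`, the use of orthogonal matching pursuit as the sparse solver, and the rule of picking the group with the largest total absolute coefficient.

```python
import numpy as np

def omp(x, atoms, n_nonzero):
    """Orthogonal matching pursuit over the unit-norm columns of `atoms`."""
    coef = np.zeros(atoms.shape[1])
    support, residual = [], x.astype(float)
    for _ in range(n_nonzero):
        j = int(np.argmax(np.abs(atoms.T @ residual)))
        if j in support:              # no new atom helps; stop early
            break
        support.append(j)
        # Re-fit all selected atoms jointly, then update the residual.
        sol, *_ = np.linalg.lstsq(atoms[:, support], x, rcond=None)
        residual = x - atoms[:, support] @ sol
        coef[:] = 0.0
        coef[support] = sol
    return coef

def select_vocabulary(query_descriptors, vocabulary, groups, n_nonzero=3):
    """Pick the sub-vocabulary whose words carry the most coefficient mass.

    vocabulary: (K, D) array of visual words from the large vocabulary.
    groups:     (K,) integer array; groups[k] is the medium-sized vocabulary
                that word k belongs to (hypothetical partition).
    """
    # Normalise words and store them as columns so they act as dictionary atoms.
    atoms = (vocabulary / np.linalg.norm(vocabulary, axis=1, keepdims=True)).T
    mass = np.zeros(groups.max() + 1)
    for x in query_descriptors:
        coef = np.abs(omp(x, atoms, n_nonzero))
        mass += np.bincount(groups, weights=coef, minlength=len(mass))
    return int(mass.argmax()), mass

# Toy data: six 6-D words split into two "medium" vocabularies of three words.
vocabulary = np.eye(6)
groups = np.array([0, 0, 0, 1, 1, 1])
query = np.array([[0, 0, 0, 2.0, 1.0, 0], [0, 0, 0, 0, 1.0, 3.0]])
chosen, mass = select_vocabulary(query, vocabulary, groups)
```

In this toy case both query descriptors are built only from words in the second group, so that group collects all of the coefficient mass. A real system would work with SIFT-like descriptors and a trained vocabulary, and the paper's own criterion may differ from the one assumed here.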

1. Introduction

In recent years, content-based image retrieval (CBIR) has been a very active research topic in computer vision and multimedia information. Although the field has developed rapidly, researchers have yet to standardize the various image retrieval systems [1]. Image retrieval remains a challenging problem: retrieval often fails because of occlusion, distortion, corrosion, and differing lighting conditions.

Image retrieval means that, for a given query image, all similar images are retrieved from the database. Similar images are defined as images that contain the same object or scene viewed under different imaging conditions [2]. In past years, the BOW model [3,4] has been very effective in image retrieval. The model is inspired by text retrieval systems [3–5] and consists of four major steps: (1) Local features, such as the SIFT descriptor [6], the rootSIFT descriptor [7], or the SURF descriptor [8], are extracted from each image. (2) Each local descriptor is quantized to a visual word according to a vocabulary pre-trained by an unsupervised clustering approach. (3) Each image is represented by a frequency histogram of visual words. (4) Retrieval results are returned according to the similarities between the query image and the images in the dataset.
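The four steps above can be sketched in a few lines of NumPy. This is a minimal illustration rather than the pipeline used in the paper: step (1) is skipped, with descriptors assumed to be already extracted as rows of an array, plain k-means stands in for the unspecified clustering approach, cosine similarity stands in for the unspecified similarity measure, and the function names `train_vocabulary`, `quantize`, `bow_histogram`, and `rank` are hypothetical.

```python
import numpy as np

def quantize(descriptors, words):
    """Step 2: map each local descriptor to the index of its nearest visual word."""
    d2 = ((descriptors[:, None, :] - words[None, :, :]) ** 2).sum(axis=2)
    return d2.argmin(axis=1)

def train_vocabulary(descriptors, k, iters=20, seed=0):
    """Plain k-means (Lloyd's algorithm); each row of the result is a visual word."""
    rng = np.random.default_rng(seed)
    words = descriptors[rng.choice(len(descriptors), size=k, replace=False)].copy()
    for _ in range(iters):
        labels = quantize(descriptors, words)
        for j in range(k):
            members = descriptors[labels == j]
            if len(members):          # keep the old centre if a cluster empties
                words[j] = members.mean(axis=0)
    return words

def bow_histogram(descriptors, words):
    """Step 3: L1-normalised frequency histogram of visual words."""
    counts = np.bincount(quantize(descriptors, words), minlength=len(words))
    return counts / counts.sum()

def rank(query_hist, db_hists):
    """Step 4: database indices sorted by cosine similarity, best match first."""
    q = query_hist / np.linalg.norm(query_hist)
    db = db_hists / np.linalg.norm(db_hists, axis=1, keepdims=True)
    return np.argsort(-(db @ q), kind="stable")

# Random 2-D points stand in for real local descriptors such as SIFT.
rng = np.random.default_rng(1)
centres = np.array([[0.0, 0.0], [10.0, 10.0], [-10.0, 10.0]])
images = [c + rng.normal(size=(50, 2)) for c in centres]
words = train_vocabulary(np.vstack(images), k=6)
db_hists = np.array([bow_histogram(im, words) for im in images])
order = rank(db_hists[0], db_hists)
```

The vocabulary size `k` is the quantity the following paragraph is concerned with: it fixes the histogram length and how finely descriptors are separated during quantization.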

The vocabulary plays a very important role in the BOW model. With a large number of local features, a large visual vocabulary must be trained to ensure retrieval accuracy. But a large visual vocabulary leads to low recall and other issues [9,10]. Previous work has taken two main approaches to improving recall. Firstly, the size of the vocabulary is changed. For example, in [2], Jegou et al. proposed using a medium-sized vocabulary to improve recall. However, this leads to low accuracy [10]. In [11,12], images are represented with the vector of locally aggregated descriptors (VLAD), which can be viewed as a simplification of the Fisher vector (FV) [13] representation. Moreover, the VLAD method requires only a small vocabulary in the retrieval process. Secondly, strategies based on multiple vocabularies are used. The vocabularies are usually generated from an independent training dataset. In [14], a Bayes merging approach is proposed to down-weight the indexed features in the intersection set. In [15], instead of computing the multiple vocabul-

http://dx.doi.org/10.1016/j.neucom.2016.08.106

Received 27 February 2016; Received in revised form 17 July 2016; Accepted 8 August 2016

⁎ Corresponding author at: Institute of Information Science, Beijing Jiaotong University, Beijing 100044, China.

E-mail address: ygcen@bjtu.edu.cn (Y. Cen).

Neurocomputing 236 (2017) 14–22

Available online 17 November 2016

