IMAGE RETRIEVAL WITH HIERARCHICAL MATCHING PURSUIT
Shasha Bu, Yu-Jin Zhang
Department of Electronic Engineering, Tsinghua University
Tsinghua National Laboratory for Information Science and Technology
Beijing 100084, China
Email: boss12@mails.tsinghua.edu.cn, zhang-yj@mail.tsinghua.edu.cn
ABSTRACT
A novel representation of images for image retrieval is in-
troduced in this paper, by using a new type of feature with
remarkable discriminative power. Despite the multi-scale na-
ture of objects, most existing models perform feature extrac-
tion on a fixed scale, which will inevitably degrade the perfor-
mance of the whole system. Motivated by this, we introduce
a hierarchical sparse coding architecture for image retrieval
to explore multi-scale cues. Sparse codes extracted on low-
er layers are transmitted to higher layers recursively. With
this mechanism, cues from different scales are fused. Experi-
ments on the Holidays dataset show that the proposed method
achieves an excellent retrieval performance with a small code
length.
Index Terms— CBIR, sparse coding, hierarchical match-
ing pursuit, bag-of-features
1. INTRODUCTION
Image retrieval has been increasingly popular in recent years.
Searching images such as pictures of a scenic spot or an ani-
mal has become a part of everyday life for many people, either
from the internet or database in hand. However, with image
database growing increasingly larger, how to find the intend-
ed images from so many images is a problem presented in
image retrieval. A lot of works have been done in this field
[1][2][3][4][5].
Recent works on image retrieval mainly concentrate on
content based image retrieval (CBIR). Features from images
are extracted and compared for similarity measurement based
on which the most similar images to the query are returned.
Bag-of-features (BoF) model [6] is extensively used in
CBIR which often obtains good performance. Methods fol-
lowing such a framework often use Scale-invariant feature
transform (SIFT) [7], which is robust against many image
transformations. However, the vector quantization (VQ) [8]
This work was supported by National Nature Science Foundation (NNS-
F: 61171118) and Specialized Research Fund for the Doctoral Program of
Higher Education (SRFDP-20110002110057).
in BoF model only assumes that each feature is related to a
single visual word, and thus ignores the correlation between
the feature and other words. What is more, SIFT is a local
feature which is unable to capture the global cues. And fea-
tures of the same image are irrelevant to each other, limiting
the fusion of cues between them. Sparse coding techniques
and global features have been proposed to fix the problem
[9][10][11][12][13][14][15]. Nevertheless, neither utilizing
one-layer sparse coding nor leveraging global feature on a
fixed scope can cues of different scales be adequately ex-
plored. The success of hierarchical matching pursuit (HM-
P) algorithm in classification [16] motivates us to employ the
hierarchical sparse coding architecture in image retrieval to
explore multi-scale cues.
A global feature using HMP is introduced in this paper for
image retrieval, which has not been considered in this field to
our knowledge. The global cues as well as features on differ-
ent scales are extracted, forming a sparse representation. Im-
ages are first partitioned into patches of different sizes. Then,
sparse codes are extracted from smaller patches and spatially
pooled on larger patches recursively. Finally, a hierarchical
sparse coding architecture is constructed, and sparse repre-
sentations extracted from the hierarchical layers are adopted
for retrieval. Experiments conducted on the Holidays dataset
[17] demonstrate the effectiveness of the proposed approach,
where excellent performance compared with prior methods is
obtained.
2. SPARSE CODING IN CBIR
This section presents the procedure of utilizing sparse cod-
ing for CBIR. A standard sparse coding model can be for-
mulated as follows. Given an over completed codebook C
(C ∈ R
D×K
) and a basic feature y (y ∈ R
D
), a vector x
(x ∈ R
K
) with sparsity L is generated to approximate y [11]
as
min
x
ky − Cxk
2
, s.t.kxk
0
≤ L.
(1)
Orthogonal matching pursuit (OMP) [16] is usually employed
to solve Eq. (1).