Large Scale Image Retrieval Based on Adaptive
Dense-SIFT
Qiaopeng Han, Li Zhuo, Haixia Long
Signal & Information Processing Laboratory
Beijing University of Technology
Beijing, China
s201302102@emails.bjut.edu.cn
Abstract—In this paper, an adaptive Dense-SIFT feature
extraction method is first proposed, which adaptively
adjusts the size of the local window using the edge information
of the image. Next, a large scale image retrieval method is proposed.
The adaptive Dense-SIFT features are extracted from the
database images. Bag of Word (BoW) model is then adopted to
create the corresponding histograms of visual words frequency to
represent the features. To efficiently describe the image content,
the feature vectors are constructed by combining the visual
word histograms of the Dense-SIFT features with the 72-dimensional
HSV (Hue, Saturation, Value) color feature. In the retrieval process,
the top-h most similar images are returned by computing the
similarity between the feature vector of the query image and
those of the database images. Finally, to further improve the
accuracy, the returned images are re-ranked with context
similarity information. Experimental results on the Corel-5K
and Oxford Buildings datasets show that the proposed method
outperforms existing image retrieval methods.
Keywords—image retrieval; adaptive Dense-SIFT; visual words;
re-ranking
I. INTRODUCTION
Large scale image retrieval has become a hot
research topic in the multimedia retrieval community, in which
Content Based Image Retrieval (CBIR) is the most popular
retrieval method. CBIR utilizes the features to represent the
image content, and determines the similarity between images
by comparing the similarity of the features. The key parts of
CBIR include feature extraction, similarity matching, etc.
Moreover, to further improve the retrieval accuracy, re-ranking
techniques have been proposed.
Scale Invariant Feature Transform (SIFT) has
proven to be the most representative local feature extraction
algorithm, and has been widely used in domains
such as image retrieval and image classification due to its
strong robustness. Timothee [1] uses SIFT as a local feature to
construct an index based on a vocabulary tree, and achieves
image retrieval through an index structure improved with
contextual weighting of the local features. However, SIFT can
only represent the details of
images by using gray scale information, and it contains no
other visual characteristics. To address this problem, a
coupled Multi-Index (c-MI) framework [2] has been proposed
to perform feature fusion at the indexing level, which takes each
of the SIFT and Colour Names features as one dimension of the
multi-index to form the feature vectors, which describe the
image well from both local and global perspectives. This
method achieves better performance in image retrieval. To
obtain more accurate results, the re-ranking technique [3] has
been introduced into the field of image retrieval. Yang [4]
presents a new prototype-based re-ranking method based on
SIFT, which utilizes a re-ranking model as prior knowledge.
This model is learned offline from user-labeled training data.
Although the built system enhances retrieval performance to
some extent, the main restriction is that the accuracy of the
classifier cannot be guaranteed, due to the limited number of
user-labeled images. This shortcoming can be overcome by a
context-sensitive similarity re-ranking method [5], which
returns an initial search result based on SIFT and re-ranks
the returned results using a shortest path method [6]. The
advantage of this method is that there is no need to train a
classifier, so that the retrieval accuracy and efficiency can be
improved.
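The shortest-path re-ranking idea can be illustrated with a small sketch. This is our own simplified illustration, not the exact algorithm of [5][6]: direct feature distances give the initial ranking, and images are then re-ranked by their shortest-path distance to the query through a nearest-neighbour graph, so that chains of mutually similar images pull contextually related results forward. The neighbourhood size (k = 2) and the toy distance matrix below are assumptions for demonstration only.

```python
# Simplified sketch of context-sensitive re-ranking via shortest paths:
# build a k-NN graph over pairwise feature distances, then run Dijkstra
# from the query so images reachable through chains of similar images
# move up the ranking.  (Illustrative only; not the authors' algorithm.)
import heapq

def rerank_shortest_path(dist, query=0):
    """dist: symmetric matrix (list of lists) of pairwise feature distances.
    Returns image indices sorted by shortest-path distance from the query."""
    n = len(dist)
    # Keep each node's 2 nearest neighbours (k = 2 is an assumed value).
    nbrs = {i: sorted(range(n), key=lambda j: dist[i][j])[1:3] for i in range(n)}
    best = {query: 0.0}
    heap = [(0.0, query)]
    while heap:  # Dijkstra from the query over the k-NN graph
        d, u = heapq.heappop(heap)
        if d > best.get(u, float("inf")):
            continue  # stale heap entry
        for v in nbrs[u]:
            nd = d + dist[u][v]
            if nd < best.get(v, float("inf")):
                best[v] = nd
                heapq.heappush(heap, (nd, v))
    return sorted(best, key=best.get)

# Toy example: image 2 is far from the query directly (distance 5) but
# close via the chain 0 -> 1 -> 2 (path length 2).
D = [[0, 1, 5, 4],
     [1, 0, 1, 5],
     [5, 1, 0, 1],
     [4, 5, 1, 0]]
print(sorted(range(4), key=lambda j: D[0][j]))  # direct ranking: [0, 1, 3, 2]
print(rerank_shortest_path(D))                  # re-ranked:      [0, 1, 2, 3]
```

Note how image 2, initially ranked last by direct distance, is promoted above image 3 after re-ranking because it is connected to the query through a path of mutually similar images; this is the "context" the method exploits without training any classifier.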
The aforementioned methods adopt SIFT as the local
feature. Although SIFT shows superior performance in image
retrieval, image classification, and other application areas, its
extraction process has high computational complexity. To
overcome this drawback, a variety of fast SIFT algorithms have
been proposed, such as Speeded-Up Robust Features (SURF),
Dense-SIFT, etc. In the extraction process of Dense-SIFT, a
fixed window size is employed to traverse the image. There is
no key-point detection stage; instead, local feature descriptors are
extracted at every patch. Since the texture information in
different image areas is not the same, a fixed window size leads
to either insufficient extraction in texture-complex areas or
redundant extraction in smooth areas.
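The fixed-window sampling described above can be sketched in a few lines. This is a minimal illustration of the sampling stage only (no descriptor computation); the step size, window size, and the edge-density threshold used for the adaptive rule are assumed values, not the paper's parameters.

```python
# Sketch of the Dense-SIFT sampling stage: a window of constant size is
# slid across the image on a regular grid, so smooth and textured
# regions are sampled identically -- the drawback the adaptive variant
# addresses by shrinking the window where edges are dense.

def dense_grid(width, height, step=8, window=16):
    """Return the top-left corners of the fixed-size local windows."""
    return [(x, y)
            for y in range(0, height - window + 1, step)
            for x in range(0, width - window + 1, step)]

def adaptive_window(edge_density, base=16, small=8):
    """Hypothetical adaptive rule: use a smaller window where the local
    edge density (fraction of edge pixels) is high.  The 0.2 threshold
    is an assumption for illustration."""
    return small if edge_density > 0.2 else base

patches = dense_grid(64, 64, step=8, window=16)
print(len(patches))            # 7 x 7 grid -> 49 patches
print(adaptive_window(0.5))    # texture-complex region -> 8
print(adaptive_window(0.0))    # smooth region -> 16
```

In an adaptive scheme of this kind, `adaptive_window` would be evaluated per region from an edge map (e.g. the output of an edge detector), so texture-rich areas receive smaller, denser windows while smooth areas are sampled coarsely.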
To solve this problem in the Dense-SIFT extraction process,
we propose an adaptive Dense-SIFT feature extraction method,
in which the window size is adjusted adaptively based on the
edge information of the image. Then, a large scale image
retrieval method is proposed. The BoW model is exploited to
represent the local features, which are then combined with the
HSV colour feature to construct feature vectors representing
the image content. In the retrieval process, the similarity
between the feature vectors of the query image and those of the
database images is computed, and then the initial search results
978-1-4673-9088-0/15/$31.00 ©2015 IEEE