Sketch-Based Image Retrieval via Adaptive Weighting
Shuai Ren, Cheng Jin, Chang Sun, Yuejie Zhang
School of Computer Science
Shanghai Key Laboratory of Intelligent Information Processing
Fudan University, Shanghai, China
{11210240056, jc, 12210240067, yjzhang}@fudan.edu.cn
ABSTRACT
As touch devices become increasingly popular, it would be convenient if the
user could draw a sketch and then use it as the input for an image retrieval
system. Although Sketch-Based Image Retrieval (SBIR) has been studied since
the 1990s, measuring the similarity between a sketch and an image with high
precision remains a challenging problem. In this paper, a novel adaptive
weighting method is proposed for the matching process of SBIR. We integrate a
cost aggregation step into the matching process, taking both neighborhood and
multi-scale information into account. Experiments on a public image dataset
show that our method yields promising results.
Categories and Subject Descriptors
H.3.3 [Information Search and Retrieval]: Query formulation, Search process
General Terms
Algorithms, Experimentation
Keywords
Sketch, adaptive weighting, image retrieval, contour
1. INTRODUCTION
Over the last decade, there has been explosive growth in the amount of image
data available both online and offline. This trend has created an increasing
need for more effective image retrieval. Text-based search technology can be
used to retrieve tagged images, but not all images are tagged, and it is
sometimes hard to describe a visual appearance accurately with a word or two.
Many image search engines therefore provide similar-image search, in which the
user uploads a guide image as the input and retrieves the corresponding
similar images from the image dataset. This is very useful; however, the user
sometimes does not have such a guide image.
Thus it would be very attractive if the user could draw a sketch on a touch
device and then use the sketch as the input. Sketch-Based Image Retrieval
(SBIR) addresses exactly this situation: the user provides an input sketch and
retrieves the corresponding images. In this paper, a sketch is a binary image
containing several black strokes, and those strokes represent the objects in
the guide image.
One of the core problems for an SBIR system is to measure the similarity
between a sketch and an image. As a sketch itself can be treated as a special
kind of image, descriptors already used in Content-Based Image Retrieval
(CBIR), such as the Color Histogram, the Scale-Invariant Feature Transform
(SIFT), and the Edge Histogram Descriptor (EHD), can also be used in SBIR [4].
However, due to the lack of texture and color information in a sketch, some
descriptors cannot yield very good results. Thus many studies have focused on
how to choose a good descriptor [3][5]. On the other hand, to build a
practical search engine that handles a large number of images, efficient
indexing methods are often required. Inspired by text retrieval, inverted
indexing and the Bag of Visual Words (BoVW) model are often used to speed up
the retrieval process [2].
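As a rough illustration of this idea (not the specific scheme of [2]), the
following Python sketch scores database images by counting shared visual
words, touching only the postings lists of the query's words; the quantization
of local descriptors into word ids is assumed to happen beforehand.

```python
from collections import defaultdict

class InvertedIndex:
    """Minimal BoVW-style inverted index: visual-word id -> image ids."""

    def __init__(self):
        self.postings = defaultdict(set)

    def add_image(self, image_id, words):
        # 'words' are quantized local descriptors (visual-word ids).
        for w in set(words):
            self.postings[w].add(image_id)

    def query(self, words, top_k=10):
        # Score each candidate image by the number of shared visual words.
        scores = defaultdict(int)
        for w in set(words):
            for image_id in self.postings.get(w, ()):
                scores[image_id] += 1
        return sorted(scores.items(), key=lambda kv: -kv[1])[:top_k]

index = InvertedIndex()
index.add_image("img_001", [3, 17, 42, 42, 99])
index.add_image("img_002", [5, 17, 23])
print(index.query([17, 42, 7]))  # img_001 shares two words, img_002 shares one
```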
Eitz et al. proposed the Tensor descriptor for SBIR [5]. In their work, an
image was first divided into several subcells, and the Tensor descriptor was
used to represent the main direction of each cell. The distance between two
Tensor descriptors was calculated using the Frobenius norm.
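A minimal sketch of this kind of cell descriptor is shown below, assuming a
grayscale cell; the gradient estimator, normalization, and per-cell distance
are our own generic choices and not necessarily those of [5].

```python
import numpy as np

def cell_tensor(cell):
    """2x2 structure tensor of a grayscale cell, capturing its main direction."""
    gy, gx = np.gradient(cell.astype(float))
    t = np.array([[np.sum(gx * gx), np.sum(gx * gy)],
                  [np.sum(gx * gy), np.sum(gy * gy)]])
    n = np.linalg.norm(t)
    return t / n if n > 0 else t  # normalize so only the orientation matters

def tensor_distance(cell_a, cell_b):
    """Frobenius norm between the normalized tensors of two cells."""
    return np.linalg.norm(cell_tensor(cell_a) - cell_tensor(cell_b))
```

Per-cell distances would then be accumulated over all subcells of the image to
obtain an overall matching cost.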
Chalechale et al. proposed Angular Radial Partitioning (ARP), which also
divided the image into several subcells [3]. Different from the Tensor
descriptor, an orientation histogram was built for each cell, and the Fourier
transform was then applied to the histogram to achieve rotation invariance.
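The rotation-invariance step can be sketched as follows: a rotation of the
image circularly shifts the histogram bins, and the magnitude of the 1-D
discrete Fourier transform is unchanged by circular shifts. The bin count,
gradient-based orientation estimate, and magnitude weighting below are
illustrative assumptions, not the exact construction of [3].

```python
import numpy as np

def orientation_histogram(cell, bins=8):
    """Gradient-orientation histogram of one cell, weighted by edge strength."""
    gy, gx = np.gradient(cell.astype(float))
    angles = np.arctan2(gy, gx) % np.pi              # orientations in [0, pi)
    weights = np.hypot(gx, gy)
    hist, _ = np.histogram(angles, bins=bins, range=(0.0, np.pi),
                           weights=weights)
    return hist

def rotation_invariant_descriptor(cell, bins=8):
    # A rotation circularly shifts the histogram bins; |DFT| is invariant to
    # circular shifts, so the descriptor no longer depends on the rotation.
    return np.abs(np.fft.fft(orientation_histogram(cell, bins)))
```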
A detailed evaluation of common descriptors such as Shape Context, Histograms
of Oriented Gradients, the Edge Histogram Descriptor, Angular Partitioning,
and the Tensor descriptor was discussed in [5].
To achieve fast retrieval, Cao et al. proposed the contour-based EdgelIndex
method [2]. A triple (x, y, θ) was used to describe the position and
orientation of each edge point. An inverted index was then built for each
triple, and the matching cost between a sketch and an image was defined by the
number of edge points falling into each edge point's surrounding region. The
matching process is similar to the BoVW model and is fast due to the
inverted-index structure. However, without a two-way calculation their results
were relatively poor. Thus they instead divided the sketch into several
sub-sketches according to the user's input sequence and then combined the
matching results of the sub-sketches.
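To make the (x, y, θ) indexing concrete, the sketch below builds an inverted
index over quantized edgel triples and scores each database image by how many
query edgels fall into the surrounding region of one of its edge points. The
spatial tolerance, orientation quantization, and one-way scoring are
illustrative assumptions in the spirit of [2], not its exact parameters.

```python
from collections import defaultdict

RADIUS = 2  # spatial tolerance in pixels (illustrative)

def build_index(database):
    """database: {image_id: list of (x, y, theta_bin) edgels}."""
    index = defaultdict(set)  # (x, y, theta_bin) -> image ids
    for image_id, edgels in database.items():
        for x, y, t in edgels:
            # Register the edgel over its whole surrounding region, so a
            # query edgel needs only a single exact lookup at match time.
            for dx in range(-RADIUS, RADIUS + 1):
                for dy in range(-RADIUS, RADIUS + 1):
                    index[(x + dx, y + dy, t)].add(image_id)
    return index

def match(index, query_edgels):
    """One-way matching cost: count query edgels that hit each image."""
    scores = defaultdict(int)
    for x, y, t in query_edgels:
        for image_id in index.get((x, y, t), ()):
            scores[image_id] += 1
    return sorted(scores.items(), key=lambda kv: -kv[1])
```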