Sketch-based Image Retrieval via Shape Words
Changcheng Xiao¹*, Changhu Wang²†, Liqing Zhang¹, Lei Zhang³
¹Shanghai Jiao Tong University, Shanghai, China
²Microsoft Research, Beijing, China
³Microsoft Corporation, Redmond, USA
xchangcheng@gmail.com, chw@microsoft.com, zhang-lq@cs.sjtu.edu.cn, leizhang@microsoft.com
ABSTRACT
The explosive growth of touch screens has provided a good platform for sketch-based image retrieval. However, most previous work has focused on low-level descriptors of shapes and sketches. In this paper, we take a step forward and propose to leverage a shape-word descriptor for sketch-based image retrieval. First, shape words are defined and an efficient algorithm is designed to extract them. Then we generalize the classic Chamfer Matching algorithm to address the shape-word matching problem. Finally, a novel inverted index structure is proposed to make the shape-word representation scalable to large image databases. Experimental results show that our method achieves competitive accuracy while requiring far less memory, e.g., less than 3% of the memory used by MindFinder. Owing to its competitive accuracy and low memory cost, our method can scale up to much larger databases.
Categories and Subject Descriptors
H.3.3 [Information Retrieval]: Search process; Query formulation
Keywords
Sketch-based Image Retrieval; Shape Words
1. INTRODUCTION
Owing to the popularity of digital cameras, millions of new digital images become freely accessible online every day, which brings great opportunities for image retrieval. Users usually search for images with text queries, but the shape and location of an object are hard to formulate with a few keywords. Thus, query-by-example (QBE) was proposed. However, in
*Changcheng Xiao performed this work while an intern at Microsoft Research Asia.
†Corresponding author.
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.
ICMR’15, June 23–26, 2015, Shanghai, China.
Copyright is held by the owner/author(s). Publication rights licensed to ACM.
ACM 978-1-4503-3274-3/15/06 ...$15.00.
http://dx.doi.org/10.1145/2671188.2749360.
Figure 1: Example results of our system. Suppose we want to search for a bike. We may draw two circles as its wheels and some line segments as its frame. As we draw these sketches, the system extracts shape words (line segments and circular arcs) and searches for similar images using these shape words.
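To make the idea of extracting line-segment shape words from a drawn stroke concrete, the following minimal sketch approximates a stroke (a polyline of points) by line segments using Ramer-Douglas-Peucker simplification. This is an illustrative assumption on our part, not the paper's extraction algorithm; the function name and the tolerance `epsilon` are hypothetical.

```python
import math

def rdp(points, epsilon):
    """Ramer-Douglas-Peucker: simplify a stroke (list of (x, y) points)
    into the endpoints of its approximating line segments."""
    if len(points) < 3:
        return list(points)
    (x1, y1), (x2, y2) = points[0], points[-1]
    dx, dy = x2 - x1, y2 - y1
    norm = math.hypot(dx, dy) or 1.0
    # Find the interior point farthest from the chord joining the endpoints.
    dmax, idx = 0.0, 0
    for i in range(1, len(points) - 1):
        px, py = points[i]
        d = abs(dy * (px - x1) - dx * (py - y1)) / norm
        if d > dmax:
            dmax, idx = d, i
    if dmax <= epsilon:  # stroke is close enough to a single segment
        return [points[0], points[-1]]
    # Otherwise split at the farthest point and recurse on both halves.
    left = rdp(points[:idx + 1], epsilon)
    right = rdp(points[idx:], epsilon)
    return left[:-1] + right  # merge, dropping the duplicated split point

# A noisy 'L'-shaped stroke collapses to three corners, i.e. two segments.
stroke = [(0, 0), (1, 0.05), (2, -0.04), (3, 0), (3.05, 1), (2.96, 2), (3, 3)]
print(rdp(stroke, epsilon=0.2))  # → [(0, 0), (3, 0), (3, 3)]
```

Circular-arc words would need an additional fitting step (e.g., testing residuals against a circle fit per segment run), which is beyond this sketch.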
QBE, the query has to be an example image, which is usually the very thing the user is searching for. With the popularity of touch-screen devices, searching for images by drawing has become a highly desired feature; it is complementary to, and can be combined with, the query-by-keyword and query-by-example modalities.
Sketch-based image retrieval (SBIR) has been extensively studied since the 1990s, and has stepped into large-scale scenarios in recent years. In 2010, Eitz et al. [5] built an SBIR system based on Tensor descriptors that linearly scans the whole database for each query, which greatly limits its scalability. In 2011, Cao et al. [3] built the MindFinder system on indexable Oriented Chamfer Matching (OCM) to solve the indexing problem of SBIR. In 2012, Zhou et al. [8] proposed a convolution-based descriptor, and Tseng et al. [7] proposed 'HashBits' to compress the Distance Transform descriptor. In 2013, Sun et al. [6] built a billion-scale SBIR system with vector-like Chamfer feature pairs.
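For context on the matching these systems build upon, the directed Chamfer distance between two edge-point sets can be sketched as follows. This is a minimal textbook formulation via brute-force nearest neighbors (real systems precompute a distance transform over the image grid); it is not the indexable OCM variant of [3].

```python
import numpy as np

def chamfer_distance(query_pts, image_pts):
    """Directed Chamfer distance: for each query edge point, the distance
    to its nearest edge point in the database image, averaged."""
    q = np.asarray(query_pts, dtype=float)  # shape (n, 2)
    t = np.asarray(image_pts, dtype=float)  # shape (m, 2)
    # Pairwise Euclidean distances via broadcasting: shape (n, m).
    d = np.linalg.norm(q[:, None, :] - t[None, :, :], axis=2)
    return d.min(axis=1).mean()

# Identical point sets match perfectly; a shifted copy scores worse.
square = [(0, 0), (0, 1), (1, 0), (1, 1)]
shifted = [(x + 0.5, y) for x, y in square]
print(chamfer_distance(square, square))   # → 0.0
print(chamfer_distance(square, shifted))  # → 0.5
```

OCM additionally compares edge orientations at matched points, and the shape-word matching proposed here generalizes the same idea from individual edge pixels to whole segments and arcs.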
However, most of these methods [3, 5, 6, 7, 8] focus on low-level descriptors of sketches, such as local patches or edge pixels, which require a huge amount of memory. In this work, we try to go one step further and view shapes and sketches at a higher level. Unlike MindFinder [4], which indexes and matches (sampled) edge pixels, we propose to use shape words to represent both the query sketch and the database images. As shown in Fig. 1, when asked to draw a