Efficient Retrieval of Deformable Shape Classes
using Local Self-Similarities
Ken Chatfield
Dept. of Engineering Science
University of Oxford, UK
ken.chatfield@oriel.oxon.org
James Philbin
Dept. of Engineering Science
University of Oxford, UK
james@robots.ox.ac.uk
Andrew Zisserman
Dept. of Engineering Science
University of Oxford, UK
az@robots.ox.ac.uk
Abstract
We present an efficient object retrieval system based on
the identification of abstract deformable ‘shape’ classes
using the self-similarity descriptor of Shechtman and
Irani [13]. Given a user-specified query object, we retrieve
other images which share a common ‘shape’ even if their
appearance differs greatly in terms of colour, texture, edges
and other common photometric properties.
In order to use the self-similarity descriptor for efficient retrieval we make three contributions: (i) we sparsify the descriptor points by locating discriminative regions within each image, thus reducing the computational expense of shape matching; (ii) we extend [13] to enable matching despite changes in scale; and (iii) we show that vector quantizing the descriptor does not inhibit performance, thus providing the basis of a large-scale shape-based retrieval system using a bag-of-visual-words approach. Performance is demonstrated on the challenging ETHZ deformable shape dataset and a full episode from the television series Lost, and is shown to be superior to appearance-based approaches for matching non-rigid shape classes.
1. Introduction
We are interested in the rapid and accurate retrieval of objects based on their shape from large unordered collections of images and videos. Our aim is to accurately retrieve these objects despite deformations caused by intra-class variations or non-rigid materials. An example of the kinds of images we would like to handle is shown in figure 1. These images share almost none of the usual photometric properties such as colour, texture or edges, and yet clearly share similarities in shape, as defined by a common configuration of repeating pattern elements. The ability to match these generic shapes can be considered an important sub-task for object class recognition. Due to the lack of any shared appearance, descriptors such as SIFT [8], which use intensity gradients, are not appropriate, and indeed (as will be shown) often perform poorly in such cases.

Figure 1: Challenges of object class identification. Although all four images are of a heart, there is no obvious image property (e.g. texture, edges or colour) shared between them.
The problem of matching such shapes is addressed by the descriptor of Shechtman and Irani [13], which uses local self-similarity patterns extracted from the image as a descriptor (reviewed in section 3). However, their work concentrates on matching templates at similar scales and does not address the problems of false positive matches or retrieval in large datasets. The questions we investigate here are: (i) is the descriptor invariant to changes in scale, and (ii) can it be applied to efficient large-scale image retrieval [6, 9, 11, 15] by vector quantizing into visual words?
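To make the idea of a local self-similarity descriptor concrete, the following is a minimal sketch of the kind of computation involved: a small patch around a point is correlated (via sum-of-squared-differences) with every patch in a surrounding region, and the resulting correlation surface is pooled into a log-polar grid. All parameter values, the SSD normalisation, and the binning scheme here are illustrative simplifications, not the exact formulation of [13].

```python
import numpy as np

def self_similarity_descriptor(img, y, x, patch=5, region=41,
                               n_angles=20, n_radii=4):
    # Correlate the small patch centred at (y, x) with every patch in a
    # surrounding region, then bin the correlation surface log-polarly.
    pr, rr = patch // 2, region // 2
    centre = img[y - pr:y + pr + 1, x - pr:x + pr + 1].astype(float)

    # Sum-of-squared-differences surface over the surrounding region.
    ssd = np.empty((region, region))
    for dy in range(-rr, rr + 1):
        for dx in range(-rr, rr + 1):
            cand = img[y + dy - pr:y + dy + pr + 1,
                       x + dx - pr:x + dx + pr + 1].astype(float)
            ssd[dy + rr, dx + rr] = np.sum((centre - cand) ** 2)

    # Map SSD to a correlation surface in (0, 1]; a full implementation
    # would instead normalise by local contrast and an estimated noise level.
    corr = np.exp(-ssd / max(ssd.max(), 1e-8))

    # Log-polar binning: keep the maximal correlation in each bin, which
    # gives tolerance to small local deformations.
    ys, xs = np.mgrid[-rr:rr + 1, -rr:rr + 1]
    radius = np.hypot(ys, xs)
    angle = np.arctan2(ys, xs)
    r_bin = np.minimum((np.log1p(radius) / np.log1p(rr) * n_radii).astype(int),
                       n_radii - 1)
    a_bin = ((angle + np.pi) / (2 * np.pi) * n_angles).astype(int) % n_angles
    desc = np.zeros(n_angles * n_radii)
    valid = (radius > 0) & (radius <= rr)
    for b, v in zip((a_bin * n_radii + r_bin)[valid], corr[valid]):
        desc[b] = max(desc[b], v)
    return desc / (desc.max() + 1e-8)
```

Because the descriptor records where the image is similar to itself rather than what the image looks like, two objects with very different colour and texture can still produce similar descriptors.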
Here, we will show that both questions can be answered positively: in section 4 it is shown that self-similarity descriptors are largely unaffected by changes in scale and by a certain degree of deformation, and are sufficient to support multi-scale shape matching. Further, we show that intra-image matching can be used to obtain discriminative descriptors and reduce their density, and in section 5 it is shown that matching performance is not adversely affected by quantizing the descriptors into visual words.
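As background, quantizing descriptors into visual words reduces each image to a histogram of word counts, which is what makes large-scale retrieval tractable. The sketch below, with a toy hand-made vocabulary (a real system would learn it by clustering, e.g. with k-means), illustrates the mechanics; the names and data are hypothetical.

```python
import numpy as np

def quantize(descriptors, vocabulary):
    # Assign each descriptor to its nearest visual word (Euclidean).
    d2 = ((descriptors[:, None, :] - vocabulary[None, :, :]) ** 2).sum(-1)
    return d2.argmin(axis=1)

def bow_histogram(descriptors, vocabulary):
    # L1-normalised bag-of-visual-words histogram for one image.
    words = quantize(descriptors, vocabulary)
    hist = np.bincount(words, minlength=len(vocabulary)).astype(float)
    return hist / max(hist.sum(), 1.0)

# Toy example: a 3-word vocabulary and descriptors from two "images".
rng = np.random.default_rng(0)
vocab = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
img_a = rng.normal([0, 0], 0.05, size=(10, 2))  # clusters near word 0
img_b = rng.normal([1, 0], 0.05, size=(10, 2))  # clusters near word 1
h_a = bow_histogram(img_a, vocab)
h_b = bow_histogram(img_b, vocab)
```

Once images are histograms over a shared vocabulary, retrieval reduces to comparing sparse vectors, for which standard inverted-index machinery applies.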
Global shape deformations are modelled using the Implicit Shape Model of Leibe et al. [7]. This model has been shown to be sufficiently invariant to the kinds of deformations that occur within disparate object classes such as cars, cows, horses and pedestrians [10, 14]. The vector quantization that we introduce lays the foundation for efficient retrieval under deformation and rendering changes, such as matching from an image to a line drawing.
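The core mechanism of an Implicit Shape Model is generalised Hough voting: each matched visual word casts votes for the object centre using the offsets it was observed with during training, and agreeing votes accumulate into a peak. The sketch below shows only this voting step on hand-made data; the full model of [7] uses probabilistic vote weights and continuous mode estimation, and all names here are illustrative.

```python
from collections import defaultdict

def ism_vote(detections, vote_table, bin_size=10):
    # Each detection is (word_id, (x, y)); the vote table maps a word to
    # the (offset, weight) pairs it was seen with during training. Votes
    # for the object centre are pooled in a coarse spatial grid and the
    # strongest cell wins.
    acc = defaultdict(float)
    for word, (x, y) in detections:
        for (dx, dy), weight in vote_table.get(word, []):
            cell = ((x + dx) // bin_size, (y + dy) // bin_size)
            acc[cell] += weight
    best = max(acc, key=acc.get)
    centre = (best[0] * bin_size + bin_size / 2,
              best[1] * bin_size + bin_size / 2)
    return centre, acc[best]

# Toy example: three words whose training offsets all point at (100, 100).
table = {0: [((20, 0), 1.0)], 1: [((-20, 0), 1.0)], 2: [((0, 20), 1.0)]}
dets = [(0, (80, 100)), (1, (120, 100)), (2, (100, 80))]
centre, score = ism_vote(dets, table)  # the three votes agree on one cell
```

Because each word votes independently, the scheme degrades gracefully when parts of the object deform or are occluded: missing votes lower the peak but do not move it.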