Shape Google: a computer vision approach to isometry invariant shape retrieval
Maks Ovsjanikov
ICME
Stanford University
maks@stanford.edu
Alexander M. Bronstein
Dept. of Computer Science
Technion
bron@cs.technion.ac.il
Michael M. Bronstein
Dept. of Computer Science
Technion
mbron@cs.technion.ac.il
Leonidas J. Guibas
Dept. of Computer Science
Stanford University
guibas@cs.stanford.edu
Abstract
Feature-based methods have recently gained popularity
in computer vision and pattern recognition communities, in
applications such as object recognition and image retrieval.
In this paper, we explore analogous approaches in the 3D
world applied to the problem of non-rigid shape search and
retrieval in large databases.
1. Introduction
Large databases of 3D models available in the public domain have created the demand for shape search and retrieval algorithms capable of finding similar shapes in the same way a search engine responds to text queries. Since many shapes manifest rich variability, shape retrieval is often required to be invariant to different classes of transformations and shape variations. One of the most challenging settings is the case of non-rigid or deformable shapes, in which the class of transformations may be very wide due to the capability of such shapes to bend and assume different forms.
An analogous problem in the image domain is image retrieval, the problem of finding images depicting similar scenes or objects. Images, like three-dimensional shapes, may manifest significant variability, and the main challenge is to create retrieval techniques that are insensitive to such changes while still providing sufficient discrimination power to distinguish between different shapes. In the computer vision and pattern recognition communities, feature-based methods have recently gained popularity with the introduction of the scale invariant feature transform (SIFT) [12] and similar algorithms [14, 1]. The consistently good performance of these methods in problems such as object recognition and image retrieval, together with the public availability of code, has made SIFT-like approaches a commodity and a de facto standard.
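To make this concrete, the following is a minimal sketch (not part of the original paper) of how such a local feature pipeline is typically used in practice. It assumes OpenCV's SIFT implementation; "query.jpg" is a hypothetical example image.

import cv2

# A minimal sketch, assuming OpenCV (cv2) is installed; "query.jpg" is a
# hypothetical example image, not data from the paper.
img = cv2.imread("query.jpg", cv2.IMREAD_GRAYSCALE)
sift = cv2.SIFT_create()
# keypoints: detected interest points; descriptors: one 128-dimensional
# vector per keypoint, designed to be robust to scale and rotation changes.
keypoints, descriptors = sift.detectAndCompute(img, None)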
One of the advantages of feature-based approaches in image retrieval problems is that they allow one to think of images as collections of primitive elements (visual “words”), and hence to use well-developed methods from text search. One of the best-known implementations of these ideas is Video Google,¹ a web application for object search in large collections of images and videos developed at Oxford University by Zisserman and collaborators [28, 6], so named by analogy with the famous text search engine. Video Google makes use of feature detectors and descriptors to represent an image as a collection of visual words indexed in a “visual vocabulary.” Counting the frequency of visual word occurrences in the image yields a representation referred to as a “bag of features.” Images containing similar visual information tend to have similar bags of features, and thus comparing bags of features allows one to retrieve similar images. Such a method is suitable for indexing and searching very large (Internet-scale) databases of images.
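As an illustration, the bag-of-features construction and comparison described above can be sketched as follows. This is a sketch under our own assumptions, not the authors' implementation: the visual vocabulary is assumed to have been built offline (e.g., by k-means clustering of descriptors from a training set), and the function names are hypothetical.

import numpy as np

def bag_of_features(descriptors, vocabulary):
    """Quantize local descriptors (n x d) against a visual vocabulary (k x d)
    and return the normalized k-bin histogram of visual-word occurrences."""
    # Hard vector quantization: assign each descriptor to its nearest word.
    dists = np.linalg.norm(descriptors[:, None, :] - vocabulary[None, :, :], axis=2)
    words = dists.argmin(axis=1)
    hist = np.bincount(words, minlength=len(vocabulary)).astype(float)
    return hist / hist.sum()  # frequency of each visual word in the image

def retrieve(query_bof, database_bofs, top_k=5):
    """Rank database images by L1 distance between their bags of features."""
    distances = np.abs(database_bofs - query_bof).sum(axis=1)
    return np.argsort(distances)[:top_k]

Hard vector quantization and an L1 distance are only one possible choice; soft assignment and other histogram metrics are common variants of the same idea.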
While very popular in computer vision, feature-based approaches are less known and used in the shape analysis community. The first reason is the lack of efficient and robust feature descriptors, similar to SIFT, that could be as ubiquitously adopted. One of the important properties of SIFT is its discrimination power combined with robustness to different image transformations. While several works proposed feature-based approaches for rigid shapes [20, 10, 13, 7, 9], very few are capable of dealing with non-rigid shape deformations [18, 22, 3, 32]. Second, shapes are usually poorer in features compared to images, and thus descriptors are less discriminative.
In this paper, we bring the spirit of feature-based com-
¹ The Oxford Video Google project is not affiliated with the company Google, Inc.