Model Globally, Match Locally: Efficient and Robust 3D Object Recognition
Bertram Drost¹, Markus Ulrich¹, Nassir Navab², Slobodan Ilic²
¹ MVTec Software GmbH, Neherstraße 1, 81675 Munich, Germany
² Department of Computer Science, Technical University of Munich (TUM), Boltzmannstraße 3, 85748 Garching, Germany
{drost,ulrich}@mvtec.com, navab@cs.tum.edu, Slobodan.Ilic@in.tum.de
Abstract
This paper addresses the problem of recognizing free-form 3D objects in
point clouds. In contrast to traditional approaches based on point
descriptors, which depend on local information around points, we propose
a novel method that creates a global model description based on oriented
point pair features and matches that model locally using a fast voting
scheme. The global model description consists of all model point pair
features and represents a mapping from the point pair feature space to
the model, where similar features on the model are grouped together. Such
a representation allows the use of much sparser object and scene point
clouds, resulting in very fast performance. Recognition is done locally
using an efficient voting scheme on a reduced two-dimensional search
space.
We demonstrate the efficiency of our approach and show its high
recognition performance in the presence of noise, clutter, and partial
occlusions. Compared to state-of-the-art approaches, we achieve better
recognition rates, and we demonstrate that, with slight or even no
sacrifice of recognition performance, our method is much faster than
current state-of-the-art approaches.
1. Introduction
The recognition of free-form objects in 3D data obtained
by different sensors, such as laser scans, TOF cameras and
stereo systems, has been widely studied in computer vi-
sion [2, 9, 12]. Global approaches [8, 13, 14, 18, 23, 25]
are typically neither very precise nor fast, and are limited
mainly to the classification and recognition of objects of a
certain type. By contrast, local approaches that are based
on local invariant features [1, 4, 5, 6, 7, 10, 15, 17, 19, 20,
21, 24] became extremely popular and proved to be quite
efficient. However, defining local invariant features heavily
depends on local surface information that is directly related
to the quality and resolution of the acquired data and the model data.
Figure 1. Example of two partly occluded instances of an object
found in a noisy, cluttered scene. The matched objects are shown
as red and green wireframes and might not be recognizable in B/W
copies.
In contrast to the approaches outlined above, we propose
a method that creates a global model description using an
oriented point pair feature and matches it by using a fast
voting scheme. The point pair feature describes the rela-
tive position and orientation of two oriented points as de-
scribed in Sec. 3.1. The global model description consists
of all model point pair features and represents a mapping
from the feature space to the model, where similar features
on the model are grouped together. Such a representation
provides a global distribution of all point pair features on
the model surface. Compared to the local methods, which
require dense local information, our approach allows the
model and the scene data to be represented only by a sparse
set of oriented points that can easily be computed from the
input data. Using sparse data also allows for a substantial
increase in the recognition speed, without a significant decrease
in the recognition rate. A fast voting scheme, similar