Fast and Robust Multi-View 3D Object Recognition in Point Clouds
Guan Pang
University of Southern California
gpang@usc.edu
Ulrich Neumann
University of Southern California
uneumann@usc.edu
Abstract
Recognition of three-dimensional (3D) objects in point
clouds is a challenging problem. Existing methods often
require prior segmentation or 3D descriptor training and
matching, both time-consuming and complex processes,
especially for large-scale industrial or urban street data.
We describe a new recognition approach that projects a
3D point cloud into several 2D depth images from multiple
viewpoints, transforming the 3D recognition problem into
a series of 2D detection problems. This method reduces
complexity, stabilizes performance, and significantly speeds
up the recognition process, without any requirement for
object segmentation or detector training. Experiments
validate the superiority of our method over several state-
of-the-art methods on examples from industrial and street
data scans.
1. Introduction
3D object recognition (Fig. 1) in point clouds is a challenging
problem due to discrete sampling, occlusions, and
cluttered scenes. Many existing methods focus on small-
scale data [1, 2, 3, 4, 5, 6, 7, 8] using 3D descriptors. A
few others work with large-scale data, mostly urban street
scans [9, 10, 12, 13, 14, 15]. These methods use machine
learning to select the best description for a specific type
of 3D object so that such objects can be recognized reliably in a
large urban scene, and they usually require prior segmentation of
the input data. Relatively few address industrial part recognition [14,
15], where objects are often more densely arranged, making
segmentation more difficult. Regardless of domain focus,
most methods perform the recognition process in 3D, either
using 3D local descriptors [1, 9, 16, 2, 3, 4] or exhaustive 3D
scanning-window search [14, 17]. Both approaches require
3D descriptor or detector training and are time-consuming
due to the three-dimensional search. Large-scale industrial
or street data sets contain hundreds of millions or even billions of 3D
points, motivating the search for fast and robust recognition
methods.
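As a rough, illustrative count (not taken from the paper's experiments), assume the scene is discretized into n cells along each axis and a fixed-size window is evaluated at every cell. Exhaustive 3D scanning-window search then tests on the order of n^3 window positions, whereas scanning V depth images of n x n pixels tests about V n^2 positions. Both n and V are hypothetical parameters introduced only for this sketch.

```latex
\begin{align*}
  C_{\mathrm{3D}} &\approx n^{3}, \qquad
  C_{\mathrm{2D}} \approx V\,n^{2}, \\
  \frac{C_{\mathrm{3D}}}{C_{\mathrm{2D}}} &\approx \frac{n}{V}.
\end{align*}
```

Under these assumptions the saving grows linearly with the grid resolution n as long as the number of views stays small relative to n. The count ignores per-window cost, orientation search, and projection overhead, so it indicates a trend rather than a measured speed-up.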
Figure 1. Object recognition from a 3D point cloud.
Two recent trends motivate our work. Growing availability
and use of 3D scanners have spurred interest in 3D
object recognition. Also, 2D object detection in images
has improved dramatically. These observations motivate a
transformation of the 3D object recognition problem into a
series of 2D detection problems. This 3D-to-2D strategy
is similar to those used for 3D object model retrieval [20,
21, 23], but our targets are unsegmented, noisy, large-scale 3D
point clouds, which are much more complex. Our algorithm
for 3D object recognition is based on multi-view projection,
first projecting a 3D point cloud into 2D depth images from
multiple viewpoints. Objects are detected in each view
using gradient data, and the 2D detection results are fused
by 3D re-projection to determine object locations. This
algorithm reduces the search complexity from 3D to 2D,
while eliminating the need for object segmentation
or detector training. The multi-view projection process
also stabilizes performance in cluttered and occluded scenes
and provides rotation invariance. We test our method on
a combination of industrial data and street data [25, 13]
containing various types of objects and scene conditions. In
comparisons with state-of-the-art 3D recognition methods,
our method achieves competitive overall performance with an
order-of-magnitude speed-up.
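The projection, per-view detection, and re-projection fusion steps described above can be sketched in a few lines. This is a minimal, hypothetical Python sketch, not the authors' implementation: it assumes orthographic depth images taken from num_views azimuths evenly spaced about the vertical axis, a pixel size cell, and fusion by counting how many views vote for each cell of a coarse voxel grid. The 2D detector is passed in as the callback detect, standing in for the gradient-based detector; every function and parameter name here is illustrative.

```python
import math
from collections import defaultdict

def rotate_z(points, angle):
    """Rotate (x, y, z) points by `angle` radians about the vertical z axis."""
    c, s = math.cos(angle), math.sin(angle)
    return [(c * x - s * y, s * x + c * y, z) for x, y, z in points]

def depth_image(points, azimuth, cell):
    """Orthographic depth image seen from `azimuth`. The scene is rotated by
    -azimuth and viewed along +y, so a smaller y means a nearer point.
    Returns {(u, v): depth}, keeping the nearest point per pixel (z-buffer)."""
    image = {}
    for x, y, z in rotate_z(points, -azimuth):
        pixel = (math.floor(x / cell), math.floor(z / cell))
        if pixel not in image or y < image[pixel]:
            image[pixel] = y
    return image

def back_project(pixel, depth, azimuth, cell):
    """Map a pixel center plus its stored depth back to world coordinates."""
    u, v = pixel
    local = ((u + 0.5) * cell, depth, (v + 0.5) * cell)
    return rotate_z([local], azimuth)[0]

def recognize(points, detect, num_views=4, cell=0.5, voxel=1.0, min_views=3):
    """Project into `num_views` depth images, run the (hypothetical) 2D
    detector `detect` on each, and fuse detections by voxel voting."""
    votes = defaultdict(set)  # voxel index -> set of views that voted for it
    for k in range(num_views):
        azimuth = 2 * math.pi * k / num_views
        image = depth_image(points, azimuth, cell)
        for pixel in detect(image):
            wx, wy, wz = back_project(pixel, image[pixel], azimuth, cell)
            key = (math.floor(wx / voxel), math.floor(wy / voxel),
                   math.floor(wz / voxel))
            votes[key].add(k)
    # Keep only locations confirmed by enough independent views.
    return {key: len(v) for key, v in votes.items() if len(v) >= min_views}

def nearest_pixel(image):
    """Hypothetical stand-in detector: report the closest occupied pixel."""
    return [min(image, key=image.get)]

print(recognize([(2.3, 1.7, 0.4)], nearest_pixel))  # {(2, 1, 0): 4}
```

Requiring agreement from several views is one simple way to make the fused result tolerant of a view in which the object is occluded; a real system would replace nearest_pixel with an actual 2D detector.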
Our main contributions include:
• Transforming the 3D point cloud object recognition
problem into a series of 2D detection problems to
reduce search complexity.
• Employing multi-view projection to provide rotation
invariance and stabilize performance in cluttered and
occluded scenes.
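The rotation-invariance contribution can be illustrated with a small, exact example. The hypothetical Python sketch below assumes four orthographic views spaced 90 degrees apart about the vertical axis and integer point coordinates, so quarter turns are computed without rounding; the names quarter_turn, depth_view, and view_set are illustrative, not from the paper. Rotating the scene by one view step leaves the collection of depth images unchanged and only shifts which view each image appears in.

```python
def quarter_turn(points, k):
    """Rotate integer (x, y, z) points by k * 90 degrees about the vertical
    z axis. Integer coordinates keep the rotation exact (no float drift)."""
    for _ in range(k % 4):
        points = [(-y, x, z) for x, y, z in points]
    return points

def depth_view(points, k, cell=1):
    """Depth image from the k-th of four azimuths (k * 90 degrees): rotate the
    scene by -k quarter turns, project orthographically along +y, and keep
    the nearest (smallest y) depth for each (u, v) pixel."""
    image = {}
    for x, y, z in quarter_turn(points, -k):
        pixel = (x // cell, z // cell)
        if pixel not in image or y < image[pixel]:
            image[pixel] = y
    return image

def view_set(points):
    """All four depth images of a scene, indexed by view number."""
    return [depth_view(points, k) for k in range(4)]

scene = [(3, 1, 0), (2, 5, 1), (-1, 2, 2), (4, -3, 0)]
original = view_set(scene)
turned = view_set(quarter_turn(scene, 1))  # scene rotated by one view step

# View k of the rotated scene equals view k - 1 of the original scene.
print(all(turned[k] == original[(k - 1) % 4] for k in range(4)))  # True
```

For a rotation that is not a multiple of the view step, the images no longer match exactly; the object's orientation is then at most half a view step away from the orientation captured by some view, so adding views narrows the gap.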