gations show that several tasks can also be solved without using such features. The
automatic and accurate alignment of captured point clouds, for instance, is an im-
portant task for digitization, reconstruction and interpretation of 3D scenes. Active
sensors such as terrestrial laser scanners are capable of measuring the 3D distance of scene points while simultaneously capturing image information in the form of either co-registered camera images or panoramic reflectance images representing the respective energy of the backscattered laser light. The recorded 3D point clouds typically provide a high point density as well as a high measurement accuracy. Hence,
the registration of two partially overlapping scans can be carried out based on the
3D geometry alone and thus without the need for visual features if the 3D structure
of the scene is distinctive enough.
Considering the example of point cloud registration, standard approaches such as the Iterative Closest Point (ICP) algorithm [18, 117] or Least Squares 3D Surface Matching (LS3D) [53] only exploit spatial 3D information. Whereas the ICP algorithm iteratively minimizes the difference between two point clouds, the LS3D approach minimizes the distance between matched surfaces.
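As an illustration, a minimal point-to-point ICP iteration may be sketched in Python as follows; this is a simplified sketch rather than the implementation of [18, 117], and all function names and parameters are illustrative:

    import numpy as np
    from scipy.spatial import cKDTree

    def best_rigid_transform(src, dst):
        # Least-squares rotation R and translation t mapping src onto dst
        # (Kabsch/Umeyama solution via SVD).
        c_src, c_dst = src.mean(axis=0), dst.mean(axis=0)
        U, _, Vt = np.linalg.svd((src - c_src).T @ (dst - c_dst))
        R = Vt.T @ U.T
        if np.linalg.det(R) < 0:      # avoid a reflection
            Vt[-1] *= -1
            R = Vt.T @ U.T
        return R, c_dst - R @ c_src

    def icp(source, target, n_iter=50, tol=1e-8):
        # Alternate between closest-point matching and re-estimation of
        # the rigid transform until the mean residual stops improving.
        src = source.copy()
        tree = cKDTree(target)        # fast closest-point queries
        prev_err = np.inf
        for _ in range(n_iter):
            dist, idx = tree.query(src)
            R, t = best_rigid_transform(src, target[idx])
            src = src @ R.T + t
            if abs(prev_err - dist.mean()) < tol:
                break
            prev_err = dist.mean()
        return src

The convergence behavior depends strongly on the initial alignment, as the closest-point assignment is only a local approximation of the true correspondences.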
Other approaches focus on the distribution of the points on 2D scan slices [22] or in 3D [86]. For
environments with regular surfaces, various types of geometric primitives such as
planes [22, 106, 139] or more complex geometric features like spheres, cylinders or
tori [109] have been proposed. In scenes without regular surfaces, the registration
can rely on descriptors representing local surface patches, which may, for instance, be derived from the geometric curvature or normal vectors of the local surface [8].
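Such normal vectors and a curvature-like measure can, for instance, be estimated via a principal component analysis of local point neighborhoods. The following minimal Python sketch (with illustrative names, an arbitrary neighborhood size and the eigenvalue-based surface variation as a curvature proxy) illustrates this:

    import numpy as np
    from scipy.spatial import cKDTree

    def local_surface_properties(points, k=20):
        # Estimate a normal vector and a curvature-like measure per point
        # from the eigenvalues/eigenvectors of the local covariance matrix.
        tree = cKDTree(points)
        _, idx = tree.query(points, k=k)      # k nearest neighbors per point
        normals = np.empty_like(points)
        curvature = np.empty(len(points))
        for i, nbrs in enumerate(idx):
            patch = points[nbrs] - points[nbrs].mean(axis=0)
            evals, evecs = np.linalg.eigh(patch.T @ patch / k)
            normals[i] = evecs[:, 0]          # eigenvector of smallest eigenvalue
            curvature[i] = evals[0] / evals.sum()  # surface variation
        return normals, curvature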
Several investigations, however, have shown that the registration of point clouds
can efficiently be supported by involving visual features derived from 2D imagery.
As both range and intensity information are typically measured on a regular scan
grid resulting from a cylindrical or spherical projection, they can be represented
as images. From these images, distinctive feature points can be extracted and re-
liable feature point correspondences between the images of different scans can
be derived.
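As a sketch of this representation, a scan given as an unordered set of scanner-centered 3D points with per-point intensities can be mapped onto a regular 2D grid via a spherical projection as follows; the grid resolution and names are illustrative, and occlusions as well as multiple points falling into the same cell are ignored for simplicity:

    import numpy as np

    def spherical_projection(points, intensity, width=3600, height=900):
        # Map scanner-centered 3D points (no point at the origin assumed)
        # onto a grid parameterized by azimuth and elevation.
        x, y, z = points.T
        rng = np.linalg.norm(points, axis=1)
        azimuth = np.arctan2(y, x)            # in [-pi, pi]
        elevation = np.arcsin(z / rng)        # in [-pi/2, pi/2]
        col = ((azimuth + np.pi) / (2 * np.pi) * (width - 1)).astype(int)
        row = ((np.pi / 2 - elevation) / np.pi * (height - 1)).astype(int)
        range_img = np.zeros((height, width))
        intensity_img = np.zeros((height, width))
        range_img[row, col] = rng
        intensity_img[row, col] = intensity
        return range_img, intensity_img

Standard 2D interest point detectors can then be applied directly to such images.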
The extraction of such features has been proposed from range images [11, 127], from intensity images [20, 66, 140] and from co-registered camera images [4, 10, 17]. In general, features in the intensity images provide a higher level of distinctiveness than features in the respective range images [122] and presumably convey information not yet represented in the range measurements. Projecting the distinctive 2D points to 3D space according to the respective range information yields sparse point clouds describing physically almost identical 3D
points. The point cloud registration may then exploit the reliable 3D/3D correspon-
dences [122] or 3D/2D correspondences [143, 144] between different scans, which typically involves a RANSAC-based scheme [42].
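For the case of 3D/3D correspondences, such a RANSAC-based scheme can be sketched as follows; this is a simplified illustration, with an arbitrary inlier threshold and iteration count rather than values from the cited works:

    import numpy as np

    def rigid_fit(src, dst):
        # SVD-based least-squares rotation/translation mapping src onto dst.
        cs, cd = src.mean(axis=0), dst.mean(axis=0)
        U, _, Vt = np.linalg.svd((src - cs).T @ (dst - cd))
        R = Vt.T @ U.T
        if np.linalg.det(R) < 0:              # avoid a reflection
            Vt[-1] *= -1
            R = Vt.T @ U.T
        return R, cd - R @ cs

    def ransac_rigid(src, dst, n_iter=1000, thresh=0.05):
        # Estimate a rigid transform from putative 3D/3D correspondences
        # (src[i] <-> dst[i]) while rejecting outlier correspondences.
        rng = np.random.default_rng(0)
        best_inliers = np.zeros(len(src), dtype=bool)
        for _ in range(n_iter):
            sample = rng.choice(len(src), size=3, replace=False)
            R, t = rigid_fit(src[sample], dst[sample])
            residuals = np.linalg.norm(src @ R.T + t - dst, axis=1)
            inliers = residuals < thresh
            if inliers.sum() > best_inliers.sum():
                best_inliers = inliers
        # refine the transform on the largest consensus set
        return rigid_fit(src[best_inliers], dst[best_inliers]), best_inliers

The minimal sample size of three reflects that three non-collinear point correspondences suffice to determine a rigid transform in 3D.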
Thus, the reduction to sparse point clouds significantly reduces the computational effort and even tends to improve the accuracy of the registration results, as the number and influence of outliers can be reduced [140, 143, 144]. Furthermore, these approaches can directly be transferred to Time-of-Flight cameras or devices based on the use of structured light (e.g., the Microsoft Kinect). Consequently, most of the current approaches addressing point cloud registration consider both range and intensity information to reach an increased performance, although the alignment can also be carried out without visual features if the scene provides a sufficiently distinctive 3D structure.