An Improved RGB-D SLAM Algorithm based on Kinect Sensor
Liang Zhang¹, Peiyi Shen², Jieqiong Ding³, Juan Song, Jingwen Liu, and Kang Yi
Abstract— This paper presents an improved frontend algorithm for RGB-D SLAM, the Extend-RGBD-ICP algorithm. After introducing the existing frontend algorithm and analyzing its drawbacks, we present improvements in three aspects: down-sampling of point cloud data, selection of matching points, and elimination of the cumulative error that arises when splicing consecutive frames. Experiments on scenarios of different complexity and an evaluation on a benchmark dataset show that the Extend-RGBD-ICP algorithm achieves both accuracy and efficiency.
I. INTRODUCTION
SLAM (Simultaneous Localization and Mapping)[1] is currently an important research field in mobile robotics. In recent years, with the continuous development of sensor technology and related algorithms, map creation of the environment around a robot has entered the 3D age. In the second half of 2010, Microsoft launched the Kinect[2], a 3D sensor able to simultaneously obtain color information (RGB) and depth information (Depth), together referred to as RGB-D information, about the environment. Its price and performance advantages have made it widely studied and applied in the field of robot RGB-D SLAM[3].
Current RGB-D SLAM algorithms are divided into a frontend and a backend[3]. Whereas the frontend extracts spatial relations between individual observations, the backend optimizes the poses of these observations in a so-called pose graph with respect to a nonlinear error function. In the frontend, the color information provided by the RGB-D sensor is used for feature detection, descriptor extraction and feature matching; from these matching results and the depth information, the six-degree-of-freedom (6-DoF) transformation between adjacent sensor poses is computed. In the backend, a graph-based optimization routine computes globally optimal sensor poses from the frontend results, from which the three-dimensional map model and the camera trajectory are obtained.
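The frontend's pose-estimation step described above, computing a rigid 6-DoF transformation from matched 3D points, is commonly solved in closed form with an SVD (the Kabsch/Umeyama method). The following NumPy sketch illustrates that standard step; it is an illustration under this assumption, not this paper's exact implementation:

```python
import numpy as np

def rigid_transform_3d(P, Q):
    """Closed-form least-squares fit of R, t such that Q is approx. R @ P + t.

    P, Q: (N, 3) arrays of corresponding 3D points (row i matches row i),
    e.g. matched feature points back-projected with the Kinect depth map.
    """
    cP, cQ = P.mean(axis=0), Q.mean(axis=0)        # centroids
    H = (P - cP).T @ (Q - cQ)                      # 3x3 cross-covariance
    U, _, Vt = np.linalg.svd(H)
    D = np.diag([1.0, 1.0, np.sign(np.linalg.det(Vt.T @ U.T))])
    R = Vt.T @ D @ U.T                             # proper rotation, det = +1
    t = cQ - R @ cP
    return R, t

# usage: recover a known 30-degree rotation about z plus a translation
a = np.deg2rad(30.0)
R_true = np.array([[np.cos(a), -np.sin(a), 0.0],
                   [np.sin(a),  np.cos(a), 0.0],
                   [0.0, 0.0, 1.0]])
t_true = np.array([0.5, -0.2, 1.0])
P = np.random.default_rng(0).normal(size=(100, 3))
R_est, t_est = rigid_transform_3d(P, P @ R_true.T + t_true)
```

The reflection guard (the `D` matrix) ensures the result is a rotation rather than a mirroring, which matters when the correspondences are noisy or nearly planar.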
This paper studies the frontend of the RGB-D SLAM algorithm and presents some improvements based on an analysis of the existing frontend algorithms.
The remainder of this paper is organized as follows. Firstly, we briefly introduce representative existing algorithms.
¹Liang Zhang is with the Faculty of Software Engineering, Xidian University, Xi'an, Shaanxi, China liangzhang@xidian.edu.cn
²Peiyi Shen is with the Faculty of Software Engineering, Xidian University, Xi'an, Shaanxi, China pyshen@xidian.edu.cn
³Jieqiong Ding received her master's degree in Computer Technology from Xidian University and is now with Huawei dingjieqiong900301@163.com
II. RELATED WORK
A. Current Situation of RGB-D SLAM Algorithm
Shortly after the Kinect was released, the University of Washington and Microsoft lab[4] developed a real-time visual SLAM system based on the TORO pose graph optimization[5] algorithm to build 3D maps. This SLAM algorithm uses the SIFT (Scale-Invariant Feature Transform)[6], [7] method to obtain an initial estimate of the transformation from the current frame to the first frame. To improve the initial estimates, the ICP (Iterative Closest Point)[8] method is used for point cloud matching. After loop closures are detected, all frames are put into the TORO pose graph for global optimization, which results in a more accurate map. Freiburg University[9] in Germany proposed an RGB-D SLAM algorithm similar to the University of Washington method; to improve real-time performance, they use the SURF[10] feature matching method and the Hog-man pose graph[11] for global optimization. The Willow Garage laboratory[12] developed a map creation method based on ICP and BA (bundle adjustment)[13] that optimizes not only the camera poses but also the feature points in the scene.
B. Current Situation of Frontend Algorithm
In the splicing of two-dimensional image sequences, a common approach is to use feature point detection and matching to determine the geometric relation between images, thereby obtaining an image with a larger viewing angle. The advantage of this image-matching-based method is its small amount of computation: sparse feature points extracted from the image sequences suffice to determine the motion estimation parameters. However, the feature descriptor matching process produces many wrong matches. To solve this problem, the RANSAC algorithm[14] is used to determine the rigid transformation supported by the matching feature points and thereby obtain correct motion estimation parameters.
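The RANSAC idea referenced above can be sketched with a deliberately simplified 2D model: repeatedly fit a rotation-plus-translation to a minimal sample of two putative matches and keep the hypothesis with the most inliers. The synthetic correspondences and the 2D rigid model below are assumptions for illustration, not the feature-descriptor pipeline used in practice:

```python
import numpy as np

def ransac_rigid_2d(src, dst, iters=500, tol=0.05, seed=1):
    """Find the 2D rotation+translation best supported by putative matches.

    src, dst: (N, 2) arrays of matched points; some matches are wrong.
    A minimal sample of two matches fixes the rotation angle (from the
    direction of the segment between them) and the translation; matches
    with residual below `tol` count as inliers. Returns (R, t, mask).
    """
    rng = np.random.default_rng(seed)
    best = (None, None, np.zeros(len(src), dtype=bool))
    for _ in range(iters):
        i, j = rng.choice(len(src), size=2, replace=False)
        ds, dd = src[j] - src[i], dst[j] - dst[i]
        a = np.arctan2(dd[1], dd[0]) - np.arctan2(ds[1], ds[0])
        R = np.array([[np.cos(a), -np.sin(a)],
                      [np.sin(a),  np.cos(a)]])
        t = dst[i] - R @ src[i]
        resid = np.linalg.norm(src @ R.T + t - dst, axis=1)
        mask = resid < tol
        if mask.sum() > best[2].sum():     # keep the best-supported model
            best = (R, t, mask)
    return best

# 40 matches: the first 30 follow a 20-degree rotation plus a shift,
# the last 10 are gross mismatches
rng = np.random.default_rng(0)
a = np.deg2rad(20.0)
R_true = np.array([[np.cos(a), -np.sin(a)], [np.sin(a), np.cos(a)]])
src = rng.uniform(-1, 1, size=(40, 2))
dst = src @ R_true.T + np.array([0.3, -0.1])
dst[30:] = rng.uniform(-5, 5, size=(10, 2))    # corrupt the last 10
R, t, mask = ransac_rigid_2d(src, dst)
```

Because any hypothesis built from two wrong matches explains almost no other matches, the winning model is the one fitted from correct matches, and the mismatches fall outside its inlier set.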
In the splicing of three-dimensional point cloud data, the ICP[8] method is frequently used; it determines the rigid transformation between two point clouds that minimizes the deviation between corresponding points. It was first proposed in 1990 to splice object models from depth data, and there are many ICP-based improvements, such as GICP (Generalized ICP)[15], 3D-NDT[16] and AICP (Adaptive ICP)[17], which are widely used in the splicing of laser point cloud data. When two point clouds are similar, the ICP algorithm splices them effectively, but when distinct differences exist, a good initial value is needed to make the ICP iteration converge.
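For reference, a minimal point-to-point ICP alternates nearest-neighbour pairing with the closed-form SVD alignment. The NumPy sketch below shows plain ICP only, not the GICP, 3D-NDT or AICP variants cited above, and, as noted, it assumes the clouds start roughly aligned (here the identity pose is a good enough initial value):

```python
import numpy as np

def icp(src, dst, iters=20):
    """Minimal point-to-point ICP starting from the identity pose.

    src: (N, 3), dst: (M, 3) point clouds. Returns (R, t) mapping src
    onto dst. Each iteration (1) pairs every transformed source point
    with its nearest target point, then (2) solves the closed-form SVD
    (Kabsch) alignment for that pairing and composes it into (R, t).
    """
    R, t = np.eye(3), np.zeros(3)
    for _ in range(iters):
        moved = src @ R.T + t
        # (1) brute-force nearest-neighbour correspondences
        d2 = ((moved[:, None, :] - dst[None, :, :]) ** 2).sum(axis=2)
        Q = dst[d2.argmin(axis=1)]
        # (2) closed-form rigid fit on the current pairing
        cP, cQ = moved.mean(axis=0), Q.mean(axis=0)
        U, _, Vt = np.linalg.svd((moved - cP).T @ (Q - cQ))
        D = np.diag([1.0, 1.0, np.sign(np.linalg.det(Vt.T @ U.T))])
        dR = Vt.T @ D @ U.T
        dt = cQ - dR @ cP
        R, t = dR @ R, dR @ t + dt          # compose the increment
    return R, t

# usage: recover a small known motion between two copies of a cloud
rng = np.random.default_rng(0)
src = rng.normal(size=(200, 3))
a = np.deg2rad(3.0)
R_true = np.array([[np.cos(a), -np.sin(a), 0.0],
                   [np.sin(a),  np.cos(a), 0.0],
                   [0.0, 0.0, 1.0]])
t_true = np.array([0.03, -0.01, 0.02])
dst = src @ R_true.T + t_true
R, t = icp(src, dst)
```

With a larger initial motion, the nearest-neighbour pairing in step (1) is mostly wrong and the iteration can settle in a wrong local minimum, which is exactly why a good initial value matters.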
2015 IEEE International Conference on Advanced Intelligent Mechatronics (AIM), July 7-11, 2015, Busan, Korea