1150 IEEE TRANSACTIONS ON ROBOTICS, VOL. 31, NO. 5, OCTOBER 2015
Fig. 1. ORB-SLAM system overview, showing all the steps performed by the
tracking, local mapping, and loop closing threads. The main components of the
place recognition module and the map are also shown.
searched by reprojection, and the camera pose is optimized again
with all matches. Finally, the tracking thread decides whether a new
keyframe is inserted. All the tracking steps are explained in detail
in Section V. The novel procedure to create an initial map
is presented in Section IV.
The local mapping thread processes new keyframes and performs local
BA to achieve an optimal reconstruction in the surroundings
of the camera pose. New correspondences for unmatched ORB
features in the new keyframe are searched in the keyframes connected
to it in the covisibility graph, in order to triangulate new points.
Some time after their creation, based on the information gathered
during tracking, a demanding point culling policy is applied to
retain only high-quality points. The local mapping is also in charge
of culling redundant keyframes. All local mapping steps are
explained in detail in Section VI.
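The search for new correspondences relies on knowing which keyframes are "connected" in the covisibility graph. The sketch below builds such a graph from keyframe observations, assuming that an edge joins two keyframes when they observe at least `min_shared` common map points and that the edge weight is that count. The input layout, the threshold value of 15, the function names, and the `k` limit in `best_covisible` are illustrative assumptions, not the system's actual data structures.

```python
from collections import defaultdict
from itertools import combinations


def build_covisibility_graph(observations, min_shared=15):
    """Build a weighted covisibility graph.

    observations: {keyframe_id: set of map_point_ids it observes} (hypothetical layout).
    Returns {keyframe_id: {neighbor_id: number_of_shared_points}}, keeping only
    pairs that share at least `min_shared` points (an assumed threshold).
    """
    # Invert the observations: map point -> keyframes that see it
    observers = defaultdict(set)
    for kf, points in observations.items():
        for p in points:
            observers[p].add(kf)

    # Count shared map points for every pair of keyframes
    shared = defaultdict(int)
    for kfs in observers.values():
        for a, b in combinations(sorted(kfs), 2):
            shared[(a, b)] += 1

    # Keep only sufficiently covisible pairs, stored as symmetric weighted edges
    graph = defaultdict(dict)
    for (a, b), weight in shared.items():
        if weight >= min_shared:
            graph[a][b] = weight
            graph[b][a] = weight
    return dict(graph)


def best_covisible(graph, kf, k=10):
    """Return up to k neighbors of kf, most shared points first (k is hypothetical)."""
    neighbors = graph.get(kf, {})
    return sorted(neighbors, key=neighbors.get, reverse=True)[:k]


if __name__ == "__main__":
    obs = {0: set(range(0, 30)), 1: set(range(10, 40)), 2: set(range(35, 60))}
    g = build_covisibility_graph(obs)
    print(g)                      # {0: {1: 20}, 1: {0: 20}}
    print(best_covisible(g, 0))   # [1]
    print(best_covisible(g, 2))   # []
```

In the usage example, keyframes 1 and 2 share only five points, so with the assumed threshold they are not connected, and a new keyframe would only look for triangulation partners among its surviving neighbors.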
The loop closing thread searches for loops with every new keyframe.
If a loop is detected, we compute a similarity transformation
that informs about the drift accumulated in the loop. Then, both
sides of the loop are aligned and duplicated points are fused.
Finally, a pose graph optimization over similarity constraints [6]
is performed to achieve global consistency. The main novelty is
that we perform this optimization over the Essential Graph, a
sparser subgraph of the covisibility graph that is explained
in Section III-D. The loop detection and correction steps are
explained in detail in Section VII.
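A similarity transformation differs from a rigid body transformation by an extra scale factor, so it can express scale drift as well as rotation and translation drift. The following minimal sketch, in plain Python with 3x3 lists, shows how such a transform maps points, composes, and inverts, assuming the convention x -> sR x + t. The class and helper names are hypothetical and the example values are arbitrary; this is not the loop closing implementation itself.

```python
import math


def matvec(R, v):
    """3x3 matrix times 3-vector."""
    return [sum(R[i][j] * v[j] for j in range(3)) for i in range(3)]


def matmul(A, B):
    """3x3 matrix product."""
    return [[sum(A[i][k] * B[k][j] for k in range(3)) for j in range(3)] for i in range(3)]


def transpose(R):
    return [[R[j][i] for j in range(3)] for i in range(3)]


class Sim3:
    """Similarity transform x -> s * R x + t (rotation R, translation t, scale s > 0)."""

    def __init__(self, R, t, s):
        self.R, self.t, self.s = R, t, s

    def apply(self, x):
        Rx = matvec(self.R, x)
        return [self.s * Rx[i] + self.t[i] for i in range(3)]

    def compose(self, other):
        """Return the transform that applies `other` first, then `self`."""
        Rt = matvec(self.R, other.t)
        t = [self.s * Rt[i] + self.t[i] for i in range(3)]
        return Sim3(matmul(self.R, other.R), t, self.s * other.s)

    def inverse(self):
        """Solve y = s R x + t for x: x = (1/s) R^T (y - t)."""
        Rt = transpose(self.R)
        tt = matvec(Rt, self.t)
        return Sim3(Rt, [-c / self.s for c in tt], 1.0 / self.s)


def rot_z(theta):
    """Rotation by theta radians about the z axis."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]


if __name__ == "__main__":
    # Hypothetical correction: rotate 90 deg about z, scale by 2, shift along x
    S = Sim3(rot_z(math.pi / 2), [1.0, 0.0, 0.0], 2.0)
    y = S.apply([1.0, 0.0, 0.0])      # approximately [1, 2, 0]
    print(S.inverse().apply(y))       # approximately [1, 0, 0]
```

Composing a transform with its inverse yields scale 1 and the identity map, which is a quick sanity check when chaining corrections along a loop.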
We use the Levenberg–Marquardt algorithm implemented in
g2o [37] to carry out all optimizations. In the Appendix, we
describe the error terms, cost functions, and variables involved
in each optimization.
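To make the optimizer concrete, the sketch below is a generic, textbook Levenberg–Marquardt loop with Marquardt's diagonal damping, applied to a toy two-parameter curve fit. It does not use g2o or its API, and the problem (fitting y = a exp(bt) to noise-free samples), the damping schedule (divide by 10 on success, multiply by 10 on failure), and all names are illustrative assumptions.

```python
import math


def levenberg_marquardt(residual_fn, jacobian_fn, x0, lam=1e-3, iters=100, tol=1e-20):
    """Minimize sum(r_i(x)^2) over a 2-parameter vector x.

    Each step solves (J^T J + lam * diag(J^T J)) dx = -J^T r; lam shrinks after
    an accepted step (toward Gauss-Newton) and grows after a rejected one
    (toward scaled gradient descent).
    """
    x = list(x0)
    r = residual_fn(x)
    cost = sum(v * v for v in r)
    for _ in range(iters):
        if cost < tol:
            break
        J = jacobian_fn(x)  # one row [dr_i/dx0, dr_i/dx1] per residual
        # Entries of J^T J and J^T r
        h00 = sum(row[0] * row[0] for row in J)
        h01 = sum(row[0] * row[1] for row in J)
        h11 = sum(row[1] * row[1] for row in J)
        g0 = sum(row[0] * ri for row, ri in zip(J, r))
        g1 = sum(row[1] * ri for row, ri in zip(J, r))
        # Damped 2x2 system, solved with Cramer's rule
        a00, a11 = h00 * (1.0 + lam), h11 * (1.0 + lam)
        det = a00 * a11 - h01 * h01
        if det == 0.0:
            break
        dx0 = (-g0 * a11 + h01 * g1) / det
        dx1 = (-g1 * a00 + h01 * g0) / det
        x_new = [x[0] + dx0, x[1] + dx1]
        try:
            r_new = residual_fn(x_new)
            cost_new = sum(v * v for v in r_new)
        except OverflowError:
            cost_new = math.inf  # treat a blown-up trial step as rejected
        if cost_new < cost:
            x, r, cost = x_new, r_new, cost_new  # accept the step
            lam *= 0.1
        else:
            lam *= 10.0  # reject the step and damp more strongly
    return x, cost


# Toy problem: recover a=2, b=0.5 from noise-free samples of y = a * exp(b * t)
ts = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [2.0 * math.exp(0.5 * t) for t in ts]


def residuals(x):
    return [x[0] * math.exp(x[1] * t) - y for t, y in zip(ts, ys)]


def jacobian(x):
    return [[math.exp(x[1] * t), x[0] * t * math.exp(x[1] * t)] for t in ts]


params, final_cost = levenberg_marquardt(residuals, jacobian, [1.0, 0.0])
```

The same accept/reject logic carries over to bundle adjustment and pose graph problems, where the parameters are poses and points and the linear system is large and sparse, which is why a dedicated solver library is used there.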
C. Map Points, Keyframes, and Their Selection
Fig. 2. Reconstruction and graphs in the sequence fr3_long_office_household
from the TUM RGB-D Benchmark [38]. (a) Keyframes
(blue), current camera (green), map points (black, red), current local map points
(red). (b) Covisibility graph. (c) Spanning tree (green) and loop closure (red).
(d) Essential graph.
Each map point $p_i$ stores the following:
1) its 3-D position $\mathbf{X}_{w,i}$ in the world coordinate system;
2) the viewing direction $\mathbf{n}_i$, which is the mean unit vector
of all its viewing directions (the rays that join the point with
the optical center of the keyframes that observe it);
3) a representative ORB descriptor $\mathbf{D}_i$, which is the associated
ORB descriptor whose Hamming distance is minimum with respect
to all other associated descriptors in the keyframes in which
the point is observed;
4) the maximum $d_{\max}$ and minimum $d_{\min}$ distances at which
the point can be observed, according to the scale invariance
limits of the ORB features.
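The per-point attributes in the list above can each be computed from a point's observations. The sketch below is one possible reading, under several assumptions: the viewing direction averages unit rays from each optical center to the point and renormalizes the mean; "minimum Hamming distance with respect to all other descriptors" is aggregated with the median distance (the text does not specify the aggregation); and the distance limits assume an image pyramid with scale factor 1.2 and 8 levels, where a feature seen at pyramid level `level` from distance `dist` would shrink to level 0 when seen from `dist * 1.2**level` and grow to the coarsest level when seen from `d_max / 1.2**7`. All function names are hypothetical.

```python
import math


def normalize(v):
    """Scale a 3-vector to unit length."""
    n = math.sqrt(sum(c * c for c in v))
    return [c / n for c in v]


def viewing_direction(X_w, camera_centers):
    """Mean of the unit rays from each observing optical center to the point, renormalized."""
    rays = [normalize([x - c for x, c in zip(X_w, center)]) for center in camera_centers]
    mean = [sum(ray[i] for ray in rays) / len(rays) for i in range(3)]
    return normalize(mean)


def hamming(a: bytes, b: bytes) -> int:
    """Number of differing bits between two equal-length binary descriptors."""
    return sum(bin(x ^ y).count("1") for x, y in zip(a, b))


def representative_descriptor(descriptors):
    """Descriptor whose median Hamming distance to the others is smallest (median is an assumption)."""
    best, best_score = None, None
    for i, d in enumerate(descriptors):
        dists = sorted(hamming(d, e) for j, e in enumerate(descriptors) if j != i)
        score = dists[(len(dists) - 1) // 2] if dists else 0
        if best_score is None or score < best_score:
            best, best_score = d, score
    return best


def distance_limits(dist, level, scale_factor=1.2, n_levels=8):
    """(d_min, d_max) for a point observed at distance `dist` at pyramid `level`.

    From farther away by scale_factor**level the feature would shrink to level 0;
    from closer it would grow up to the coarsest level n_levels - 1.
    """
    d_max = dist * scale_factor ** level
    d_min = d_max / scale_factor ** (n_levels - 1)
    return d_min, d_max
```

For example, a point observed from two cameras placed symmetrically at x = 1 and x = -1 around a point at (0, 0, 1) gets the viewing direction (0, 0, 1), and among the toy one-byte descriptors `b"\x00"`, `b"\x01"`, `b"\x03"`, `b"\xff"` the sketch selects `b"\x01"`.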
Each keyframe $K_i$ stores the following:
1) the camera pose $\mathbf{T}_{iw}$, which is a rigid body transformation
that transforms points from the world to the camera
coordinate system;
2) the camera intrinsics, including focal length and principal
point;
3) all the ORB features extracted in the frame, associated or
not with a map point, whose coordinates are undistorted
if a distortion model is provided.
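As an illustration of how the stored pose and intrinsics are used together, the sketch below holds a keyframe with the pose split into a rotation `R_iw` and translation `t_iw` (so that a world point maps to camera coordinates as R X + t) and pinhole intrinsics `fx`, `fy`, `cx`, `cy`. The field names, the pinhole model without skew, and the per-feature `map_points` list using `None` for unassociated features are assumptions made for the example.

```python
from dataclasses import dataclass, field


@dataclass
class Keyframe:
    R_iw: list   # 3x3 rotation, world -> camera (hypothetical split of T_iw)
    t_iw: list   # translation, world -> camera
    fx: float    # focal lengths in pixels
    fy: float
    cx: float    # principal point in pixels
    cy: float
    keypoints: list = field(default_factory=list)   # undistorted (u, v) of every ORB feature
    map_points: list = field(default_factory=list)  # matching map point id per feature, or None

    def world_to_camera(self, X_w):
        """X_c = R_iw X_w + t_iw."""
        return [sum(self.R_iw[i][j] * X_w[j] for j in range(3)) + self.t_iw[i] for i in range(3)]

    def project(self, X_w):
        """Pinhole projection to pixels; None if the point is not in front of the camera."""
        x, y, z = self.world_to_camera(X_w)
        if z <= 0:
            return None
        return (self.fx * x / z + self.cx, self.fy * y / z + self.cy)

    def camera_center(self):
        """Optical center in world coordinates, O_w = -R_iw^T t_iw."""
        return [-sum(self.R_iw[j][i] * self.t_iw[j] for j in range(3)) for i in range(3)]


if __name__ == "__main__":
    I3 = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
    kf = Keyframe(I3, [0.0, 0.0, 2.0], 500.0, 500.0, 320.0, 240.0)
    print(kf.project([0.1, 0.0, 0.0]))   # approximately (345.0, 240.0)
    print(kf.camera_center())            # approximately [0, 0, -2]
```

Storing the pose in the world-to-camera direction makes projection a single transform; the optical center, which the viewing directions of map points need, is recovered by inverting it.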
Map points and keyframes are created with a generous policy,
while a later, very strict culling mechanism is in charge of
detecting redundant keyframes and wrongly matched or untrackable
map points. This permits a flexible map expansion
during exploration, which boosts tracking robustness under hard
conditions (e.g., rotations, fast movements), while the map size
stays bounded during continual revisits to the same environment,
i.e., lifelong operation. Additionally, our maps contain very few
outliers compared with PTAM, at the expense of containing
fewer points. The culling procedures for map points and keyframes
are explained in Sections VI-B and VI-E, respectively.