IEEE TRANSACTIONS ON ROBOTICS 4
closing. The tracking is in charge of localizing the camera
with every frame and deciding when to insert a new keyframe.
We perform first an initial feature matching with the previous
frame and optimize the pose using motion-only BA. If the
tracking is lost (e.g. due to occlusions or abrupt movements),
the place recognition module is used to perform a global
relocalization. Once there is an initial estimation of the camera
pose and feature matchings, a local visible map is retrieved
using the covisibility graph of keyframes that is maintained
by the system, see Fig. 2(a) and Fig. 2(b). Then matches with
the local map points are searched by reprojection, and camera
pose is optimized again with all matches. Finally the tracking
thread decides if a new keyframe is inserted. All the tracking
steps are explained in detail in Section V. The novel procedure
to create an initial map is presented in Section IV.
The local mapping processes new keyframes and performs
local BA to achieve an optimal reconstruction in the sur-
roundings of the camera pose. New correspondences for un-
matched ORB in the new keyframe are searched in connected
keyframes in the covisibility graph to triangulate new points.
Some time after creation, based on the information gathered
during the tracking, an exigent point culling policy is applied
in order to retain only high quality points. The local mapping
is also in charge of culling redundant keyframes. We explain
in detail all local mapping steps in Section VI.
The loop closing searches for loops with every new
keyframe. If a loop is detected, we compute a similarity trans-
formation that informs about the drift accumulated in the loop.
Then both sides of the loop are aligned and duplicated points
are fused. Finally a pose graph optimization over similarity
constraints [6] is performed to achieve global consistency. The
main novelty is that we perform the optimization over the
Essential Graph, a sparser subgraph of the covisibility graph
which is explained in Section III-D. The loop detection and
correction steps are explained in detail in Section VII.
We use the Levenberg-Marquardt algorithm implemented in
g2o [37] to carry out all optimizations. In the Appendix we
describe the error terms, cost functions, and variables involved
in each optimization.
C. Map Points, KeyFrames and their Selection
Each map point p
i
stores:
• Its 3D position X
w,i
in the world coordinate system.
• The viewing direction n
i
, which is the mean unit vector
of all its viewing directions (the rays that join the point
with the optical center of the keyframes that observe it).
• A representative ORB descriptor D
i
, which is the as-
sociated ORB descriptor whose hamming distance is
minimum with respect to all other associated descriptors
in the keyframes in which the point is observed.
• The maximum d
max
and minimum d
min
distances at
which the point can be observed, according to the scale
invariance limits of the ORB features.
Each keyframe K
i
stores:
• The camera pose T
iw
, which is a rigid body transforma-
tion that transforms points from the world to the camera
coordinate system.
(a) KeyFrames (blue), Current Cam-
era (green), MapPoints (black, red),
Current Local MapPoints (red)
(b) Covisibility Graph
(c) Spanning Tree (green) and Loop
Closure (red)
(d) Essential Graph
Fig. 2. Reconstruction and graphs in the sequence fr3 long office household
from the TUM RGB-D Benchmark [38].
• The camera intrinsics, including focal length and princi-
pal point.
• All the ORB features extracted in the frame, associated
or not to a map point, whose coordinates are undistorted
if a distortion model is provided.
Map points and keyframes are created with a generous pol-
icy, while a later very exigent culling mechanism is in charge
of detecting redundant keyframes and wrongly matched or not
trackable map points. This permits a flexible map expansion
during exploration, which boost tracking robustness under hard
conditions (e.g. rotations, fast movements), while its size is
bounded in continual revisits to the same environment, i.e.
lifelong operation. Additionally our maps contain very few
outliers compared with PTAM, at the expense of containing
less points. Culling procedures of map points and keyframes
are explained in Sections VI-B and VI-E respectively.
D. Covisibility Graph and Essential Graph
Covisibility information between keyframes is very useful in
several tasks of our system, and is represented as an undirected
weighted graph as in [7]. Each node is a keyframe and an edge
between two keyframes exists if they share observations of the
same map points (at least 15), being the weight θ of the edge
the number of common map points.
In order to correct a loop we perform a pose graph opti-
mization [6] that distributes the loop closing error along the
graph. In order not to include all the edges provided by the
covisibility graph, which can be very dense, we propose to
build an Essential Graph that retains all the nodes (keyframes),
but less edges, still preserving a strong network that yields