ORB-SLAM2: an Open-Source SLAM System for
Monocular, Stereo and RGB-D Cameras
Raúl Mur-Artal and Juan D. Tardós
Abstract— We present ORB-SLAM2, a complete SLAM system for monocular, stereo and RGB-D cameras, including map reuse, loop closing and relocalization capabilities. The system works in real time on standard CPUs in a wide variety of environments, from small hand-held indoor sequences, to drones flying in industrial environments and cars driving around a city. Our back-end, based on Bundle Adjustment with monocular and stereo observations, allows for accurate trajectory estimation with metric scale. Our system includes a lightweight localization mode that leverages visual odometry tracks for unmapped regions and matches to map points that allow for zero-drift localization. The evaluation on 29 popular public sequences shows that our method achieves state-of-the-art accuracy, being in most cases the most accurate SLAM solution. We publish the source code, not only for the benefit of the SLAM community, but with the aim of being an out-of-the-box SLAM solution for researchers in other fields.
I. INTRODUCTION
Simultaneous Localization and Mapping (SLAM) has been a hot research topic in the Computer Vision and Robotics communities over the last two decades, and has recently attracted the attention of high-tech companies. SLAM techniques build a map of an unknown environment and localize the sensor in the map, with a strong focus on real-time operation. Among the different sensor modalities, cameras are cheap and provide rich information about the environment that allows for robust and accurate place recognition. Place recognition is a key module of a SLAM system, used to close loops (i.e. to detect when the sensor returns to a mapped area and to correct the accumulated exploration error) and to relocalize the camera after a tracking failure, due to occlusion or aggressive motion, or at system re-initialization. Therefore Visual SLAM, where the main sensor is a camera, has developed strongly in recent years.
Visual SLAM can be performed using just a monocular camera, which is the cheapest and smallest sensor setup. However, as depth is not observable from a single camera, the scale of the map and of the estimated trajectory is unknown. In addition, system bootstrapping requires multi-view or filtering techniques to produce an initial map, as it cannot be triangulated from the very first frame. Last but not least, monocular SLAM suffers from scale drift and may fail when performing pure rotations during exploration. Using a stereo or an RGB-D camera resolves all of these issues and allows for the most reliable Visual SLAM solutions.
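To make the scale point concrete, a rectified stereo rig with a known baseline recovers metric depth from a single frame via the standard relation z = f·b/d, something a monocular camera cannot do. The following minimal Python sketch illustrates this; the function name and the numeric values (chosen to resemble a KITTI-style rig) are illustrative assumptions, not taken from this paper:

def stereo_depth(fx: float, baseline: float, disparity: float) -> float:
    """Metric depth of a point from a rectified stereo pair: z = fx * b / d.

    fx        -- focal length in pixels
    baseline  -- distance between the two camera centers, in meters
    disparity -- horizontal pixel offset of the point between both images
    """
    return fx * baseline / disparity

# A monocular camera measures only bearings, so scaling the whole map and
# trajectory by any s > 0 produces the same images (scale ambiguity), and an
# initial map must be bootstrapped from multiple views. With a known baseline,
# depth is metric from the very first stereo frame.
z = stereo_depth(fx=718.856, baseline=0.54, disparity=40.0)  # KITTI-like values
print(f"depth = {z:.2f} m")  # prints ~9.70 m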
Fig. 1. ORB-SLAM2 processes stereo and RGB-D inputs to estimate the camera trajectory and build a map of the environment. The system is able to close loops, relocalize, and reuse its map in real time on standard CPUs with high accuracy and robustness. (a) Stereo input: trajectory and sparse reconstruction of an urban environment with multiple loop closures. (b) RGB-D input: keyframes and dense point cloud of a room scene with one loop closure. The point cloud is rendered by backprojecting the sensor depth maps from estimated keyframe poses; no fusion is performed.

In this paper we build on our monocular ORB-SLAM [1] and propose ORB-SLAM2 with the following contributions:
• The first open-source SLAM system for monocular, stereo and RGB-D cameras, including loop closing, relocalization and map reuse.
• Our RGB-D results show that by using Bundle Adjustment (BA) we achieve higher accuracy than state-of-the-art methods based on ICP or on photometric and depth error minimization.