Visual-Inertial Monocular SLAM with Map Reuse
Raúl Mur-Artal and Juan D. Tardós
Abstract— In recent years there have been excellent results in Visual-Inertial Odometry techniques, which aim to compute the incremental motion of the sensor with high accuracy and robustness. However, these approaches lack the capability to close loops, and the trajectory estimate accumulates drift even if the sensor continually revisits the same place. In this work we present a novel tightly-coupled Visual-Inertial Simultaneous Localization and Mapping system that is able to close loops and reuse its map to achieve zero-drift localization in already mapped areas. While our approach can be applied to any camera configuration, we address here the most general problem of a monocular camera, with its well-known scale ambiguity. We also propose a novel IMU initialization method, which computes the scale, the gravity direction, the velocity, and the gyroscope and accelerometer biases, in a few seconds and with high accuracy. We test our system on the 11 sequences of a recent public micro-aerial-vehicle dataset, achieving a typical scale factor error of 1% and centimeter precision. We compare to the state of the art in visual-inertial odometry in sequences with revisiting, demonstrating the better accuracy of our method due to map reuse and the absence of accumulated drift.
Index Terms— SLAM, Sensor Fusion, Visual-Based Navigation
I. INTRODUCTION
Motion estimation from onboard sensors is currently a hot topic in the Robotics and Computer Vision communities, as it can enable emerging technologies such as autonomous cars, augmented and virtual reality, service robots and drone navigation. Among the different sensor modalities, visual-inertial setups provide a cheap solution with great potential. On the one hand, cameras provide rich information about the environment, which allows building 3D models, localizing the camera and recognizing already visited places. On the other hand, IMU sensors provide self-motion information, allowing the recovery of metric scale for monocular vision and the estimation of the gravity direction, which renders the absolute pitch and roll of the sensor observable.
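To make the last point concrete, here is a minimal Python sketch (illustrative only, not part of the system described in this paper) that recovers absolute roll and pitch from a static accelerometer reading, which measures the reaction to gravity in the body frame; yaw remains unobservable from gravity alone:

    import numpy as np

    def roll_pitch_from_accel(acc):
        # acc = (ax, ay, az): specific force in m/s^2, body frame
        # (x forward, z up). At rest the accelerometer measures the
        # reaction to gravity, so the direction of acc encodes the tilt.
        ax, ay, az = acc
        roll = np.arctan2(ay, az)                  # rotation about body x
        pitch = np.arctan2(-ax, np.hypot(ay, az))  # rotation about body y
        return roll, pitch                         # yaw is unobservable

    # Example: sensor at rest, pitched 10 degrees nose-up
    g, th = 9.81, np.radians(10.0)
    acc = np.array([-g * np.sin(th), 0.0, g * np.cos(th)])
    print(np.degrees(roll_pitch_from_accel(acc)))  # -> [ 0. 10.]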
Visual-inertial fusion has been a very active research topic in recent years. Recent research focuses on tightly-coupled (i.e. joint optimization of all sensor states) visual-inertial odometry, using keyframe-based non-linear optimization [1]–[4] or filtering [5]–[8]. Nevertheless, these approaches are only able to compute incremental motion and lack the capability to close loops and reuse a map of an already mapped environment. This implies that the estimated trajectory accumulates drift without bound, even if the sensor is always localizing in the same environment.
This work was supported by the Spanish government under Project DPI2015-67275, the Aragón regional government under Project DGA T04-FSE and the Ministerio de Educación Scholarship FPU13/04175.
The authors are with the Instituto de Investigación en Ingeniería de Aragón (I3A), Universidad de Zaragoza, María de Luna 1, 50018 Zaragoza, Spain. Email: {raulmur,tardos}@unizar.es.
Fig. 1. Top view of the reconstruction built by our system from sequence V1_02_medium of the EuRoC dataset [11]. This top view was aligned using the gravity direction computed by Visual-Inertial ORB-SLAM. The green lines connect keyframes that share more than 100 point observations and are proof of the system's capability to reuse the map. This reuse capability, in contrast to visual-inertial odometry, allows zero-drift localization when continually revisiting the same place.
This is due either to the marginalization of past states to maintain a constant computational cost [1], [2], [5]–[8], or to the use of full smoothing [3], whose complexity is almost constant during exploration but can become as expensive as a batch method in the presence of loop closures [9].
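To illustrate why marginalization precludes map reuse, the following toy Python example (ours, under simplified linear-Gaussian assumptions, not code from any of the cited systems) marginalizes an old state out of an information-form system via the Schur complement; the result is a prior on the remaining states, but the eliminated state can never be re-estimated or re-linearized, e.g. after a loop closure:

    import numpy as np

    def marginalize(H, b, keep, drop):
        # Marginalize the 'drop' variables out of the linear system
        # H x = b (information form) via the Schur complement.
        Hkk = H[np.ix_(keep, keep)]
        Hkd = H[np.ix_(keep, drop)]
        Hdd = H[np.ix_(drop, drop)]
        Hdd_inv = np.linalg.inv(Hdd)
        # Schur complement: a prior on the kept states that summarizes
        # (and freezes) all information involving the dropped states.
        H_marg = Hkk - Hkd @ Hdd_inv @ Hkd.T
        b_marg = b[keep] - Hkd @ Hdd_inv @ b[drop]
        return H_marg, b_marg

    # Toy 3-state chain x0 - x1 - x2; drop the oldest state x0.
    H = np.array([[ 2., -1.,  0.],
                  [-1.,  2., -1.],
                  [ 0., -1.,  1.]])
    b = np.array([0., 0., 1.])
    H_marg, b_marg = marginalize(H, b, keep=[1, 2], drop=[0])
    # The solution for x1, x2 is unchanged, but x0 is gone for good.
    print(np.linalg.solve(H_marg, b_marg), np.linalg.solve(H, b)[1:])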
In this paper we present Visual-Inertial ORB-SLAM, to the best of our knowledge the first keyframe-based visual-inertial SLAM system able to close loops and reuse its map. Inspired by [10], our tracking optimizes the current frame assuming a fixed map, and our backend performs local Bundle Adjustment (BA), optimizing a local window of keyframes together with an outer window of fixed keyframes that ensures global consistency. This approach allows for constant-time local BA, in contrast to full smoothing, and since we do not marginalize past states we are able to reuse them. We detect large loops using place recognition and correct them with a lightweight pose-graph optimization, followed by full BA in a separate thread so as not to interfere with real-time operation. Fig. 1 shows the reconstruction built by our system in a sequence with continuous revisiting.
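A minimal sketch of how such a fixed outer window could be selected follows (toy Python with our own hypothetical types and variable names; the actual system builds this window from its covisibility information):

    from dataclasses import dataclass

    @dataclass(frozen=True)
    class KeyFrame:
        kid: int

    @dataclass
    class MapPoint:
        pid: int
        observers: frozenset  # keyframes observing this point

    def split_windows(keyframes, points, local_size):
        # The last 'local_size' keyframes form the local window and are
        # optimized; any other keyframe sharing observations of the local
        # points joins the outer window and is held fixed, contributing
        # constraints that keep the local solution globally consistent.
        local = set(keyframes[-local_size:])
        local_points = [p for p in points if p.observers & local]
        fixed = set().union(*(p.observers for p in local_points)) - local
        return local, fixed

    kfs = [KeyFrame(i) for i in range(6)]
    pts = [MapPoint(0, frozenset({kfs[0], kfs[4]})),  # old kf 0 sees a local point
           MapPoint(1, frozenset({kfs[4], kfs[5]}))]
    local, fixed = split_windows(kfs, pts, local_size=2)
    print(sorted(k.kid for k in local), sorted(k.kid for k in fixed))  # [4, 5] [0]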
Both tracking and local BA work by fixing states, which could potentially bias the solution; therefore we need a