
A General Optimization-based Framework for Local Odometry
Estimation with Multiple Sensors
Tong Qin, Jie Pan, Shaozu Cao, and Shaojie Shen
Abstract— Nowadays, more and more sensors are mounted
on robots to increase robustness and autonomy. We
have seen various sensor suites on different platforms,
such as stereo cameras on ground vehicles, a monocular camera
with an IMU (Inertial Measurement Unit) on mobile phones,
and stereo cameras with an IMU on aerial robots. Although
many algorithms for state estimation have been proposed in the
past, they are usually applied to a single sensor or a specific
sensor suite. Few of them can be employed with multiple sensor
choices. In this paper, we propose a general optimization-based
framework for odometry estimation, which supports multiple
sensor sets. Every sensor is treated as a general factor in our
framework. Factors which share common state variables are
summed together to build the optimization problem. We further
demonstrate the generality with visual and inertial sensors,
which form three sensor suites (stereo cameras, a monocular
camera with an IMU, and stereo cameras with an IMU). We
validate the performance of our system on public datasets and
through real-world experiments with multiple sensors. Results
are compared against other state-of-the-art algorithms. We
highlight that our system is a general framework, which can
easily fuse various sensors in a pose graph optimization. Our
implementations are open source¹.
I. INTRODUCTION
Real-time 6-DoF (Degrees of Freedom) state estimation
is a fundamental technology for robotics. Accurate state
estimation plays an important role in various intelligent
applications, such as robot exploration, autonomous driving,
VR (Virtual Reality) and AR (Augmented Reality). The most
common sensors we use in these applications are cameras. A
large number of impressive vision-based algorithms for pose
estimation have been proposed over the last decades, such as
[1]–[5]. Besides cameras, the IMU is another popular option
for state estimation. The IMU can measure acceleration and
angular velocity at a high frequency, which is necessary for
low-latency pose feedback in real-time applications. Hence,
there are numerous research works fusing vision and IMU
together, such as [6]–[12]. Another popular sensor used in
state estimation is LiDAR. LiDAR-based approaches [13]
achieve accurate pose estimation in a confined local envi-
ronment. Although a lot of algorithms have been proposed
in the past, they are usually applied to a single input sensor
or a specific sensor suite.
Recently, we have seen platforms equipped with various
sensor sets, such as stereo cameras on ground vehicles, a
monocular camera with an IMU on mobile phones, and stereo
cameras with an IMU on aerial robots.

All authors are with the Department of Electronic and
Computer Engineering, Hong Kong University of Science and
Technology, Hong Kong, China. {tong.qin, jie.pan,
shaozu.cao}@connect.ust.hk, eeshaojie@ust.hk.
¹ https://github.com/HKUST-Aerial-Robotics/VINS-Fusion

Fig. 1. An illustration of the proposed framework for state estimation,
which supports multiple sensor choices, such as stereo cameras, a monocular
camera with an IMU, and stereo cameras with an IMU. Each sensor is
treated as a general factor. Factors which share common state variables are
summed together to build the optimization problem.

However, as most
traditional algorithms were designed for a single sensor or
a specific sensor set, they cannot be ported to different
platforms. Even for one platform, we need to choose dif-
ferent sensor combinations in different scenarios. Therefore,
a general algorithm which supports different sensor suites
is required. Another practical requirement is that, in case
of sensor failure, the failed sensor should be removed and an
alternative sensor added to the system quickly. Hence, a
general algorithm that is compatible with
multiple sensors is needed.
In this paper, we propose a general optimization-based
framework for pose estimation, which supports multiple
sensor combinations. We further demonstrate it with visual
and inertial sensors, which form three sensor suites (stereo
cameras, a monocular camera with an IMU, and stereo cam-
eras with an IMU). We can easily switch between different
sensor combinations. We highlight the contribution of this
paper as follows:
• a general optimization-based framework for state esti-
mation, which supports multiple sensors.
• a detailed demonstration of state estimation with visual
and inertial sensors, which form different sensor suites
(stereo cameras, a monocular camera + an IMU, and
stereo cameras + an IMU).
• an evaluation of the proposed system on both public
datasets and real experiments.
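The framework's core idea is that every sensor contributes a factor, and factors sharing common state variables are summed into a single optimization problem. A minimal sketch of this summation, using a toy 1-D linear least-squares problem with purely illustrative states, factors, and values (not the paper's implementation):

```python
# Toy illustration: each sensor is a "factor" that constrains a subset of
# the states; factors sharing states are summed into one least-squares
# cost, which for this linear toy problem we minimize in closed form.

def solve_2x2(A, b):
    """Solve a 2x2 linear system A x = b by Cramer's rule."""
    det = A[0][0] * A[1][1] - A[0][1] * A[1][0]
    return [(b[0] * A[1][1] - A[0][1] * b[1]) / det,
            (A[0][0] * b[1] - b[0] * A[1][0]) / det]

# States: two poses x0, x1 (1-D positions for simplicity).
# Each factor is (Jacobian row over [x0, x1], measurement, weight).
factors = [
    ([1.0, 0.0], 0.0, 1.0),   # prior factor on x0 (initial pose)
    ([-1.0, 1.0], 1.0, 1.0),  # relative factor x1 - x0 from sensor A
    ([-1.0, 1.0], 1.2, 1.0),  # relative factor x1 - x0 from sensor B
]

# Build the normal equations H x = g by summing every factor's
# contribution over the states it touches.
H = [[0.0, 0.0], [0.0, 0.0]]
g = [0.0, 0.0]
for J, z, w in factors:
    for i in range(2):
        g[i] += w * J[i] * z
        for j in range(2):
            H[i][j] += w * J[i] * J[j]

x = solve_2x2(H, g)
print(x)  # x1 lands between the two relative measurements (1.0 and 1.2)
```

Because every factor only adds its own term to `H` and `g`, a sensor can be attached or dropped without touching the rest of the problem, which is the property that makes switching between sensor suites straightforward.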
arXiv:1901.03638v1 [cs.CV] 11 Jan 2019