Data Flow ORB-SLAM for Real-time Performance on Embedded GPU Boards
Stefano Aldegheri¹, Nicola Bombieri¹, Domenico D. Bloisi², and Alessandro Farinelli¹
Abstract— The use of embedded boards on robots, including
unmanned aerial and ground vehicles, is increasing thanks to
the availability of low-cost, GPU-equipped embedded boards on
the market. Porting algorithms originally designed for desktop
CPUs to those boards is not straightforward due to hardware
limitations. In this paper, we present how we modified and
customized the open source SLAM algorithm ORB-SLAM2 to
run in real-time on the NVIDIA Jetson TX2. We adopted a
data flow paradigm to process the images, obtaining an efficient
CPU/GPU load distribution that results in a processing speed of
about 30 frames per second. Quantitative experimental results
on four different sequences of the KITTI dataset demonstrate
the effectiveness of the proposed approach. The source code of
our data flow ORB-SLAM2 algorithm is publicly available on
GitHub.
I. INTRODUCTION
Navigation is the main task for an autonomous mobile
robot. In order to move from the current position A to a
desired destination B, a mobile robot needs a map, to know
its position on the map, and to have a plan to get from A to
B, possibly selecting the most appropriate from a number of
alternative routes. Simultaneous Localization and Mapping
(SLAM) aims at processing data coming from robot sensors
to build a map of the unknown operational environment and,
at the same time, to localize the sensors in the map (also
getting the trajectories of the moving sensors).
Many different types of sensors can be integrated into
SLAM algorithms, such as laser range sensors, encoders,
inertial units, GPS, and cameras. In recent years, SLAM
using cameras only has been actively discussed because
cameras are relatively cheap compared to other sensor
types and require the smallest sensor
setup [1]. When the input to SLAM is visual information
only, the technique is specifically referred to as visual SLAM
(vSLAM).
vSLAM algorithms can be grouped into three categories,
namely feature-based, direct, and RGB-D ap-
proaches. In feature-based methods, geometric information
from images is estimated by extracting a set of feature
observations from the image in input and by computing
the camera position and scene geometry as a function of
these feature observations only [2]. Direct (or featureless)
approaches aim at optimizing the geometry directly on the
image intensities, using photometric consistency over the
whole image as the error measurement. In RGB-D SLAM,
dense depth enables the detection of planes that have no
texture [3].

¹Stefano Aldegheri, Nicola Bombieri, and Alessandro Farinelli are with
the Department of Computer Science, University of Verona, Strada le Grazie
15 - 37134 Verona, Italy nicola.bombieri@univr.it
²Domenico D. Bloisi is with the Department of Mathematics, Computer
Science, and Economics, University of Basilicata, Viale dell’Ateneo Lucano,
10 - 85100 Potenza, Italy domenico.bloisi@unibas.it

Fig. 1. (a) NVIDIA Jetson TX2 module. (b) The KITTI dataset [5].
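As a toy illustration of the photometric consistency that direct methods optimize (our sketch, not part of any cited system), the following snippet recovers a purely horizontal image shift by brute-force minimizing the mean squared intensity difference over the overlapping region. Real direct SLAM optimizes over full camera poses and scene depth rather than a one-dimensional shift, but the error term has the same flavor:

```python
import numpy as np

def photometric_error(i_ref, i_cur, d):
    """Mean squared intensity difference after shifting i_ref left by d pixels."""
    w = i_ref.shape[1]
    a = i_ref[:, d:].astype(np.float64)
    b = i_cur[:, : w - d].astype(np.float64)
    return np.mean((a - b) ** 2)

def estimate_shift(i_ref, i_cur, max_shift=8):
    """Brute-force the horizontal shift that minimizes the photometric error."""
    return min(range(max_shift + 1),
               key=lambda d: photometric_error(i_ref, i_cur, d))
```

With two crops of the same random image offset by 3 pixels, `estimate_shift` returns 3 and the error at that shift is exactly zero.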
One of the main challenges in vSLAM is to achieve real-
time processing. Direct and RGB-D methods are computa-
tionally demanding and require GPU computation to run
in real-time. ORB-SLAM2 [4] is, at the moment, the most
complete feature-based vSLAM system [1]. It works in real-
time on standard CPUs, but not on embedded boards.
In this paper, we present a modified version of ORB-
SLAM2 that runs in real-time on an NVIDIA Jetson TX2
embedded board (see Fig. 1). According to [6], ORB-
SLAM2 is the package that provides the best result in
terms of accuracy among the most popular sparse methods.
However, it is a highly demanding algorithm in terms of CPU
and memory usage [7]; thus, a careful computational load
distribution is required to obtain real-time performance under
hardware limitations.
The contributions of this work are three-fold:
1) We use a data flow paradigm to obtain a representation
of the original algorithm as a graph, which allows the
computational load to be efficiently subdivided between
CPU and GPU.
2) Experimental results demonstrate that, by balancing
CPU/GPU usage, it is possible to achieve real-time
performance on four different sequences of the KITTI
dataset while maintaining good accuracy.
3) We provide on GitHub the complete source code
optimized for real-time use on the NVIDIA Jetson
TX2.
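To make the data flow idea behind contribution 1 concrete, here is a minimal sketch of a pipeline expressed as a graph of nodes that fire once all of their inputs are available. The node names, the `device` tag, and the scheduler are our illustrative assumptions, not the paper's actual implementation; the point is that a graph representation exposes which nodes are independent and can therefore be mapped to separate CPU and GPU queues:

```python
from collections import deque

class Node:
    """One operation in the data-flow graph, tagged with a target device."""
    def __init__(self, name, fn, device, inputs=()):
        self.name, self.fn, self.device, self.inputs = name, fn, device, tuple(inputs)

def run_graph(nodes, feed):
    """Fire each node once all of its inputs are available.

    `feed` maps external input names to values. Ready nodes with no mutual
    dependency could be dispatched to per-device (CPU/GPU) queues in
    parallel; here they simply run sequentially for clarity.
    """
    results = dict(feed)
    pending = deque(nodes)
    while pending:
        node = pending.popleft()
        if all(name in results for name in node.inputs):
            results[node.name] = node.fn(*(results[name] for name in node.inputs))
        else:
            pending.append(node)  # inputs not ready yet; retry later
    return results

# Toy pipeline: a grayscale stage feeds two independent branches,
# so "features" and "pyramid" could execute concurrently on CPU and GPU.
pipeline = [
    Node("gray", lambda frame: sum(frame) / len(frame), "GPU", inputs=("frame",)),
    Node("features", lambda g: g * 2, "CPU", inputs=("gray",)),
    Node("pyramid", lambda g: g / 2, "GPU", inputs=("gray",)),
]
out = run_graph(pipeline, {"frame": [10, 20, 30]})
```

The same structure is what graph-based vision runtimes exploit to schedule heterogeneous hardware: the dependency edges, not the program order, determine what can run in parallel.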
2019 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS)
Macau, China, November 4-8, 2019
978-1-7281-4003-2/19/$31.00 ©2019 IEEE