DSOL: A Fast Direct Sparse Odometry Scheme
Chao Qu, Shreyas S. Shivakumar, Ian D. Miller and Camillo J. Taylor
Abstract— In this paper, we describe Direct Sparse Odometry
Lite (DSOL), an improved version of Direct Sparse Odometry
(DSO) [1]. We propose several algorithmic and implementation
enhancements which speed up computation by a significant factor (on average 5x), even on resource-constrained platforms. The increase in speed allows us to process images at
higher frame rates, which in turn provides better results on
rapid motions. Our open-source implementation is available at
https://github.com/versatran01/dsol.
I. INTRODUCTION
Localization and mapping are key components of many
robotic systems. In this work we are motivated by the requirements of micro aerial vehicles where payload and power
limit the choice of sensors and the computational capacity
of the system. For these platforms, vision is an attractive
sensing modality since cameras are relatively inexpensive in
terms of mass and power consumption.
We choose to build our work on DSO and Stereo-DSO
(SDSO) as described in [1], [2]. The direct approach to
visual odometry minimizes the photometric error instead
of a geometric one. In their work Engel et al. provide
extensive experiments which show that the direct and sparse
combination offers unique advantages over state-of-the-art
feature-based methods like ORB-SLAM [3] and semi-dense
direct methods such as LSD-SLAM [4] in terms of accuracy
and speed.
The computational advantage is particularly important in
the aerial context that motivates this work, where agile robots
often travel at speeds of up to 10 m/s [5]. We show that the proposed approach can run at rates of up to 500 Hz on common
computational hardware. For direct methods, the improvement in runtime also provides concomitant improvements in accuracy and robustness [6]: faster processing times allow for higher frame rates, which in turn enable the system to estimate moderate motions with greater fidelity and track aggressive motions more robustly.
For many robotics applications, DSO is not suitable since it is a monocular method that lacks a consistent metric scale and suffers from delayed initialization and re-initialization. This poses significant safety concerns, especially for flying robots. For this reason, we design our system to be directly initializable from stereo and/or depth images. In this
work, we focus on the stereo version and evaluate our system against SDSO, which is itself an improvement upon DSO.

Fig. 1: Results of DSOL on selected sequences from the TartanAir dataset [7]. Green lines are ground-truth trajectories, red lines are estimated trajectories.
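For reference, the direct metric initialization from stereo mentioned above rests on the standard rectified-stereo relation between disparity and depth (a textbook relation, not notation taken from this paper):

z = \frac{f\, b}{d},

where f is the focal length in pixels, b the stereo baseline, and d the disparity of a pixel. Every selected pixel with a valid disparity thus receives a metric depth immediately, avoiding the delayed, scale-ambiguous initialization of monocular DSO.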
The remaining sections describe the proposed system in more detail, highlighting differences compared to DSO and SDSO, and explaining the implementation choices that lead to these improvements. We summarize our changes as follows: (1) we adopt the inverse compositional alignment approach in frame tracking, (2) we track the new image w.r.t. the entire
window instead of the last keyframe, (3) we propose a better
stereo photometric bundle adjustment formulation compared
to SDSO, (4) we greatly simplify the keyframe creation and
removal criteria from DSO, and (5) we parallelize our system
to effectively utilize all available computational resources.
Together, these changes lead to a simple and lightning-fast
direct sparse odometry system, which we call Direct Sparse
Odometry Lite (DSOL).
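As a rough illustration of change (1), the sketch below shows the core of an inverse compositional alignment loop for a simple 2D translation warp. It is a minimal example in the spirit of the inverse compositional Lucas-Kanade formulation, not code from the DSOL repository; all names are illustrative, and DSOL itself aligns frames over SE(3) poses with an affine brightness model rather than a translation.

import numpy as np
from scipy.ndimage import map_coordinates

def align_inverse_compositional(template, image, p, iters=20, tol=1e-4):
    """Estimate a 2D translation p = (tx, ty) that warps `image` onto `template`.

    The inverse compositional trick precomputes gradients, the Jacobian and
    the Hessian on the fixed template, so each iteration only costs one
    image warp plus a tiny linear solve.
    """
    template = template.astype(np.float64)
    image = image.astype(np.float64)
    p = np.asarray(p, dtype=np.float64)
    gy, gx = np.gradient(template)
    # For a pure translation the warp Jacobian is the identity, so the
    # steepest-descent images are simply the template gradients.
    J = np.stack([gx.ravel(), gy.ravel()], axis=1)   # (N, 2), fixed
    H = J.T @ J                                       # (2, 2), fixed
    ys, xs = np.mgrid[0:template.shape[0], 0:template.shape[1]]
    for _ in range(iters):
        warped = map_coordinates(image, [ys + p[1], xs + p[0]], order=1)
        r = (warped - template).ravel()               # photometric residual
        dp = np.linalg.solve(H, J.T @ r)
        p = p - dp                                    # compose with the inverted increment
        if np.linalg.norm(dp) < tol:
            break
    return p

# e.g. p = align_inverse_compositional(prev_patch, curr_img, np.zeros(2))

The constant Hessian is what makes this formulation attractive for tracking at high frame rates: the expensive part of each Gauss-Newton step is hoisted out of the iteration loop.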
II. RELATED WORK
Visual odometry (VO) algorithms can be broadly categorized along the following two axes: direct vs. indirect and dense vs. sparse. Direct methods recover model parameters directly from images by minimizing photometric error
based on the brightness constancy assumption [8]. This is
in stark contrast to indirect methods, often called feature-
based methods, where correspondences are first established
based on some intermediate representations, and the model
parameters are optimized by minimizing reprojection errors.
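To make this contrast concrete, the two families minimize residuals of roughly the following form (a schematic rendering; DSO's actual energy additionally includes affine brightness parameters, a Huber norm, and gradient-dependent weights):

E_{\text{photo}} = \sum_{\mathbf{p}} \left\| I_j\big(\pi(\mathbf{T}_{ji}\, \pi^{-1}(\mathbf{p}, d_{\mathbf{p}}))\big) - I_i(\mathbf{p}) \right\|_\gamma, \qquad E_{\text{geo}} = \sum_{k} \left\| \pi(\mathbf{T}_{ji}\, \mathbf{X}_k) - \mathbf{x}_k \right\|^2,

where I_i and I_j are intensity images, \pi is the camera projection, \mathbf{T}_{ji} the relative pose, d_{\mathbf{p}} the depth of pixel \mathbf{p}, and (\mathbf{X}_k, \mathbf{x}_k) a matched 3D landmark and its 2D feature observation.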
Dense methods aim to use all information from the image
for better accuracy and robustness at the cost of increased
computation. Sparse methods, on the other hand, recognize
that image data is highly redundant and choose to only
process a selected yet informative subset of the image.
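As a toy illustration of this sparse selection idea (purely illustrative; the actual selection in DSO and DSOL is more elaborate, with occupancy grids and adaptive thresholds), one might keep at most one strong-gradient pixel per image cell:

import numpy as np

def select_sparse_pixels(image, grad_thresh=20.0, cell=16):
    """Return (rows, cols) of at most one high-gradient pixel per cell x cell block."""
    img = image.astype(np.float64)
    gy, gx = np.gradient(img)
    mag = np.hypot(gx, gy)                       # gradient magnitude
    rows, cols = [], []
    h, w = img.shape
    for r0 in range(0, h - cell + 1, cell):
        for c0 in range(0, w - cell + 1, cell):
            block = mag[r0:r0 + cell, c0:c0 + cell]
            r, c = np.unravel_index(np.argmax(block), block.shape)
            if block[r, c] > grad_thresh:        # keep only informative pixels
                rows.append(r0 + r)
                cols.append(c0 + c)
    return np.array(rows), np.array(cols)

Only the selected pixels then contribute photometric residuals, which keeps the cost of direct alignment and bundle adjustment low.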
Early VO/V-SLAM systems were mostly sparse and indirect [9], [10], [11], [12]. This was partly due to the limited computation available at the time, but also dictated by the needs of