Sonar Visual Inertial SLAM of Underwater Structures
Sharmin Rahman, Alberto Quattrini Li, and Ioannis Rekleitis
Abstract— This paper presents an extension to a state-of-the-art Visual-Inertial state estimation package (OKVIS) in order to accommodate data from an underwater acoustic sensor. Mapping underwater structures is important in several fields, such as marine archaeology, search and rescue, resource management, hydrogeology, and speleology. Collecting the data, however, is a challenging, dangerous, and exhausting task. The underwater domain presents unique challenges in the quality of the available visual data; as such, augmenting the exteroceptive sensing with acoustic range data results in improved reconstructions of the underwater structures. Experimental results from underwater wrecks, an underwater cave, and a submerged bus demonstrate the performance of our approach.
I. INTRODUCTION
This paper presents a real-time simultaneous localization
and mapping (SLAM) algorithm for underwater structures
combining visual data from a stereo camera, angular velocity
and linear acceleration data from an Inertial Measurement
Unit (IMU), and range data from a mechanical scanning
sonar sensor.
Navigating and mapping around underwater structures is
very challenging. Target domains include wrecks (ships,
planes, and buses), underwater structures, such as bridges and
dams, and underwater caves. The primary motivation of this work is the mapping of underwater caves, where exploration by human divers is an extremely dangerous operation due to the harsh environment [1]. In addition to underwater
vision constraints—e.g., light and color attenuation—cave
environments suffer from the absence of natural illumination.
Employing robotic technology to map caves would reduce
the cognitive load of divers, who currently take manual
measurements. The majority of underwater sensing for localization is based on acoustic sensors, such as ultrashort baseline (USBL) and Doppler Velocity Logger (DVL). However, such sensors are usually expensive and can disturb divers and/or the environment. Furthermore, such sensors do not provide information about the structure of the environment.
In recent years, many vision-based state estimation algorithms have been developed using monocular, stereo, or multi-camera systems for indoor, outdoor, and underwater environments. Such algorithms result in cheaper solutions for state estimation. Vision-based systems can be characterized either as incremental, termed Visual Odometry (VO) systems, when there is no loop closure, or as full vision-based SLAM systems [2].
The authors are with the Computer Science and Engineering Department, University of South Carolina, Columbia, SC, USA. srahman@email.sc.edu, [albertoq,yiannisr]@cse.sc.edu
Fig. 1. The custom-made sensor suite collecting data for the calibration of the visual, inertial, and acoustic range data.
Employing most of the available vision-based state estimation packages in the underwater domain is not straightforward due to many challenges. In particular, blurriness and light attenuation result in features that are not as clearly defined as above water. Consequently, different vision-based state estimation packages produce a significant number of outliers or lose tracking completely [3], [4]. In such a challenging environment, our preliminary work on using visual data and a video light for mapping an underwater cave [1] resulted in the successful reconstruction of a 250-meter-long cave segment.
Vision can be combined with IMU and other sensors in the underwater domain for improved pose estimation [5]. The open-source package OKVIS [6] fuses vision with IMU measurements, demonstrating superior performance. More recently, ORB-SLAM has been enriched with an IMU [7] to recover scale for a monocular camera. In this paper, we propose a robust vision-based state estimation algorithm that combines inertial measurements from an IMU, stereo visual data, and range data from sonar for mapping underwater structures.
Two general approaches have been employed for fusing inertial data into pure visual odometry. In the first approach, based on filtering, IMU measurements are used for state propagation, while visual features are used for the update phase. The second approach, relying on nonlinear optimization, jointly optimizes all sensor states by minimizing both the IMU error term and the landmark reprojection error. Recent nonlinear optimization based Visual-Inertial Odometry (VIO) algorithms [6], [7] have shown better accuracy than filtering approaches at comparable computational cost.
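In schematic form, such optimization-based methods minimize a joint cost combining weighted reprojection and IMU residuals; the structure below follows the general form of [6], where $\mathbf{e}_r$ denotes reprojection residuals, $\mathbf{e}_s$ IMU residuals, and $\mathbf{W}$ the corresponding information matrices:
\[
J(\mathbf{x}) = \sum_{i} \sum_{k=1}^{K} \sum_{j \in \mathcal{J}(i,k)} {\mathbf{e}_r^{i,j,k}}^{\top} \mathbf{W}_r^{i,j,k} \, \mathbf{e}_r^{i,j,k} \;+\; \sum_{k=1}^{K-1} {\mathbf{e}_s^{k}}^{\top} \mathbf{W}_s^{k} \, \mathbf{e}_s^{k},
\]
where $i$ indexes the cameras, $k$ the frames, and $\mathcal{J}(i,k)$ the set of landmarks visible in camera $i$ at frame $k$.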
In this paper, a tightly-coupled nonlinear optimization method is employed to integrate IMU measurements with stereo vision and sonar data; see Fig. 1 for the underwater sensor suite used during the calibration of both camera intrinsics and extrinsics, which is required for the good performance of VIO approaches.
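As an illustrative sketch of how the range data enter this framework (the sonar residual $\mathbf{e}_t$ and its information matrix $\mathbf{W}_t$ are shorthand introduced here, not necessarily the exact formulation), the joint cost above can be augmented with a third weighted error term:
\[
J(\mathbf{x}) = \sum_{i,k,j} {\mathbf{e}_r^{i,j,k}}^{\top} \mathbf{W}_r^{i,j,k} \, \mathbf{e}_r^{i,j,k} \;+\; \sum_{k} {\mathbf{e}_s^{k}}^{\top} \mathbf{W}_s^{k} \, \mathbf{e}_s^{k} \;+\; \sum_{k} {\mathbf{e}_t^{k}}^{\top} \mathbf{W}_t^{k} \, \mathbf{e}_t^{k},
\]
where $\mathbf{e}_t^{k}$ penalizes the discrepancy between the measured sonar range and the distance to the estimated structure at frame $k$.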