q to represent rotation. We primarily use quaternions in state vectors, but rotation matrices are also used for convenient rotation of 3D vectors. $q^w_b$ and $p^w_b$ are the rotation and translation from the body frame to the world frame. $b_k$ is the body frame while taking the $k$th image. $c_k$ is the camera frame while taking the $k$th image. $\otimes$ represents the multiplication operation between two quaternions. $g^w = [0, 0, g]^T$ is the gravity vector in the world frame. Finally, we denote $\hat{(\cdot)}$ as the noisy measurement or estimate of a certain quantity.
IV. MEASUREMENT PREPROCESSING
This section presents preprocessing steps for both inertial
and monocular visual measurements. For visual measurements,
we track features between consecutive frames and detect new
features in the latest frame. For IMU measurements, we pre-
integrate them between two consecutive frames. Note that the
measurements of the low-cost IMU that we use are affected
by both bias and noise. We therefore especially take bias into
account in the IMU pre-integration process.
A. Vision Processing Front-end
For each new image, existing features are tracked by the
KLT sparse optical flow algorithm [29]. Meanwhile, new cor-
ner features are detected [30] to maintain a minimum number
(100-300) of features in each image. The detector enforces a
uniform feature distribution by setting a minimum separation
of pixels between two neighboring features. 2D features are first undistorted and then projected onto a unit sphere after passing outlier rejection. Outlier rejection is performed using RANSAC with the fundamental matrix model [31].
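The minimum-separation rule that enforces a uniform feature distribution can be sketched as a greedy mask over detected corners. The helper below is an illustrative assumption, not the paper's implementation: candidate corners are assumed pre-sorted by detector response, and each is kept only if it is far enough from every feature already accepted.

```python
import numpy as np

def select_separated_features(candidates, existing, min_dist=30.0, max_new=150):
    """Greedily keep candidate corners that lie at least `min_dist` pixels
    from already-tracked features and from each other, mimicking the
    uniform-distribution constraint. `candidates` is assumed sorted by
    corner response (strongest first); names here are illustrative."""
    kept = [np.asarray(q, dtype=float) for q in existing]
    new = []
    for pt in candidates:
        if len(new) >= max_new:
            break
        p = np.asarray(pt, dtype=float)
        # accept only if no accepted feature is closer than min_dist
        if all(np.linalg.norm(p - q) >= min_dist for q in kept):
            kept.append(p)
            new.append(tuple(pt))
    return new
```

In practice the tracked features from KLT play the role of `existing`, so new detections fill only the unoccupied image regions.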
Keyframes are also selected in this step. We have two criteria for keyframe selection. The first one is the average parallax apart from the previous keyframe. If the average parallax of tracked features between the current frame and the latest keyframe is beyond a certain threshold, we treat the frame as a new keyframe. Note that not only translation but also rotation can cause parallax. However, features cannot be triangulated under rotation-only motion. To avoid this situation, we use short-term integration of gyroscope measurements to compensate for rotation when calculating parallax. Note that this rotation compensation is only used for keyframe selection, and is not involved in rotation calculation in the VINS formulation. As a result, even if the gyroscope contains large noise or is biased, it will only result in suboptimal keyframe selection, and will not directly affect the estimation quality. The other criterion is tracking quality. If the number of tracked features goes below a certain threshold, we treat the frame as a new keyframe. This criterion is to avoid complete loss of feature tracks.
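The two criteria above can be sketched as follows. This is a simplified assumption of how rotation-compensated parallax might be computed: keyframe features are back-projected to bearings, rotated by the short-term gyro rotation, reprojected, and compared against the current observations. All function, parameter, and variable names are our own, not the system's.

```python
import numpy as np

def is_keyframe(pts_kf, pts_cur, R_cur_kf, K_inv, parallax_thresh=10.0,
                min_tracked=50):
    """Keyframe test sketch. `pts_kf`/`pts_cur`: matched pixel coordinates
    (N, 2) in the last keyframe and current frame; `R_cur_kf`: gyro-integrated
    rotation from keyframe to current frame; `K_inv`: inverse intrinsics."""
    if len(pts_cur) < min_tracked:          # criterion 2: tracking quality
        return True
    ones = np.ones((len(pts_kf), 1))
    rays = K_inv @ np.hstack([pts_kf, ones]).T        # 3xN bearings
    rays_comp = R_cur_kf @ rays                       # compensate rotation
    proj = rays_comp[:2] / rays_comp[2]               # normalized coords
    cur = K_inv @ np.hstack([pts_cur, ones]).T
    cur_n = cur[:2] / cur[2]
    # residual motion after removing rotation is parallax due to translation;
    # scale back to pixels with the focal length (assumes fx == fy)
    f = 1.0 / K_inv[0, 0]
    parallax = np.linalg.norm((cur_n - proj) * f, axis=0).mean()
    return parallax > parallax_thresh       # criterion 1: average parallax
```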
B. IMU Pre-integration
IMU Pre-integration was first proposed in [22], which
parametrized rotation error in Euler angle. An on-manifold
rotation formulation for IMU pre-integration was developed
in our previous work [7]. This work derived the covariance
propagation using continuous-time IMU error state dynamics.
However, IMU biases were ignored. The pre-integration theory
was further improved in [23] by adding posterior IMU bias
correction. In this paper, we extend the IMU pre-integration
proposed in our previous work [7] by incorporating IMU bias
correction.
The raw gyroscope and accelerometer measurements from the IMU, $\hat{\omega}$ and $\hat{a}$, are given by:
$$\hat{a}_t = a_t + b_{a_t} + R^t_w g^w + n_a, \qquad \hat{\omega}_t = \omega_t + b_{w_t} + n_w. \tag{1}$$
IMU measurements, which are expressed in the body frame, combine the force countering gravity and the platform dynamics, and are affected by acceleration bias $b_a$, gyroscope bias $b_w$, and additive noise. We assume that the additive noise in the acceleration and gyroscope measurements is Gaussian: $n_a \sim \mathcal{N}(0, \sigma_a^2)$, $n_w \sim \mathcal{N}(0, \sigma_w^2)$. Acceleration bias and gyroscope bias are modeled as random walks, whose derivatives are Gaussian, $n_{b_a} \sim \mathcal{N}(0, \sigma_{b_a}^2)$, $n_{b_w} \sim \mathcal{N}(0, \sigma_{b_w}^2)$:
$$\dot{b}_{a_t} = n_{b_a}, \qquad \dot{b}_{w_t} = n_{b_w}. \tag{2}$$
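A minimal simulation of the measurement model in Eqs. (1)-(2) makes the roles of noise and bias concrete. The discretization below (bias increments scaled by the square root of the sample period) is a standard assumption about sampling a continuous random walk; the function name and interface are our own.

```python
import numpy as np

def simulate_imu(a_true, omega_true, R_wt, g_w, dt,
                 sigma_a, sigma_w, sigma_ba, sigma_bw, rng):
    """Generate noisy IMU readings per Eqs. (1)-(2): additive Gaussian
    noise plus slowly drifting random-walk biases. `R_wt[i]` rotates
    world to body at sample i; trajectory arrays are (N, 3)."""
    n = len(a_true)
    b_a = np.zeros(3)
    b_w = np.zeros(3)
    a_hat = np.empty((n, 3))
    w_hat = np.empty((n, 3))
    for i in range(n):
        # Eq. (1): specific force includes the gravity-countering term
        a_hat[i] = a_true[i] + b_a + R_wt[i] @ g_w + rng.normal(0, sigma_a, 3)
        w_hat[i] = omega_true[i] + b_w + rng.normal(0, sigma_w, 3)
        # Eq. (2): biases evolve as random walks (discretized)
        b_a += rng.normal(0, sigma_ba, 3) * np.sqrt(dt)
        b_w += rng.normal(0, sigma_bw, 3) * np.sqrt(dt)
    return a_hat, w_hat
```

With all noise terms set to zero, a stationary IMU reads exactly the gravity-countering specific force, which is the sanity check the model must pass.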
Given two time instants that correspond to image frames $b_k$ and $b_{k+1}$, the position, velocity, and orientation states can be propagated by inertial measurements during the time interval $[t_k, t_{k+1}]$ in the world frame:
$$\begin{aligned}
p^w_{b_{k+1}} &= p^w_{b_k} + v^w_{b_k} \Delta t_k + \iint_{t \in [t_k, t_{k+1}]} \left( R^w_t (\hat{a}_t - b_{a_t} - n_a) - g^w \right) dt^2 \\
v^w_{b_{k+1}} &= v^w_{b_k} + \int_{t \in [t_k, t_{k+1}]} \left( R^w_t (\hat{a}_t - b_{a_t} - n_a) - g^w \right) dt \\
q^w_{b_{k+1}} &= q^w_{b_k} \otimes \int_{t \in [t_k, t_{k+1}]} \frac{1}{2} \Omega(\hat{\omega}_t - b_{w_t} - n_w)\, q^{b_k}_t\, dt,
\end{aligned} \tag{3}$$
where
$$\Omega(\omega) = \begin{bmatrix} -\lfloor \omega \rfloor_\times & \omega \\ -\omega^T & 0 \end{bmatrix}, \qquad \lfloor \omega \rfloor_\times = \begin{bmatrix} 0 & -\omega_z & \omega_y \\ \omega_z & 0 & -\omega_x \\ -\omega_y & \omega_x & 0 \end{bmatrix}. \tag{4}$$
$\Delta t_k$ is the duration of the time interval $[t_k, t_{k+1}]$.
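In discrete time, the propagation in Eq. (3) reduces to a per-sample update. The sketch below uses a simple first-order step and drops the noise terms, which are unknown at run time; the quaternion update $q \otimes [1,\ \omega\,\delta t/2]^T$ is the first-order equivalent of integrating with $\Omega(\cdot)$ from Eq. (4). Quaternions are ordered $(w, x, y, z)$ here; all names are our own.

```python
import numpy as np

def quat_mul(q, r):
    """Hamilton product of quaternions stored as (w, x, y, z)."""
    w1, x1, y1, z1 = q
    w2, x2, y2, z2 = r
    return np.array([
        w1*w2 - x1*x2 - y1*y2 - z1*z2,
        w1*x2 + x1*w2 + y1*z2 - z1*y2,
        w1*y2 - x1*z2 + y1*w2 + z1*x2,
        w1*z2 + x1*y2 - y1*x2 + z1*w2])

def quat_to_rot(q):
    """Rotation matrix of a unit quaternion (w, x, y, z)."""
    w, x, y, z = q
    return np.array([
        [1 - 2*(y*y + z*z), 2*(x*y - w*z),     2*(x*z + w*y)],
        [2*(x*y + w*z),     1 - 2*(x*x + z*z), 2*(y*z - w*x)],
        [2*(x*z - w*y),     2*(y*z + w*x),     1 - 2*(x*x + y*y)]])

def propagate(p, v, q, a_hat, w_hat, b_a, b_w, g_w, dt):
    """One Euler step of Eq. (3): bias-corrected acceleration rotated
    into the world frame, gravity removed, then p/v/q updated."""
    a_w = quat_to_rot(q) @ (a_hat - b_a) - g_w
    p_new = p + v * dt + 0.5 * a_w * dt**2
    v_new = v + a_w * dt
    w = w_hat - b_w
    dq = np.concatenate([[1.0], 0.5 * w * dt])   # first-order increment
    q_new = quat_mul(q, dq)
    return p_new, v_new, q_new / np.linalg.norm(q_new)
```

A stationary IMU reading pure gravity should leave all three states unchanged, which is a quick consistency check on the signs in Eq. (3).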
It can be seen that the IMU state propagation requires the rotation, position, and velocity of frame $b_k$. When these starting states change, we need to re-propagate the IMU measurements. Especially in optimization-based algorithms, every time we adjust poses, we need to re-propagate the IMU measurements between them. This propagation strategy is computationally demanding. To avoid re-propagation, we adopt the pre-integration algorithm.
After changing the reference frame from the world frame to the local frame $b_k$, we can pre-integrate only the parts that are related to linear acceleration $\hat{a}$ and angular velocity $\hat{\omega}$ as follows:
$$\begin{aligned}
R^{b_k}_w p^w_{b_{k+1}} &= R^{b_k}_w \left( p^w_{b_k} + v^w_{b_k} \Delta t_k - \tfrac{1}{2} g^w \Delta t_k^2 \right) + \alpha^{b_k}_{b_{k+1}} \\
R^{b_k}_w v^w_{b_{k+1}} &= R^{b_k}_w \left( v^w_{b_k} - g^w \Delta t_k \right) + \beta^{b_k}_{b_{k+1}} \\
q^{b_k}_w \otimes q^w_{b_{k+1}} &= \gamma^{b_k}_{b_{k+1}},
\end{aligned} \tag{5}$$
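A minimal sketch of how $\alpha^{b_k}_{b_{k+1}}$, $\beta^{b_k}_{b_{k+1}}$, and $\gamma^{b_k}_{b_{k+1}}$ in Eq. (5) can be accumulated from raw samples in the $b_k$ frame. Because these terms depend only on the IMU readings and the bias estimates, not on the pose of $b_k$, adjusting poses during optimization does not force re-integration. Plain Euler integration and all names are our own simplifications of the actual scheme.

```python
import numpy as np

def quat_mul(q, r):
    """Hamilton product of quaternions stored as (w, x, y, z)."""
    w1, x1, y1, z1 = q
    w2, x2, y2, z2 = r
    return np.array([
        w1*w2 - x1*x2 - y1*y2 - z1*z2,
        w1*x2 + x1*w2 + y1*z2 - z1*y2,
        w1*y2 - x1*z2 + y1*w2 + z1*x2,
        w1*z2 + x1*y2 - y1*x2 + z1*w2])

def quat_to_rot(q):
    """Rotation matrix of a unit quaternion (w, x, y, z)."""
    w, x, y, z = q
    return np.array([
        [1 - 2*(y*y + z*z), 2*(x*y - w*z),     2*(x*z + w*y)],
        [2*(x*y + w*z),     1 - 2*(x*x + z*z), 2*(y*z - w*x)],
        [2*(x*z - w*y),     2*(y*z + w*x),     1 - 2*(x*x + y*y)]])

def preintegrate(a_hats, w_hats, b_a, b_w, dt):
    """Accumulate the pre-integrated terms alpha, beta, gamma of Eq. (5)
    in the b_k frame by Euler integration of bias-corrected samples."""
    alpha = np.zeros(3)                  # relative position term
    beta = np.zeros(3)                   # relative velocity term
    gamma = np.array([1.0, 0, 0, 0])     # relative rotation, (w, x, y, z)
    for a_hat, w_hat in zip(a_hats, w_hats):
        R = quat_to_rot(gamma)           # current body frame -> b_k
        a = R @ (a_hat - b_a)
        alpha += beta * dt + 0.5 * a * dt**2
        beta += a * dt
        w = w_hat - b_w
        dq = np.concatenate([[1.0], 0.5 * w * dt])
        gamma = quat_mul(gamma, dq)
        gamma /= np.linalg.norm(gamma)
    return alpha, beta, gamma
```

Note that gravity does not appear here: it is handled by the closed-form $g^w \Delta t_k$ terms on the left-hand side of Eq. (5), which is exactly what makes the integrated quantities pose-independent.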