2. We extend the state-of-the-art ElasticFusion [20] to a multi-camera system to achieve better dense RGB-D SLAM.
Figure 1. Example of three-Kinect arrangement.
2. Extrinsic Calibration of Multiple Cameras
2.1. Odometer-Based Extrinsic Calibration
We run RGB-D visual odometry (VO) for each camera in a feature-rich scene to estimate a set of
camera poses which is required for the subsequent step of hand–eye calibration. Our RGB-D VO
method is similar to [21], which is the classical VO method for RGB-D SLAM. We perform a dense
iterative closest point (ICP) method to estimate the camera pose, using a projective data association
algorithm [22] to obtain correspondences and a point-to-plane error metric for pose optimization.
Then we solve the optimization problem using the GPU’s parallelized processing pipeline. The
point-to-plane error energy for the desired camera pose estimate T is
E = \sum_{u \in \Omega} \left( \left( T v_k(u) - v_{k-1}(u) \right) \cdot n_{k-1} \right)^2 .    (1)
We track the current camera frame by aligning a live surface measurement $(v_k, n_k)$ against the
model prediction from the previous frame $(v_{k-1}, n_{k-1})$, where $\Omega \subset \mathbb{N}^2$ is the
image space domain, $v$ is a vertex, $n$ is a normal, and $k$ is the timestamp. With the VO method,
we obtain a set of camera poses.
Then we use the hand–eye calibration method of [7] to estimate each camera-odometry
transformation. The unknown camera-odometry transformation is estimated in two steps. In the first
step, the rotation cost function is minimized to estimate the pitch and roll angles of the camera-
odometry transformation. In the second step, the translation cost function is minimized to estimate
the yaw angle and the camera-odometry translation. The relationship between the camera and the robot
can be expressed by a rotation equation and a translation equation:
{}^{R_{i+1}}_{R_i}q \, {}^{R}_{C}q = {}^{R}_{C}q \, {}^{C_{i+1}}_{C_i}q ,    (2)

\left( R\!\left({}^{R_{i+1}}_{R_i}q\right) - I \right) {}^{R}_{C}p = R\!\left({}^{R}_{C}q\right) {}^{C_{i+1}}_{C_i}p - {}^{R_{i+1}}_{R_i}p .    (3)
In the above, the rotation is represented by a quaternion, and the translation by a vector. The robot’s
transformation between time i and time i + 1 is denoted by the vector ${}^{R_{i+1}}_{R_i}p$ and the unit
quaternion ${}^{R_{i+1}}_{R_i}q$, which can be obtained from the robot’s inertial measurement unit.
${}^{C_{i+1}}_{C_i}p$ and ${}^{C_{i+1}}_{C_i}q$ represent the camera’s transformation between time i and
time i + 1, which can be obtained by the above VO method. ${}^{R}_{C}p$ and ${}^{R}_{C}q$ represent the
transformation between the robot and the camera. In the first step, we decompose the unknown unit
quaternion ${}^{R}_{C}q$ into three unit quaternions corresponding to Z–X–Y Euler angles α, β, and γ, as
{}^{R}_{C}q = q_z(\gamma) \, q_{xy}(\alpha, \beta) .    (4)
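As a rough, self-contained sketch of the quaternion algebra in Equations (2)–(4), the NumPy snippet below composes the camera-odometry rotation from a yaw quaternion and a pitch/roll quaternion, and checks that two rotations about the z axis commute, the property used next to simplify Equation (2). The helper functions, the [w, x, y, z] convention, and the angle values are our own illustrative assumptions, not taken from [7].

```python
import numpy as np

def quat_about(axis, angle):
    """Unit quaternion [w, x, y, z] for a rotation of `angle` about axis 0=x, 1=y, 2=z."""
    q = np.zeros(4)
    q[0] = np.cos(angle / 2.0)
    q[1 + axis] = np.sin(angle / 2.0)
    return q

def quat_mul(a, b):
    """Hamilton product of two quaternions in [w, x, y, z] order."""
    w1, x1, y1, z1 = a
    w2, x2, y2, z2 = b
    return np.array([
        w1 * w2 - x1 * x2 - y1 * y2 - z1 * z2,
        w1 * x2 + x1 * w2 + y1 * z2 - z1 * y2,
        w1 * y2 - x1 * z2 + y1 * w2 + z1 * x2,
        w1 * z2 + x1 * y2 - y1 * x2 + z1 * w2,
    ])

# Equation (4): compose the camera-odometry rotation from a yaw factor q_z(gamma)
# and a pitch/roll factor q_xy(alpha, beta) = q_x(alpha) q_y(beta).
alpha, beta, gamma = 0.05, -0.02, 1.3        # illustrative Euler angles (rad)
q_xy = quat_mul(quat_about(0, alpha), quat_about(1, beta))
q_rc = quat_mul(quat_about(2, gamma), q_xy)

# Rotations about the same axis commute: a planar robot's rotation (about z)
# and q_z(gamma) can be swapped, which is what simplifies Equation (2).
q_robot = quat_about(2, 0.7)
assert np.allclose(quat_mul(q_robot, quat_about(2, gamma)),
                   quat_mul(quat_about(2, gamma), q_robot))
```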
Since both ${}^{R_{i+1}}_{R_i}q$ and $q_z(\gamma)$ represent rotations around the z axis, they satisfy the
commutative law. After simplifying Equation (2), the rotation residual term becomes