scene points in the camera reference frame to the image plane.
In the classical case $\pi_c$ is expressed in terms of the calibration matrix $K_c$:
$$
\pi_c = K_c =
\begin{bmatrix}
c & c\,s & o_u \\
0 & c(1+m) & o_v \\
0 & 0 & 1
\end{bmatrix}
=
\begin{bmatrix}
c_u & c_u s & o_u \\
0 & c_v & o_v \\
0 & 0 & 1
\end{bmatrix}
\tag{3}
$$
where $c$ is the focal length, $(1+m)$ is a scale factor for the $v$ axis, $(o_u, o_v)$ are the coordinates of the principal point, and $s$ models the skew of the $u$ and $v$ axes.
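For illustration, the following minimal sketch (in Python/NumPy; the function name and signature are ours, not from this work) builds $K_c$ from the quantities of Eq. 3:

```python
import numpy as np

def calibration_matrix(c, m, s, ou, ov):
    """Build K_c as in Eq. 3 (illustrative sketch, not the authors' code).

    c      : focal length
    m      : scale offset of the v axis, so that c_v = c * (1 + m)
    s      : skew factor of the u and v axes
    ou, ov : principal point coordinates
    """
    cu = c            # c_u = c
    cv = c * (1 + m)  # c_v = c * (1 + m)
    return np.array([[cu, cu * s, ou],
                     [0.0,    cv, ov],
                     [0.0,   0.0, 1.0]])
```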
The rigid homogeneous transformation from camera to reference frame is:
$$
M_t =
\begin{bmatrix}
R_t & Z_{0t} \\
\mathbf{0} & 1
\end{bmatrix},
\quad
R_t =
\begin{bmatrix}
r_{11} & r_{12} & r_{13} \\
r_{21} & r_{22} & r_{23} \\
r_{31} & r_{32} & r_{33}
\end{bmatrix},
\quad
Z_{0t} =
\begin{bmatrix}
X_{0t} \\
Y_{0t} \\
Z_{0t}
\end{bmatrix}
\tag{4}
$$
with $Z_{0t}$ being the projection center in world frame coordinates, and $R_t$ the rotation matrix from the world to the camera system. Substituting the above transformation and projection matrix into Eq. 2, the classical collinearity equations become:
$$
\lambda_{it}\, m_{it} = K_c R_t^T \left[ I \mid -Z_{0t} \right] X_i
\tag{5}
$$
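Eq. 5 can be evaluated directly; the sketch below (Python/NumPy, with illustrative names) exploits that applying $[I \mid -Z_{0t}]$ to the homogeneous scene point reduces to the difference $X_i - Z_{0t}$:

```python
import numpy as np

def project_eq5(K, R_t, Z0_t, X_i):
    """Evaluate Eq. 5: lambda * m = K_c R_t^T [I | -Z0_t] X_i (sketch).

    K    : 3x3 calibration matrix from Eq. 3
    R_t  : 3x3 rotation matrix from Eq. 4
    Z0_t : projection center in world coordinates
    X_i  : scene point in world coordinates
    """
    # [I | -Z0_t] applied to the homogeneous point equals X_i - Z0_t
    x = K @ R_t.T @ (np.asarray(X_i) - np.asarray(Z0_t))
    return x[:2] / x[2]  # division by the third row removes lambda
```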
To express the observations in the image plane as a function of all unknowns, we divide Eq. 5 by its third row, yielding:
$$
\begin{aligned}
u &= c_u \cdot \frac{r_{11}(X - X_0) + r_{21}(Y - Y_0) + r_{31}(Z - Z_0)}{r_{13}(X - X_0) + r_{23}(Y - Y_0) + r_{33}(Z - Z_0)} + o_u + d_r(\rho) + d_{t_u}(\rho) \\
v &= c_v \cdot \frac{r_{12}(X - X_0) + r_{22}(Y - Y_0) + r_{32}(Z - Z_0)}{r_{13}(X - X_0) + r_{23}(Y - Y_0) + r_{33}(Z - Z_0)} + o_v + d_r(\rho) + d_{t_v}(\rho)
\end{aligned}
\tag{6}
$$
with $\rho = \sqrt{\tilde{u}^2 + \tilde{v}^2}$ being the radial distance from the origin of the sensor coordinate system, and $\tilde{m} = [\tilde{u}, \tilde{v}]^T = [u - o_u, v - o_v]^T$. Note that we have now added radial and tangential distortion functions $d_r$ and $d_t$, respectively, and omitted the skew factor $s$ as well as the indices $i$ and $t$ to ease reading. The most common characterization of both distortion effects can be found in Brown (1966):
$$
\begin{aligned}
d_r(\rho) &= R_1 \rho^2 + R_2 \rho^4 + R_3 \rho^6 \\
d_{t_u}(\rho) &= 2 T_1 u v + T_2 (\rho^2 + 2u^2) \\
d_{t_v}(\rho) &= T_1 (\rho^2 + 2v^2) + 2 T_2 u v
\end{aligned}
\tag{7}
$$
where $R_1$, $R_2$, $R_3$ are the coefficients of the radial-symmetric distortion and $T_1$, $T_2$ model the tangential-asymmetric distortion.
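A compact sketch of Eqs. 6 and 7 (Python/NumPy; the function names are ours, and we take the $u$, $v$ entering Eq. 7 to be the centered coordinates $\tilde{u}$, $\tilde{v}$) may help to fix the notation:

```python
import numpy as np

def brown_distortion(u_c, v_c, R1, R2, R3, T1, T2):
    """Distortion terms of Eq. 7 (Brown 1966); u_c, v_c are centered: u - o_u, v - o_v."""
    rho2 = u_c**2 + v_c**2  # squared radial distance rho^2
    d_r  = R1 * rho2 + R2 * rho2**2 + R3 * rho2**3
    d_tu = 2 * T1 * u_c * v_c + T2 * (rho2 + 2 * u_c**2)
    d_tv = T1 * (rho2 + 2 * v_c**2) + 2 * T2 * u_c * v_c
    return d_r, d_tu, d_tv

def collinearity_eq6(X, X0, R, cu, cv, ou, ov, dist):
    """Collinearity equations of Eq. 6 with the distortion terms of Eq. 7 (sketch).

    X, X0 : scene point and projection center in world coordinates
    R     : 3x3 rotation matrix with entries r_jk = R[j-1, k-1]
    dist  : tuple (R1, R2, R3, T1, T2)
    """
    d = np.asarray(X) - np.asarray(X0)
    denom = R[0, 2] * d[0] + R[1, 2] * d[1] + R[2, 2] * d[2]            # r13, r23, r33
    u = cu * (R[0, 0] * d[0] + R[1, 0] * d[1] + R[2, 0] * d[2]) / denom + ou
    v = cv * (R[0, 1] * d[0] + R[1, 1] * d[1] + R[2, 1] * d[2]) / denom + ov
    d_r, d_tu, d_tv = brown_distortion(u - ou, v - ov, *dist)
    return u + d_r + d_tu, v + d_r + d_tv
```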
Analyzing Eq. 6, we can identify the following possible challenges for the use in multi-camera systems:
1. Points with Incidence Angles Greater than 90°: Given that a perspective camera model is used, we cannot distinguish, and hence cannot project, points that lie behind the camera.
2. Camera Model: To overcome the first issue, we could use an omnidirectional camera model and include it in Eq. 6. If multiple camera types, e.g. fisheye and perspective cameras, were combined into one camera system, we would have to provide different versions of Eq. 6. This challenge can be avoided either by transforming image coordinates to bearing vectors (see Sect. 3.4) using the corresponding intrinsics of each camera, or by using a general camera model that covers all prevalent cameras. This work utilizes the latter approach; see Sect. 3.4 for a comparison to Schneider et al. (2012) and Kneip et al. (2013), who use bearing vectors as observations, and for a reasoning why new challenges arise when optimizations are carried out over camera rays instead of image coordinates.
3. Multiple Cameras: Finally, Eq. 6 expresses solely the projection of scene points $i$ to a camera at time $t$. In a MCS, the projection has to be expanded by a transformation of scene point $X_i$ to a MCS pose at time $t$ and finally to a camera $c$ within some MCS coordinate system (a minimal sketch of this chain is given after this list).
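The following sketch illustrates the transformation chain of challenge 3 (Python/NumPy; the matrix names $M_t$, $M_c$ and the inverse-composition convention are our assumptions, since the paper's MCS notation is only introduced below):

```python
import numpy as np

def world_to_camera(M_t, M_c, X_i):
    """Chain a world point through the MCS body pose into camera c (illustrative).

    M_t : 4x4 homogeneous pose of the MCS body frame in the world at time t
    M_c : 4x4 homogeneous pose of camera c within the MCS body frame
    X_i : 3-vector scene point in world coordinates
    """
    X_h = np.append(np.asarray(X_i, dtype=float), 1.0)   # homogeneous coordinates
    X_c = np.linalg.inv(M_c) @ np.linalg.inv(M_t) @ X_h  # world -> MCS -> camera
    return X_c[:3]
```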
In the following, the general camera model is introduced. We
subsequently show how the classical collinearity equation
is expanded with it and how the transformation from the
MCS coordinate system to each camera coordinate system is
modeled.
3.2 Camera Model
In order to utilize arbitrary cameras in the MCS, a suitable camera model is necessary. We chose to include the camera model proposed in Scaramuzza et al. (2006a, b), since it is not limited to specific cases and allows us to employ all prevalent cameras that are currently used in applied computer vision and robotics, e.g., perspective, dioptric (fisheye), as well as catadioptric cameras. This section provides a brief compilation of the model as well as a comparison to perspective cameras, intended to emphasize the differences and to show how the classical perspective model is generalized.
Again, given a point $m = [u, v]^T$ on the image plane, the corresponding point on the sensor plane is $\tilde{m}$. These two points are related by an affine transformation:
$$
m = A \tilde{m} + O_c
\tag{8}
$$
where the matrix $A = [a_{11}, a_{12};\, a_{21}, 1]$ accounts for small misalignments between sensor and lens axis and the digitization process (see Scaramuzza et al. 2006a). The principal point $O_c = [o_u, o_v]^T$ relates all coordinates to the center of distortion.
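The affine mapping of Eq. 8 is straightforward to evaluate; the following sketch (Python/NumPy, with made-up example values) applies it to a centered sensor point:

```python
import numpy as np

def sensor_to_image(m_tilde, A, Oc):
    """Affine sensor-to-image mapping of Eq. 8: m = A m~ + O_c (sketch)."""
    return A @ np.asarray(m_tilde) + Oc

# Example with a near-identity misalignment matrix; all values are illustrative.
A  = np.array([[1.001, 2e-4],
               [1e-4,  1.0]])     # lower-right entry is fixed to 1 as in the model
Oc = np.array([320.0, 240.0])     # principal point / center of distortion
m  = sensor_to_image([10.0, -5.0], A, Oc)
```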
Now let $X_c = [X_c, Y_c, Z_c]^T$ be a scene point already transformed into the camera frame. Then the following for-