Maximum a posteriori (MAP) estimation and the SLAM
back-end. The current de facto standard formulation of SLAM
has its origins in the seminal paper of Lu and Milios [161],
followed by the work of Gutmann and Konolige [101].
Since then, numerous approaches have improved the efficiency
and robustness of the optimization underlying the problem
[63, 81, 100, 125, 192, 241]. All these approaches formulate
SLAM as a maximum a posteriori estimation problem, and
often use the formalism of factor graphs [143] to reason about
the interdependence among variables.
Assume that we want to estimate an unknown variable $X$; as mentioned before, in SLAM the variable $X$ typically includes the trajectory of the robot (as a discrete set of poses) and the position of landmarks in the environment. We are given a set of measurements $Z = \{z_k : k = 1, \ldots, m\}$ such that each measurement can be expressed as a function of $X$, i.e., $z_k = h_k(X_k) + \epsilon_k$, where $X_k \subseteq X$ is a subset of the variables, $h_k(\cdot)$ is a known function (the measurement or observation model), and $\epsilon_k$ is random measurement noise.
In MAP estimation, we estimate $X$ by computing the assignment of variables $X^\star$ that attains the maximum of the posterior $p(X|Z)$ (the belief over $X$ given the measurements):
\[
X^\star \doteq \operatorname*{argmax}_{X} \; p(X|Z) = \operatorname*{argmax}_{X} \; p(Z|X)\, p(X) \qquad (1)
\]
where the equality follows from Bayes' theorem. In (1), $p(Z|X)$ is the likelihood of the measurements $Z$ given the assignment $X$, and $p(X)$ is a prior probability over $X$. The prior probability includes any prior knowledge about $X$; in case no prior knowledge is available, $p(X)$ becomes a constant (uniform distribution), which is inconsequential and can be dropped from the optimization. In that case MAP estimation reduces to maximum likelihood estimation. Note that, unlike Kalman filtering, MAP estimation does not require an explicit distinction between motion and observation models: both are treated as factors and are seamlessly incorporated in the estimation process. Moreover, it is worth noting that Kalman filtering and MAP estimation return the same estimate in the linear Gaussian case, while this is not true in general.
Assuming that the measurements $Z$ are independent (i.e., the corresponding noises are uncorrelated), problem (1) factorizes into:
\[
X^\star = \operatorname*{argmax}_{X} \; p(X) \prod_{k=1}^{m} p(z_k|X) = \operatorname*{argmax}_{X} \; p(X) \prod_{k=1}^{m} p(z_k|X_k) \qquad (2)
\]
where, on the right-hand side, we noticed that $z_k$ only depends on the subset of variables in $X_k$.
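To make the factorization in (2) concrete, the following sketch (a hypothetical toy example in Python/NumPy, not taken from the original text) evaluates the log-posterior of a two-pose 1D localization problem as a sum of independent per-measurement log-likelihood terms, each touching only its own subset of variables $X_k$:

```python
import numpy as np

# Toy problem: X = (x1, x2), two scalar robot positions.
# Measurements (all Gaussian, mutually independent):
#   prior on x1, odometry z_odom ~ x2 - x1, absolute fix z_abs ~ x2.
def log_prior(X, sigma=0.1):
    x1, _ = X
    return -0.5 * (x1 - 0.0) ** 2 / sigma ** 2        # log p(X), up to a constant

def log_lik_odom(X, z, sigma=0.05):
    x1, x2 = X
    return -0.5 * ((x2 - x1) - z) ** 2 / sigma ** 2   # log p(z_odom | x1, x2)

def log_lik_abs(X, z, sigma=0.2):
    _, x2 = X
    return -0.5 * (x2 - z) ** 2 / sigma ** 2          # log p(z_abs | x2)

def log_posterior(X, z_odom, z_abs):
    # Independence of the measurement noises turns the product in (2)
    # into a sum of log-terms, each depending only on its own subset X_k.
    return log_prior(X) + log_lik_odom(X, z_odom) + log_lik_abs(X, z_abs)

print(log_posterior(np.array([0.0, 1.0]), z_odom=1.02, z_abs=0.95))
```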
Fig. 3: SLAM as a factor graph. Blue circles denote robot poses at consecutive time steps ($x_1, x_2, \ldots$), green circles denote landmark positions ($l_1, l_2, \ldots$), and the red circle denotes the variable associated with the intrinsic calibration parameters ($K$). Factors are shown as black squares: the label "u" marks factors corresponding to odometry constraints, "v" marks factors corresponding to camera observations, "c" denotes loop closures, and "p" denotes prior factors.

Problem (2) can be interpreted in terms of inference over a factor graph [143]. The variables correspond to nodes in the factor graph. The terms $p(z_k|X_k)$ and the prior $p(X)$ are called factors, and they encode probabilistic constraints over a subset of nodes. A factor graph is a graphical model that encodes the dependence between the $k$-th factor (and its measurement $z_k$) and the corresponding variables $X_k$. A first advantage of the factor graph interpretation is that it enables an insightful visualization of the problem. Fig. 3 shows an example of a factor graph underlying a simple SLAM problem. The figure
shows the variables, namely, the robot poses, the landmark
positions, and the camera calibration parameters, and the
factors imposing constraints among these variables. A second
advantage is generality: a factor graph can model complex
inference problems with heterogeneous variables and factors,
and arbitrary interconnections. Furthermore, the connectivity
of the factor graph in turn influences the sparsity of the
resulting SLAM problem as discussed below.
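As a rough illustration of this structural point, the following Python sketch (with names and connectivity chosen purely for illustration, loosely mirroring Fig. 3; it is not the API of any particular SLAM library) lists which variables each factor constrains and prints the resulting factor/variable incidence pattern. The fact that each factor touches only a few variables is what induces sparsity in the underlying SLAM problem:

```python
from collections import OrderedDict

# Variable nodes, loosely mirroring Fig. 3: poses x1..x3, landmarks l1..l2,
# and the camera intrinsic calibration K.
variables = ["x1", "x2", "x3", "l1", "l2", "K"]

# Each factor lists the subset X_k of variables it constrains
# (connectivity here is an illustrative assumption).
factors = OrderedDict([
    ("p (prior)",        ["x1"]),
    ("u12 (odometry)",   ["x1", "x2"]),
    ("u23 (odometry)",   ["x2", "x3"]),
    ("v1 (camera obs.)", ["x1", "l1", "K"]),
    ("v2 (camera obs.)", ["x2", "l1", "K"]),
    ("v3 (camera obs.)", ["x3", "l2", "K"]),
    ("c13 (loop clos.)", ["x1", "x3"]),
])

# Factor/variable incidence: a '*' marks variables involved in a factor.
print(" " * 18 + " ".join(f"{v:>3}" for v in variables))
for name, keys in factors.items():
    row = " ".join(f"{'*' if v in keys else '.':>3}" for v in variables)
    print(f"{name:<18} {row}")
```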
In order to write (2) in a more explicit form, assume that the measurement noise $\epsilon_k$ is zero-mean Gaussian noise with information matrix $\Omega_k$ (inverse of the covariance matrix). Then, the measurement likelihood in (2) becomes:
\[
p(z_k|X_k) \propto \exp\!\left(-\frac{1}{2}\, \|h_k(X_k) - z_k\|^2_{\Omega_k}\right) \qquad (3)
\]
where we use the notation $\|e\|^2_{\Omega} = e^{\mathsf{T}} \Omega e$. Similarly, assume that the prior can be written as $p(X) \propto \exp(-\frac{1}{2}\|h_0(X) - z_0\|^2_{\Omega_0})$, for some given function $h_0(\cdot)$, prior mean $z_0$, and information matrix $\Omega_0$. Since maximizing the posterior is the same as minimizing the negative log-posterior, the MAP estimate in (2) becomes:
\[
X^\star = \operatorname*{argmin}_{X} \; -\log\!\left( p(X) \prod_{k=1}^{m} p(z_k|X_k) \right) = \operatorname*{argmin}_{X} \; \sum_{k=0}^{m} \|h_k(X_k) - z_k\|^2_{\Omega_k} \qquad (4)
\]
which is a nonlinear least squares problem since, in most problems of interest in robotics, $h_k(\cdot)$ is a nonlinear function. Note that the formulation (4) follows from the assumption of normally distributed noise. Other assumptions for the noise distribution lead to different cost functions; for instance, if the noise follows a Laplace distribution, the squared $\ell_2$-norm in (4) is replaced by the $\ell_1$-norm. To increase resilience to outliers, it is also common to substitute the squared $\ell_2$-norm in (4) with robust loss functions (e.g., Huber or Tukey loss) [112].
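As a concrete but deliberately simplified sketch of what solving (4) can look like, the Python snippet below runs a few Gauss-Newton iterations (with optional Huber reweighting, in the spirit of iteratively reweighted least squares) on a toy one-dimensional pose chain with a prior, two odometry factors, and an outlier-corrupted loop closure; the models, numbers, and function names are illustrative assumptions, not a reference implementation:

```python
import numpy as np

def huber_weight(r, delta=1.0):
    # IRLS weight for the Huber loss: quadratic near zero, linear in the tails.
    a = abs(r)
    return 1.0 if a <= delta else delta / a

def gauss_newton(x, factors, iters=10, robust=False):
    # x: current estimate of all variables (scalar positions along a line).
    # factors: tuples (i, j, z, s) encoding z ~ x[j] - x[i] with sqrt-information s;
    #          i = None denotes a unary prior factor on x[j].
    for _ in range(iters):
        H = np.zeros((len(x), len(x)))
        b = np.zeros(len(x))
        for i, j, z, s in factors:
            J = np.zeros(len(x))
            if i is None:                        # prior factor h(x) = x[j]
                r = s * (x[j] - z); J[j] = s
            else:                                # relative factor h(x) = x[j] - x[i]
                r = s * ((x[j] - x[i]) - z); J[i], J[j] = -s, s
            w = huber_weight(r) if robust else 1.0
            H += w * np.outer(J, J)              # Gauss-Newton normal equations
            b += w * J * r
        x = x - np.linalg.solve(H + 1e-9 * np.eye(len(x)), b)
    return x

# Prior on x0, odometry x0->x1 and x1->x2 (each ~1.0), and an
# outlier loop closure claiming x2 - x0 = 5.0 (true value is ~2.0).
factors = [(None, 0, 0.0, 10.0), (0, 1, 1.0, 5.0), (1, 2, 1.0, 5.0), (0, 2, 5.0, 5.0)]
print(gauss_newton(np.zeros(3), factors, robust=False))  # pulled toward the outlier
print(gauss_newton(np.zeros(3), factors, robust=True))   # Huber limits its influence
```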
The computer vision expert may notice a resemblance
between problem (4) and bundle adjustment (BA) in Structure
from Motion [244]; both (4) and BA indeed stem from a
maximum a posteriori formulation. However, two key features