score between observations, then compute the pairwise simi-
larity between all observations to yield a square similarity ma-
trix. If the robot observes frequently as it moves through the
environment and observations are ordered by time, then loop
closures appear as off-diagonal stripes in this matrix. Levin
and Szeliski (2004) describe such a method that uses a cascade
of similarity functions based on global color histograms, im-
age moments, and epipolar geometry. Silpa-Anan and Hartley
(2004, 2005) describe a similar system which employs SIFT
features. Zivkovic et al. (2005) consider the case where the
order in which the images were collected is unknown, and
use graph cuts to identify clusters of related images. A se-
ries of papers by Ho and Newman (2005b,a); Newman and Ho
(2005); Newman et al. (2006) advance similarity-matrix-based
techniques by considering the problem of perceptual aliasing.
Their approach is based on a singular value decomposition
of the similarity matrix that eliminates the effect of repetitive
structure. They also address the issue of deciding if a putative
match is indeed a loop closure based on examining an extreme
value distribution (Gumbel 1958) related to the similarity ma-
trix.
This paper will describe a new appearance-based technique that improves on these existing results, particularly by addressing perceptual aliasing in a probabilistic framework. We have previously described the core of our approach in Cummins and Newman (2007); here we expand on that presentation and present a detailed evaluation of the system's performance.
3. Approximating High-dimensional Discrete Distributions
This section introduces some background material that forms a
core part of our system. Readers familiar with Chow Liu trees
may wish to skip to Section 4.
Consider a distribution $P(Z)$ on $n$ discrete variables, $Z = \{z_1, z_2, \ldots, z_n\}$. We wish to learn the parameters of this distribution from a set of samples. If $P(Z)$ is a general distribution without special structure, the space needed to represent the distribution is exponential in $n$; as $n$ increases, dealing with such distributions quickly becomes intractable. A solution to this problem is to approximate $P(Z)$ by another distribution $Q(Z)$ possessing some special structure that makes it tractable to work with, and that is in some sense similar to the original distribution. The natural similarity metric in this context is the
Kullback–Leibler divergence, defined as

$$D_{\mathrm{KL}}(P \parallel Q) = \sum_{x \in X} P(x) \log \frac{P(x)}{Q(x)}, \qquad (1)$$
where the summation is carried out over all possible states in
the distribution. The KL divergence is zero when the distribu-
tions are identical, and strictly larger otherwise.
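As an illustrative aside (our example, not drawn from the paper), the divergence in (1) can be evaluated directly for small discrete distributions; here a joint over two binary variables is compared against the product of its marginals, i.e. a fully independent approximation:

import numpy as np

def kl_divergence(p, q):
    # D_KL(P || Q) for discrete distributions given as probability vectors.
    # Terms with p(x) = 0 contribute zero; assumes q(x) > 0 wherever p(x) > 0.
    p, q = np.asarray(p, dtype=float), np.asarray(q, dtype=float)
    mask = p > 0
    return float(np.sum(p[mask] * np.log(p[mask] / q[mask])))

p_joint = np.array([0.4, 0.1, 0.1, 0.4])      # states 00, 01, 10, 11
q_indep = np.array([0.25, 0.25, 0.25, 0.25])  # product of uniform marginals
print(kl_divergence(p_joint, q_indep))        # ~0.19: the approximation loses information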
The naive Bayes approximation is an example of a structural constraint that allows for tractable learning of $Q(Z)$, under which $Q(Z)$ is restricted such that each variable must be independent of all others. This extreme structural constraint limits the degree to which $Q(Z)$ can represent $P(Z)$. A less severe constraint is to require each variable in $Q(Z)$ to be conditioned on at most one other variable; this restricts the graphical model of $Q(Z)$ to be a tree. Unlike the naive Bayes case, where there is only one possible graphical model, in this case there are $n^{n-2}$ possible tree-structured distributions that satisfy the structural constraint. Chow and Liu (1968) described a polynomial time algorithm to select the best such distribution, which we elaborate on below.
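To make the contrast between the two constraints concrete (our notation, with $z_r$ the root variable and $z_{\pi(i)}$ the single parent of variable $z_i$ in the tree), the corresponding factorizations are

$$Q_{\mathrm{nb}}(Z) = \prod_{i=1}^{n} q(z_i), \qquad Q_{\mathrm{tree}}(Z) = q(z_r) \prod_{i \neq r} q(z_i \mid z_{\pi(i)}).$$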
More generally, we could constrain $Q(Z)$ such that each variable is conditioned on at most $k$ other variables. This structure is known as a thin junction tree. It was shown by Srebro (2001) that the problem of determining the optimal thin junction tree is NP-hard, though there are several methods to find a good approximate solution (for example, Bach and Jordan (2002)).
3.1. Chow Liu Trees
The Chow Liu algorithm approximates a discrete distribution $P(Z)$ by the closest tree-structured Bayesian network $Q(Z)_{\mathrm{opt}}$, in the sense of minimizing the Kullback–Leibler divergence. The structure of $Q(Z)_{\mathrm{opt}}$ is determined by considering a mutual information graph $\mathcal{G}$. For a distribution over $n$ variables, $\mathcal{G}$ is the complete graph with $n$ nodes and $n(n-1)/2$ edges, where each edge $(z_i, z_j)$ has weight equal to the mutual information $I(z_i, z_j)$ between variables $i$ and $j$,

$$I(z_i, z_j) = \sum_{z_i, z_j} p(z_i, z_j) \log \frac{p(z_i, z_j)}{p(z_i)\,p(z_j)}, \qquad (2)$$
and the summation is carried out over all possible states of z.
Mutual information measures the degree to which knowledge
of the value of one variable predicts the value of another. It is
zero if two variables are independent, and strictly larger other-
wise.
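As a sketch of how these edge weights might be estimated in practice (our illustration, assuming binary variables and a data matrix with one sample per row; not the authors' implementation):

import numpy as np

def mutual_information(samples, i, j):
    # Empirical estimate of I(z_i, z_j) in (2) for binary variables,
    # computed from co-occurrence frequencies in the sample matrix.
    mi = 0.0
    for a in (0, 1):
        for b in (0, 1):
            p_ab = np.mean((samples[:, i] == a) & (samples[:, j] == b))
            p_a = np.mean(samples[:, i] == a)
            p_b = np.mean(samples[:, j] == b)
            if p_ab > 0:  # zero-probability states contribute nothing
                mi += p_ab * np.log(p_ab / (p_a * p_b))
    return mi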
Chow and Liu prove that the maximum-weight spanning tree (Cormen et al. 2001) of the mutual information graph will have the same structure as $Q(Z)_{\mathrm{opt}}$ (see Figure 1). Intuitively, the dependencies between variables which are omitted from $Q(Z)_{\mathrm{opt}}$ have as little mutual information as possible, and so are the best ones to approximate as independent. Chow and Liu further prove that the conditional probabilities $p(z_i \mid z_j)$ for each edge in $Q(Z)_{\mathrm{opt}}$ are the same as the conditional probabilities in $P(Z)$; thus maximum likelihood estimates can be obtained directly from co-occurrence frequency in training data.
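Putting the pieces together, a minimal sketch of the construction (ours, reusing the mutual_information helper above together with SciPy's spanning-tree routines) might look as follows:

import numpy as np
from scipy.sparse.csgraph import minimum_spanning_tree, breadth_first_order

def chow_liu_structure(samples):
    # Build the mutual information graph. SciPy computes a *minimum*
    # spanning tree and treats exact zeros as absent edges, so the weights
    # are negated and offset slightly to keep zero-information edges.
    n_vars = samples.shape[1]
    g = np.zeros((n_vars, n_vars))
    for i in range(n_vars):
        for j in range(i + 1, n_vars):
            g[i, j] = -(mutual_information(samples, i, j) + 1e-9)
    mst = minimum_spanning_tree(g)
    # Root the undirected tree at variable 0; parents[i] gives the parent
    # of variable i (SciPy marks the root itself with -9999).
    _, parents = breadth_first_order(mst, i_start=0, directed=False)
    return parents

def edge_conditional(samples, i, j):
    # Maximum likelihood estimate of p(z_i = 1 | z_j = b) from co-occurrence
    # counts; the paper replaces this with a pseudo-Bayesian estimator (see below).
    return {b: samples[samples[:, j] == b, i].mean() for b in (0, 1)}

Calling chow_liu_structure on a binary matrix of shape (num_samples, $n$) returns the tree structure, after which edge_conditional supplies a parameter estimate for each (child, parent) edge.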
To mitigate potential problems due to limited training data, we use the pseudo-Bayesian $p^*$ estimator described in Bishop et al. (1977), rather than the maximum likelihood estimator. This