spanning tree of R. Considering edges in non-decreasing
order by weight, each step of the algorithm merges
components R_1 and R_2 connected by the current edge if
the edge weight is less than:

min(Int(R_1) + τ(R_1), Int(R_2) + τ(R_2))    (1)

where τ(R) = k/|R| and k is a scale parameter that can be
used to set a preference for component size.
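As a concrete illustration of criterion (1), the merging step can be sketched with a union-find structure that tracks each component's size and internal difference Int(R); the graph construction, edge weights, and function names below are our own, not the reference implementation:

```python
class DisjointSet:
    """Union-find tracking, per component root, its size and its
    internal difference Int(R) (largest edge weight merged so far)."""
    def __init__(self, n):
        self.parent = list(range(n))
        self.size = [1] * n
        self.internal = [0.0] * n

    def find(self, x):
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

def segment(n, edges, k):
    """Graph-based merging in the style of criterion (1).
    edges: list of (weight, i, j) over nodes 0..n-1; k: scale parameter."""
    ds = DisjointSet(n)
    for w, i, j in sorted(edges):  # non-decreasing order by weight
        a, b = ds.find(i), ds.find(j)
        if a == b:
            continue
        # merge iff w < min(Int(R1) + k/|R1|, Int(R2) + k/|R2|)
        if w < min(ds.internal[a] + k / ds.size[a],
                   ds.internal[b] + k / ds.size[b]):
            ds.parent[b] = a
            ds.size[a] += ds.size[b]
            ds.internal[a] = w  # w is the largest weight merged so far
    return ds
```

Larger k biases the test toward merging, hence toward larger components.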
The Mean Shift algorithm [34] offers an alternative
clustering framework. Here, pixels are represented in
the joint spatial-range domain by concatenating their
spatial coordinates and color values into a single vector.
Applying mean shift filtering in this domain yields a
convergence point for each pixel. Regions are formed by
grouping together all pixels whose convergence points
are closer than h_s in the spatial domain and h_r in the
range domain, where h_s and h_r are the respective bandwidth
parameters. Additional merging can also be performed
to enforce a constraint on minimum region area.
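A minimal sketch of this joint-domain procedure, using a flat kernel and brute-force neighborhood queries (the function name and toy data are illustrative, not the cited implementation):

```python
import numpy as np

def mean_shift_labels(points, hs, hr, n_iter=30):
    """Flat-kernel mean shift in the joint spatial-range domain.
    points: (N, d) array; columns 0-1 are spatial coordinates,
    the remaining columns are range (e.g. color) values."""
    pts = points.astype(float)
    modes = pts.copy()
    for _ in range(n_iter):
        for i in range(len(modes)):
            ds = np.linalg.norm(pts[:, :2] - modes[i, :2], axis=1)
            dr = np.linalg.norm(pts[:, 2:] - modes[i, 2:], axis=1)
            win = (ds < hs) & (dr < hr)
            modes[i] = pts[win].mean(axis=0)  # shift toward window mean
    # group pixels whose convergence points are closer than hs and hr
    labels = np.full(len(pts), -1, dtype=int)
    cur = 0
    for i in range(len(pts)):
        if labels[i] >= 0:
            continue
        ds = np.linalg.norm(modes[:, :2] - modes[i, :2], axis=1)
        dr = np.linalg.norm(modes[:, 2:] - modes[i, 2:], axis=1)
        labels[(ds < hs) & (dr < hr) & (labels < 0)] = cur
        cur += 1
    return labels
```

A production implementation would add the minimum-region-area merging step and faster neighbor search.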
Spectral graph theory [48], and in particular the Nor-
malized Cuts criterion [45], [46], provides a way of
integrating global image information into the grouping
process. In this framework, given an affinity matrix W
whose entries encode the similarity between pixels, one
defines the diagonal matrix D_ii = Σ_j W_ij and solves for the
generalized eigenvectors of the linear system:

(D − W)v = λDv    (2)
Traditionally, after this step, K-means clustering is
applied to obtain a segmentation into regions. This ap-
proach often breaks uniform regions where the eigenvec-
tors have smooth gradients. One solution is to reweight
the affinity matrix [47]; others have proposed alternative
graph partitioning formulations [49], [50], [51].
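A small dense sketch of the spectral step in (2), for illustration only (practical systems use sparse solvers and W is built from image affinities):

```python
import numpy as np
from scipy.linalg import eigh

def ncut_embedding(W, n_vec=1):
    """Generalized eigenvectors of (D - W)v = lambda * D v, eq. (2).
    Returns the n_vec eigenvectors after the trivial constant one;
    clustering (e.g. K-means) on the rows of this embedding then
    yields a segmentation."""
    D = np.diag(W.sum(axis=1))           # D_ii = sum_j W_ij
    _, vecs = eigh(D - W, D)             # ascending eigenvalues
    return vecs[:, 1:1 + n_vec]          # skip the constant eigenvector
```

On a toy affinity matrix with two weakly linked cliques, the second eigenvector separates the cliques by sign, which is exactly what the subsequent K-means exploits.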
A recent variant of Normalized Cuts for image seg-
mentation is the Multiscale Normalized Cuts (NCuts)
approach of Cour et al. [33]. The fact that W must
be sparse, in order to avoid a prohibitively expensive
computation, limits the naive implementation to using
only local pixel affinities. Cour et al. solve this limitation
by computing sparse affinity matrices at multiple scales,
setting up cross-scale constraints, and deriving a new
eigenproblem for this constrained multiscale cut.
Sharon et al. [52] propose an alternative to improve
the computational efficiency of Normalized Cuts. This
approach, inspired by algebraic multigrid, iteratively
coarsens the original graph by selecting a subset of nodes
such that each variable on the fine level is strongly
coupled to one on the coarse level. The same merging
strategy is adopted in [31], where the strong coupling of
a subset S of the graph nodes V is formalized as:

(Σ_{j∈S} p_ij) / (Σ_{j∈V} p_ij) > ψ  ∀i ∈ V − S    (3)

where ψ is a constant and p_ij is the probability of merging
i and j, estimated from brightness and texture similarity.
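One way to read criterion (3) is as a greedy coarse-node selection rule: sweep the fine nodes and promote any node that is not yet strongly coupled to the current coarse set. This toy sketch (the probability matrix and threshold are hypothetical) captures that reading:

```python
import numpy as np

def select_coarse_nodes(P, psi):
    """Greedily build a coarse set S so that every node left on the
    fine level satisfies the strong-coupling test of eq. (3):
    sum_{j in S} p_ij / sum_{j in V} p_ij > psi."""
    totals = P.sum(axis=1)  # sum over j in V of p_ij
    coarse = []
    for i in range(len(P)):
        coupled = bool(coarse) and P[i, coarse].sum() / totals[i] > psi
        if not coupled:
            coarse.append(i)  # i must itself join the coarse level
    return coarse
```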
Many approaches to image segmentation fall into a
different category than those covered so far, relying on
the formulation of the problem in a variational frame-
work. An example is the model proposed by Mumford
and Shah [53], where the segmentation of an observed
image u_0 is given by the minimization of the functional:

F(u, C) = ∫_Ω (u − u_0)^2 dx + μ ∫_{Ω\C} |∇u|^2 dx + ν|C|    (4)
where u is piecewise smooth in Ω\C and μ, ν are weight-
ing parameters. Theoretical properties of this model can
be found in, e.g. [53], [54]. Several algorithms have been
developed to minimize the energy (4) or its simplified
version, where u is piecewise constant in Ω\C. Koepfler
et al. [55] proposed a region merging method for this
purpose. Chan and Vese [56], [57] follow a different
approach, expressing (4) in the level set formalism of
Osher and Sethian [58], [59]. Bertelli et al. [30] extend
this approach to more general cost functions based on
pairwise pixel similarities. Recently, Pock et al. [60] pro-
posed to solve a convex relaxation of (4), thus obtaining
robustness to initialization. Donoser et al. [29] subdivide
the problem into several figure/ground segmentations,
each initialized using low-level saliency and solved by
minimizing an energy based on Total Variation.
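The piecewise-constant simplification of (4) is easy to state concretely: each region is summarized by its mean, and |C| penalizes boundary length. A toy energy evaluation for a two-phase labeling, with |C| approximated by counting 4-neighbor label changes (one common discretization, assumed here):

```python
import numpy as np

def piecewise_constant_energy(u0, mask, nu):
    """Two-phase piecewise-constant energy (simplified eq. 4):
    squared deviation from each region's mean, plus nu * |C| with
    the curve length |C| approximated by the number of label
    changes between 4-connected neighbors."""
    c1 = u0[mask].mean() if mask.any() else 0.0
    c2 = u0[~mask].mean() if (~mask).any() else 0.0
    fidelity = ((u0[mask] - c1) ** 2).sum() + ((u0[~mask] - c2) ** 2).sum()
    length = (mask[1:, :] != mask[:-1, :]).sum() \
           + (mask[:, 1:] != mask[:, :-1]).sum()
    return fidelity + nu * length
```

Minimizers trade data fidelity against boundary length, so on a clean two-region image the true partition scores lower than a degenerate one.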
2.3 Benchmarks
Though much of the extensive literature on contour
detection predates its development, the BSDS [2] has
since found wide acceptance as a benchmark for this task
[23], [24], [25], [26], [27], [28], [35], [61]. The standard for
evaluating segmentation algorithms is less clear.
One option is to regard the segment boundaries
as contours and evaluate them as such. However, a
methodology that directly measures the quality of the
segments is also desirable. Some types of errors, e.g. a
missing pixel in the boundary between two regions, may
not be reflected in the boundary benchmark, but can
have substantial consequences for segmentation quality,
e.g. incorrectly merging large regions. One might argue
that the boundary benchmark favors contour detectors
over segmentation methods, since the former are not
burdened with the constraint of producing closed curves.
We therefore also consider various region-based metrics.
2.3.1 Variation of Information
The Variation of Information metric was introduced for
the purpose of clustering comparison [6]. It measures the
distance between two segmentations in terms of their
average conditional entropy, given by:

VI(S, S′) = H(S) + H(S′) − 2I(S, S′)    (5)

where H and I represent respectively the entropies of and
mutual information between two clusterings of data S
and S′. In our case, these clusterings are test and ground-
truth segmentations. Although VI possesses some inter-
esting theoretical properties [6], its perceptual meaning
and applicability in the presence of several ground-truth
segmentations remains unclear.
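Given two label maps over the same pixels, eq. (5) can be computed directly from the marginal and joint label distributions; a sketch:

```python
import numpy as np

def variation_of_information(s1, s2):
    """VI(S, S') = H(S) + H(S') - 2 I(S, S'), eq. (5), from two
    flat label arrays of equal length (one label per pixel)."""
    s1, s2 = np.ravel(s1), np.ravel(s2)
    n = s1.size

    def entropy(labels):
        _, counts = np.unique(labels, return_counts=True)
        p = counts / n
        return -(p * np.log2(p)).sum()

    # mutual information from the joint label histogram
    joint = {}
    for a, b in zip(s1, s2):
        joint[(a, b)] = joint.get((a, b), 0) + 1
    p1 = {a: np.mean(s1 == a) for a in np.unique(s1)}
    p2 = {b: np.mean(s2 == b) for b in np.unique(s2)}
    mi = sum(c / n * np.log2((c / n) / (p1[a] * p2[b]))
             for (a, b), c in joint.items())
    return entropy(s1) + entropy(s2) - 2 * mi
```

VI is zero exactly when the two segmentations are identical up to a relabeling, and grows as they diverge.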
IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE
This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication.