3
that the watershed transform from seeds is equivalent to a cut
in a minimum-spanning tree (MST). That is, the removal of
the arc with maximum weight from the single path in the MST
that connects each pair of seeds results a minimum-spanning
forest (i.e., a watershed cut). Such a graph cut tends to be better
than the normalized cut in boundary adherence, but worse in
superpixel regularity.
In the evolution of superpixel segmentation methods, it
is also worth mentioning Mean-Shift [40], Quick-Shift [41],
turbopixels [42], SLIC [1], geometric flow [43], LSC [15],
and DBSCAN [21]. The Mean-Shift method produces irregular
and loose superpixels whereas the Quick-Shift algorithm does
not allow an user to choose the number of superpixels. The
turbopixel-based approaches can produce good superpixels,
but are computationally complex. C¸ iˇgla and Atalan [44] used
connected k-means algorithm with convexity constraints to
achieve superpixel segmentation via speeded-up turbopixels.
The method is still bit slow, and, as claimed by the authors,
fails to provide good boundary recall for complex images.
SLIC is by far the most commonly used superpixel method
[1]. It uses a regular grid for seed sampling. Once chosen, the
seeds are transferred to the lowest gradient position within a
small neighborhood. Finally, a modified k-means algorithm is
used to cluster the remaining pixels. This algorithm was shown
to perform better than many other methods (e.g., [42], [13],
[22], [23], [41]). However, the k-means algorithm searches for
pixels within a 2S × 2S window around each seed, where
S is the grid interval. For a non-regular seed distribution,
some pixels may not be reached by any seed. Indeed, this
might happen from the second iteration on and this labeling
inconsistency problem is only solved by post-processing. In
[43], Wang et al. proposed a geometric-flow-based method
of superpixel generation. The method has high computational
complexity as it involves computation of the geodesic distance
and several iterations. LSC [15] and DBSCAN [21] are among
the most recent approaches. LSC models the segmentation
problem using Normalized Cuts, but it applies an efficient
approximate solution using a weighted k-means algorithm to
generate superpixels. DBSCAN performs fast pixel grouping
based on color similarity with geometric restrictions, and then
merges small clusters to ensure connected superpixels.
A first method based on the ISF framework appeared in [45]
and has been successfully used in a high level application [33].
It is considered in our experiments.
III. THE ISF FRAMEWORK
An ISF method results from the choice of each compo-
nent: inital seed selection, connectivity function, adjacency
relation, and seed recomputation strategy. The ISF algorithm
is a sequence of Image Foresting Transforms (IFTs) from
improved seed pixel sets (Section III-A). For initial seed
selection, we propose either grid or mixed entropy-based seed
sampling as effective strategies (Section III-B). The closest
minima of a gradient image to seeds obtained by grid sampling
is also evaluated in an attempt to solve the problem in
a single iteration. Examples of connectivity functions and
adjacency relations for 2D and 3D segmentations are presented
in Sections III-C and III-D, respectively. Two strategies for
seed recomputation are described in Section III-E. The ISF
algorithm is presented in Section III-G and its theoretical
properties are demonstrated in the supplementary material.
Section III-H discusses implementation issues and provides
the link to the code.
A. Image Foresting Transform
An image can be interpreted as a graph G = (I, A), whose
pixels in the image domain I ⊂ Z
n
are the nodes and pixel
pairs (s, t) that satisfy the adjacency relation A ⊂ I ×I are
the arcs (e.g., 4-neighbors when n = 2). We use t ∈ A(s) and
(s, t) ∈ A to indicate that t is adjacent to s.
For a given image graph G = (I, A), a path π
t
=
ht
1
, t
2
, . . . , t
n
= ti is a sequence of adjacent pixels with ter-
minus t. A path is trivial when π
t
= hti. A path π
t
= π
s
·hs, ti
indicates the extension of a path π
s
by an arc (s, t). When we
want to explicitly indicate the origin of a path, the notation
π
s t
= ht
1
= s, t
2
, . . . , t
n
= ti is used, where s stands for
the origin and t for the destination node. A predecessor map is
a function P that assigns to each pixel t in I either some other
adjacent pixel in I, or a distinctive marker nil not in I — in
which case t is said to be a root of the map. A spanning forest
(image segmentation) is a predecessor map which contains no
cycles — i.e., one which takes every pixel to nil in a finite
number of iterations. For any pixel t ∈ I, a spanning forest
P defines a path π
P
t
recursively as hti if P (t) = nil, and
π
P
s
· hs, ti if P (t) = s 6= nil.
A connectivity (path-cost) function computes a value f(π
t
)
for any path π
t
, including trivial paths π
t
= hti. A path π
t
is optimum if f(π
t
) ≤ f (τ
t
) for any other path τ
t
in Π
G
(the set of paths in G). By assigning to each pixel t ∈ I one
optimum path with terminus t, we obtain an optimal mapping
C, which is uniquely defined by C(t) = min
∀π
t
in Π
G
{f(π
t
)}.
The Image Foresting Transform (IFT) [11] takes an image
graph G = (I, A), and a connectivity function f ; and assigns
one optimum path π
t
to every pixel t ∈ I such that an
optimum-path forest P is obtained — i.e., a spanning forest
where all paths are optimum. However, f must satisfy certain
conditions, as described in [46], otherwise, the paths may not
be optimum.
In ISF, all seeds are forced to be the roots of the forest by
choice of f, in order to obtain a desired number of superpixels.
For any given seed set S, each superpixel will be represented
by its respective tree in the spanning forest P as computed by
the IFT algorithm.
B. Seed Sampling Strategies
Any natural image contains a lot of heterogeneity. Some
parts of the image can have really small variations in inten-
sity whereas some parts in the image can show significant
variations. So, it is but natural to choose more seeds from
a more non-uniform region of an image. However, having a
grid structure for the seeds is also essential to conform to the
regularity of the superpixels. The proposed mixed sampling
strategy achieves both the goals. We use a two-level quad-
tree representation of an input 2D image. The heterogeneity