J. Ning et al. / Pattern Recognition 43 (2010) 445 -- 456 447
The RGB/Bhattacharyya descriptor is a very simple yet efficient
way to represent the regions and measure their similarity. It has
been successfully used to measure the similarity between target
model and candidate model in the popular kernel based object track-
ing method [27]. However, it should be stressed that other color
spaces, such as the HSI color space, and other distance measures,
such as the Euclidean distance between histogram vectors, can also
be adopted in the proposed region merging scheme. In Section 3.3,
we present examples by using HSI color space and Euclidean dis-
tance, respectively. The results are similar to those by using the RGB/
Bhattacharyya descriptor.
2.2. Object and background marking
In the interactive image segmentation, the users need to specify
the object and background conceptually. Similar to [10,13,17], the
users can input interactive information by drawing markers, which
could be lines, curves and strokes on the image. The regions that
have pixels inside the object markers are thus called object marker
regions, while the regions that have pixels inside the background
markers are called background marker regions. Fig. 1b shows exam-
ples of the object and background markers by using simple lines.
We use green markers to mark the object while using blue markers
to represent the background. Please note that usually only a small
portion of the object regions and background regions will be marked
by the user. Actually, the less the required inputs by the users, the
more convenient and more robust the interactive algorithm is.
After object marking, each region will be labeled as one of three
kinds of regions: the marker object region, the marker background
region and the non-marker region. To completely extract the object
contour, we need to automatically assign each non-marker region
with a correct label of either object region or background region.
For the convenience of the following development, we denote by M
o
and M
B
the sets of marker object regions and marker background
regions, respectively, and denote by N the set of non-marker regions.
2.3. Maximal similarity based merging rule
After object/background marking, it is still a challenging problem
to extract accurately the object contour from the background because
only a small portion of the object/background features are indicated
by the user. The conventional region merging methods merge two
adjacent regions whose similarity is above a preset threshold [14,
Chapter 6.3]. These methods have difficulties in adaptive threshold
selection. A big threshold will lead to incomplete merging of the
regions belonging to the object, while a small threshold can easily
cause over-merging, i.e. some object regions are merged into the
background. Moreover, it is difficult to judge when the region merg-
ing process should stop.
Object and background markers provide some key features of
object and background, respectively. Similar to graph cut and marker
based watershed [4], where the marker is the seed and starting point
of the algorithm, the proposed region merging method also starts
from the initial marker regions and all the non-marker regions will
be gradually labeled as either object region or background region.
The lazy snapping cutout method proposed in [17], which combines
graph cut with watershed based initial segmentation, is actually a
region merging method. It is controlled by a max-flow algorithm
[11]. In this paper, we present an adaptive maximal similarity based
merging mechanism to identify all the non-marker regions under
the guidance of object and background markers.
Let Q be an adjacent region of R and denote by
¯
S
Q
={S
Q
i
}
i=1,2,...,q
the set of Q's adjacent regions. The similarity between Q and all its
adjacent regions, i.e.
(Q, S
Q
i
), i = 1,2, ... ,q, are calculated. Obviously,
R is a member of
¯
S
Q
. If the similarity between R and Q is the maximal
one among all the similarities
(Q, S
Q
i
), we will merge R and Q.The
following merging rule is defined:
Merge R and Q if
(R, Q) = max
i=1,2,...,q
(Q, S
Q
i
)(2)
The merging rule (2) is very simple but it establishes the basis of
the proposed region merging process. One important advantage of
(2) is that it avoids the presetting of similarity threshold for merging
control. Although “max” is an operator that is sensitive to outliers,
we empirically found that it works well in our algorithm. This is
mainly because that the histogram is a global descriptor of the local
region and it is robust to noise and small variations. Meanwhile, the
Bhattacharyya coefficient is the inner product of the two histogram
vectors and it is also robust to noise and variations.
The marker regions cover only a small part of the object and back-
ground. Those object regions that are not marked by the user, i.e. the
non-marker object regions, should be identified and not be merged
with the background. Since they are from the same object, the non-
marker object regions will usually have higher similarity with the
marker object regions than the background regions. Therefore, in the
automatic region merging process, the non-marker object regions
will have high probabilities to be identified as object.
2.4. The merging process
The whole MSRM process can be divided into two stages, which
are repeatedly executed until no new merging occurs. Our strategy
is to merge background regions as many as possible while keep ob-
ject regions from being merged. Once we merge all the background
regions, it is equivalent to extracting the desired object. Some two-
step strategies have been used in [22,23] for image pyramid con-
struction. Different from [22,23], the proposed strategy aims for
image segmentation and it is guided by the markers input by users.
In the first stage, we try to merge marker background regions
with their adjacent regions. For each region B ∈ M
B
, we form the set
of its adjacent regions
¯
S
B
={A
i
}
i=1,2,...,r
. Then for each A
i
and A
i
/∈ M
B
,
we form its set of adjacent regions
¯
S
A
i
={S
A
i
j
}
j=1,2,...,k
. It is obvious
that B ∈
¯
S
A
i
. The similarity between A
i
and each element in
¯
S
A
i
, i.e.
(A
i
, S
A
i
j
), is calculated. If B and A
i
satisfy the rule (2), i.e.
(A
i
, B) = max
j=1,2,...,k
(A
i
, S
A
i
j
)(3)
then B and A
i
are merged into one region and the new region will
have the same label as region B:
B = B ∪ A
i
(4)
Otherwise, B and A
i
will not merge.
The above procedure is iteratively implemented. Note that in
each iteration, the sets M
B
and N will be updated. Specifically, M
B
expands and N shrinks. The iteration stops when the entire marker
background regions M
B
will not find new merging regions.
After the region merging of this stage, some non-marker back-
ground regions will be merged with the corresponding background
markers. However, there are still non-marker background regions
which cannot be merged because they have higher similarity scores
with each other than with the marker background regions. Fig. 2a
shows that after the first stage merging, many regions belonging to
the background (leaves, branches, etc.) are merged but there are still
some non-marker background regions left.
To complete the task of target object extraction, in the second
stage we will focus on the non-marker regions in N remained from
the first stage. Part of N belongs to the background, while part of