contrast, also termed uniqueness, is defined as the Euclidean color distance to the entire image with a Gaussian spatial weight:
$$S_i^{SF} = \sum_{j=1}^{M} \lVert c_i - c_j \rVert^2 \, w_{ij} = c_i^2 \sum_{j=1}^{M} w_{ij} - 2 c_i \sum_{j=1}^{M} c_j w_{ij} + \sum_{j=1}^{M} c_j^2 w_{ij} \qquad (2)$$
where $c_i$ is the average color of the region or pixel $r_i$. Since $w_{ij}$ is a Gaussian spatial weight function, applying a Gaussian blurring kernel to $c_i$ and $c_i^2$ reduces the computational complexity for each pixel from O(M) to O(1). Nevertheless, Gaussian blurring still incurs a considerable time cost.
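To make the decomposition in Eq. (2) concrete, the sketch below evaluates it at pixel level with NumPy/SciPy: assuming a normalized Gaussian weight (so that $\sum_j w_{ij} = 1$), the three terms reduce to $c_i^2$, a Gaussian blur of $c$, and a Gaussian blur of $c^2$. This is only an illustration of the idea, not SF's region-based implementation; the function name and the sigma value are placeholders.

```python
# Minimal sketch of the Eq. (2) decomposition at pixel level.
# Assumption: w_ij is a normalized Gaussian, so sum_j w_ij = 1 and the
# first term reduces to c_i^2. Not the SF authors' implementation.
import numpy as np
from scipy.ndimage import gaussian_filter

def uniqueness_gaussian(lab, sigma=20.0):
    """lab: HxWxK float image (e.g., CIE-Lab). Returns HxW raw uniqueness."""
    sal = np.zeros(lab.shape[:2], dtype=np.float64)
    for ch in range(lab.shape[2]):
        c = lab[..., ch].astype(np.float64)
        blur_c  = gaussian_filter(c,     sigma)    # ~ sum_j c_j   w_ij
        blur_c2 = gaussian_filter(c * c, sigma)    # ~ sum_j c_j^2 w_ij
        sal += c * c - 2.0 * c * blur_c + blur_c2  # per-channel Eq. (2)
    return sal
```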
In the above global-contrast-based work, saliency is measured as the sum of contrast from the entire image, so some background regions also show high saliency due to the large contrast contribution from the object. An example is shown in Fig. 1(e)-(g): the background pixel at the yellow crossing and the foreground pixel at the red crossing demonstrate comparable saliency. In this paper, we consider the spatial distribution of contrast and adopt MDC as the saliency metric, so that the contrast contribution from the object to the background can be suppressed.
Besides, RC [18] adopts graph-based segmentation [34], which needs about 60 ms per image; the expensive time cost of region segmentation limits its speed.
The boundary and connectivity priors [21] have also been shown to be effective in salient object detection. These priors assume that background regions are usually connected to the image boundary. Geodesic distance [21] and Minimum Barrier Distance (MBD) [46] are widely used distance transforms for measuring a region's connectivity to the image boundary. In [40], the authors propose an approximate MBD implementation based on raster scanning, but the three scanning passes over each color channel limit its speed. Another approximation based on a minimum spanning tree (MST) is presented in [45]; the additional cost of building the tree leads to lower speed than [40]. Both algorithms show that MBD is more robust to noise and blur than the geodesic distance.
Recently, deep learning has achieved great success in many computer vision tasks, and some researchers have already applied deep neural networks to saliency detection. Wang et al. [47] use a CNN to predict saliency for each pixel in a local context, and then refine the saliency on object proposals over the global view. Zhao et al. [48] consider global and local context simultaneously in a multi-context CNN, and then combine them to predict saliency. Both deep-learning-based works achieve better accuracy but at very low speed.
III. SALIENT OBJECT DETECTION METHOD
In this section, we present an efficient salient object detection method. We first propose a raw saliency metric, MDC, which considers the spatial distribution of contrast. Next, an O(1) implementation of MDC is proposed. Finally, saliency smoothing and enhancement are introduced as post-processing.
A. Minimum Directional Contrast (MDC)
To measure the saliency of a region or pixel, contrast is the most frequently used feature. Global contrast, which considers the color difference between the target region or pixel and the entire image, is widely studied in salient object detection. Global contrast can be calculated at the pixel or region level. Region-level methods need extra time for image segmentation: in RC [18], graph-based segmentation [34] takes about 60 ms per image on the MSRA-10K dataset [18], [33], and in the SLIC-superpixel-based method SF [20], segmentation takes about 110 ms. To achieve higher speed, we adopt pixel-level saliency detection, which requires no region segmentation at all.
As discussed in the introduction, previous global-contrast-based methods simply measure saliency as the sum of contrast from the entire image [18], [20], or define it as the contrast with the average image color [17]. The spatial distribution of contrast is neglected.
In this paper, we analyze contrast from different spatial directions in more detail. If the target pixel i is regarded as the center of view, the entire image can be divided into several regions based on their location w.r.t. pixel i, i.e., top left (TL), top right (TR), bottom left (BL), and bottom right (BR). The directional contrast (DC) from each region can be calculated as:
$$DC_{i,\Omega} = \sum_{j \in \Omega} \sum_{ch=1}^{K} (I_{i,ch} - I_{j,ch})^2 \qquad (3)$$
where $I$ denotes an input image with $K$ color channels in the CIE-Lab color space and $\Omega \in \{TL, TR, BL, BR\}$ denotes one of the four directional regions. Fig. 3(a) is an input image, and two target pixels are shown in Fig. 3(b): a foreground pixel at the red crossing in the top row, and a background pixel at the yellow crossing in the bottom row. The entire image is simply divided into four regions by the red or yellow lines. The DC results of the two target pixels are shown in Fig. 3(c).
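As a concrete illustration of Eq. (3), the brute-force sketch below computes the four directional contrasts of a single target pixel. It assumes inclusive quadrant boundaries (the target pixel's row and column belong to the adjacent quadrants, which is immaterial since the pixel contributes zero contrast to itself) and is O(M) per pixel, so it only serves to make the definition concrete.

```python
# Brute-force evaluation of Eq. (3) for one target pixel at (row r, col c).
# Illustrative only: O(M) per pixel, inclusive quadrant boundaries assumed.
import numpy as np

def directional_contrast(lab, r, c):
    """lab: HxWxK float image. Returns a dict of DC values for TL/TR/BL/BR."""
    I = lab.astype(np.float64)
    diff2 = np.sum((I - I[r, c]) ** 2, axis=2)   # squared color distance to pixel (r, c)
    return {
        "TL": diff2[:r + 1, :c + 1].sum(),
        "TR": diff2[:r + 1, c:].sum(),
        "BL": diff2[r:, :c + 1].sum(),
        "BR": diff2[r:, c:].sum(),
    }
```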
From the DC results in Fig. 3(c), we can see that the distribution of DC differs greatly between the foreground and background pixels. In the top row, the foreground pixel shows high DC in almost all directions, so its minimum directional contrast (MDC) is still high. In the bottom row, the background pixel demonstrates very low DC in the bottom-right direction and high DC in the other directions, so its MDC is very low. More generally, since a foreground pixel is usually surrounded by the background, it often has high contrast in all directions and thus a high MDC. On the contrary, the MDC of a background pixel is usually small, as it has to connect to the background through one of the directions. This suggests defining MDC, i.e., the minimum contrast over all directions, as the raw saliency metric:
$$S(i) = \min_{\Omega \in \{TL, TR, BL, BR\}} DC_{i,\Omega} = \min_{\Omega \in \{TL, TR, BL, BR\}} \sum_{j \in \Omega} \sum_{ch=1}^{K} (I_{i,ch} - I_{j,ch})^2 \qquad (4)$$
The MDC values of the two target pixels are shown in Fig. 3(c): the foreground pixel at the red crossing shows an obviously higher MDC than the background pixel at the yellow crossing. The MDC-based raw saliency of all pixels is shown in Fig. 3(d).
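For the whole image, Eq. (4) can be evaluated by expanding each squared difference, $DC_{i,\Omega} = \sum_{ch} \big( |\Omega|\, I_{i,ch}^2 - 2 I_{i,ch} \sum_{j\in\Omega} I_{j,ch} + \sum_{j\in\Omega} I_{j,ch}^2 \big)$, so that only quadrant sums of $I$ and $I^2$ are required. The NumPy sketch below obtains these sums from integral images; it is only one possible illustration of the definition, with inclusive quadrant boundaries assumed, and is not necessarily identical to the O(1) implementation proposed in the next subsection.

```python
# Full-image raw MDC saliency following Eq. (4), using integral images
# (2-D cumulative sums) to obtain the quadrant sums of I and I^2.
# Illustrative sketch only; inclusive quadrant boundaries are assumed.
import numpy as np

def mdc_saliency(lab):
    """lab: HxWxK float image in CIE-Lab. Returns an HxW raw MDC saliency map."""
    H, W, K = lab.shape
    I = lab.astype(np.float64)

    def integral(X):
        # Padded integral image: ii[r, c] = per-channel sum of X[:r, :c].
        return np.pad(X.cumsum(axis=0).cumsum(axis=1), ((1, 0), (1, 0), (0, 0)))

    iiI, iiI2 = integral(I), integral(I * I)
    r = np.arange(H)[:, None]   # row index of every pixel
    c = np.arange(W)[None, :]   # column index of every pixel

    def rect(ii, r0, r1, c0, c1):
        # Per-channel sum over rows r0..r1-1 and cols c0..c1-1.
        return ii[r1, c1] - ii[r0, c1] - ii[r1, c0] + ii[r0, c0]

    # Quadrant limits, inclusive of the target pixel's row and column.
    quads = [("TL", 0, r + 1, 0, c + 1), ("TR", 0, r + 1, c, W),
             ("BL", r, H, 0, c + 1),     ("BR", r, H, c, W)]

    dc = []
    for _, r0, r1, c0, c1 in quads:
        n = ((r1 - r0) * (c1 - c0))[..., None]   # number of pixels in the quadrant
        s1 = rect(iiI,  r0, r1, c0, c1)          # sum of I   over the quadrant
        s2 = rect(iiI2, r0, r1, c0, c1)          # sum of I^2 over the quadrant
        dc.append(np.sum(n * I * I - 2.0 * I * s1 + s2, axis=2))
    return np.minimum.reduce(dc)                 # min over TL, TR, BL, BR
```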
In previous global contrast based methods (HC [18],
RC [18], SF [20]), saliency is simply defined as the sum of