Cost-Sensitive Rank Learning From Positive and
Unlabeled Data for Visual Saliency Estimation
Jia Li, Yonghong Tian, Member, IEEE, Tiejun Huang, Member, IEEE, and Wen Gao, Fellow, IEEE
Abstract—This paper presents a cost-sensitive rank learning
approach for visual saliency estimation. This approach avoids
the explicit selection of positive and negative samples, which is
often used by existing learning-based visual saliency estimation
approaches. Instead, both the positive and unlabeled data are
directly integrated into a rank learning framework in a cost-sen-
sitive manner. Compared with existing approaches, the rank
learning framework can take the influences of both the local visual
attributes and the pair-wise contexts into account simultaneously.
Experimental results show that our algorithm remarkably outperforms several state-of-the-art approaches in visual saliency estimation.
Index Terms—Cost-sensitive, positive and unlabeled data, rank
learning, visual saliency.
I. INTRODUCTION
FROM the perspective of signal processing, visual saliency refers to the selection mechanism that pops out the “important” content from the input visual stimuli. With visual saliency,
the limited computational resources can be allocated to the desired targets while the distractors are ignored. Therefore, the central issue in visual saliency estimation is to distinguish the targets from the distractors using various visual cues.
Often, visual saliency estimation requires the integration
of the bottom-up and top-down factors [1]. In existing works,
the bottom-up factor is usually treated as a stimulus-driven component that determines visual saliency by detecting unique
or rare visual subsets in a scene. Inspired by the Feature In-
tegration Theory [2], many bottom-up approaches estimated
visual saliency by binding the irregularities in different visual
attributes. For example, Itti
et al. [3] presented an approach
to estimate image saliency by integrating intensity, color and
orientation contrasts. By incorporating motion and flicker
contrasts, the same approach was extended to video saliency
in [4]. Harel et al. [5] represented each scene with a directed
graph and adopted a random walker to select the salient locations corresponding to the most frequently visited nodes. In [6], Marat
et al. presented a biology-inspired model by simulating the
filtering mechanism of the retinal cells to estimate spatiotem-
poral saliency. Similarly, many other approaches detected
irregularities in the spatiotemporal domain (e.g., [7]–[9]), in the
amplitude spectrum (e.g., [10]), or in the phase spectrum (e.g., [11]).
These irregularities were then integrated in an ad-hoc manner
to locate the salient target. However, such an ad-hoc integration
may not always work since the top-down factor also plays a
crucial role in visual saliency estimation. Often, the top-down
factor can be treated as priors to guide the integration process.
For example, Peters and Itti [12] proposed an approach to infer
a projection matrix from global scene characteristics to saliency
maps. Kienzle et al. [13] presented a non-parametric saliency
model using a support vector machine. Navalpakkam and Itti [14] adopted a learning-based algorithm to pop out the targets and suppress the distractors by maximizing the signal-to-noise ratio. Generally speaking, these approaches can
achieve promising results but still have some drawbacks. Often,
the user data such as eye traces can only provide sparse positive
samples. That is, only a few locations in a scene are labeled
as positive, while most other locations in the scene remain unlabeled. These unlabeled data may contain many positive samples, so it is improper to treat all of them as negative samples (e.g., as in [12] and [13]) or to randomly select negative samples from them (e.g., as in [13]). Moreover, the influence of the pair-wise context (e.g., the competition between targets and distractors [3], [4], or the co-occurrence characteristics of various visual stimuli [15]), which also plays an important role in visual saliency estimation, is not considered in these approaches.
To solve these two problems, we propose a cost-sensitive
rank learning approach on positive and unlabeled data for visual
saliency estimation. In our approach, the influences of local vi-
sual attributes and pair-wise contexts are taken into account si-
multaneously using a pair-wise rank learning framework. More-
over, we avoid the explicit extraction of positive and negative
samples by directly integrating both the positive and unlabeled
data into the optimization objective in a cost-sensitive manner.
Extensive experiments demonstrate that our approach outper-
forms several state-of-the-art bottom-up (e.g., [3]–[5], [7], [8],
[10], [11]) and top-down (e.g., [12]–[14]) approaches in visual
saliency estimation. Moreover, both the cost-sensitive integration of positive and unlabeled data and the rank learning framework are shown to be beneficial for visual saliency estimation.
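As a concrete illustration of the cost-sensitive integration just described, the following Python sketch shows one way to couple positive (fixated) and unlabeled locations in a pair-wise rank loss. It is a minimal sketch under our own illustrative assumptions (a linear ranking function, 8-D visual features, and a single scalar cost neg_prior), not the exact formulation derived in Section II.

import numpy as np

def pu_rank_loss(w, X_pos, X_unl, neg_prior=0.8):
    """Cost-sensitive logistic rank loss over all (positive, unlabeled) pairs.

    neg_prior is the assumed probability that an unlabeled location is a
    true distractor; it down-weights every pair's cost so that unlabeled
    samples that may in fact be salient are not forced far below the
    fixated ones.
    """
    s_pos = X_pos @ w                          # saliency scores of fixated locations
    s_unl = X_unl @ w                          # saliency scores of unlabeled locations
    margins = s_pos[:, None] - s_unl[None, :]  # score gaps for all positive-unlabeled pairs
    return neg_prior * np.log1p(np.exp(-margins)).mean()

# Toy usage: 5 fixated and 200 unlabeled locations with 8-D visual features;
# in practice w would be learned by minimizing this loss, e.g., with gradient descent.
rng = np.random.default_rng(0)
w = rng.normal(size=8)
X_pos, X_unl = rng.normal(size=(5, 8)), rng.normal(size=(200, 8))
print(pu_rank_loss(w, X_pos, X_unl))

Down-weighting all positive-unlabeled pairs by a single prior is only one way to realize the cost-sensitive idea; per-location costs could be substituted without changing the structure of the loss.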
The remainder of this paper is organized as follows.
Section II describes the cost-sensitive rank learning approach