Cost-Sensitive Rank Learning From Positive and
Unlabeled Data for Visual Saliency Estimation
Jia Li, Yonghong Tian, Member, IEEE, Tiejun Huang, Member, IEEE, and Wen Gao, Fellow, IEEE
Abstract—This paper presents a cost-sensitive rank learning
approach for visual saliency estimation. This approach avoids
the explicit selection of positive and negative samples, which is
often used by existing learning-based visual saliency estimation
approaches. Instead, both the positive and unlabeled data are
directly integrated into a rank learning framework in a cost-sen-
sitive manner. Compared with existing approaches, the rank
learning framework can take the influences of both the local visual
attributes and the pair-wise contexts into account simultaneously.
Experimental results show that our algorithm remarkably outperforms several state-of-the-art approaches in visual saliency estimation.
Index Terms—Cost-sensitive, positive and unlabeled data, rank
learning, visual saliency.
I. INTRODUCTION
FROM the perspective of signal processing, visual saliency refers to the selection mechanism that pops out the “important” content from the input visual stimuli. With visual saliency,
the limited computational resources can be allocated to the desired targets while the distractors are ignored. Therefore, the central issue in visual saliency estimation is to distinguish the targets from the distractors using various visual cues.
Often, visual saliency estimation requires the integration
of the bottom-up and top-down factors [1]. In existing works,
the bottom-up factor is usually treated as a stimulus-driven component that determines visual saliency by detecting unique
or rare visual subsets in a scene. Inspired by the Feature In-
tegration Theory [2], many bottom-up approaches estimated
visual saliency by binding the irregularities in different visual
attributes. For example, Itti
et al. [3] presented an approach
to estimate image saliency by integrating intensity, color and
orientation contrasts. By incorporating motion and flicker
contrasts, the same approach was extended to video saliency
in [4]. Harel et al. [5] represented each scene with a directed
graph and adopted a random walker to select the salient locations corresponding to the most frequently visited nodes. In [6], Marat
et al. presented a biology-inspired model by simulating the
filtering mechanism of the retinal cells to estimate spatiotem-
poral saliency. Similarly, many other approaches detected
irregularities in the spatiotemporal domain (e.g., [7]–[9]), in the
amplitude spectrum (e.g., [10]), or in the phase spectrum (e.g., [11]).
These irregularities were then integrated in an ad-hoc manner
to locate the salient target. However, such an ad-hoc integration
may not always work since the top-down factor also plays a
crucial role in visual saliency estimation. Often, the top-down
factor can be treated as priors to guide the integration process.
For example, Peters and Itti [12] proposed an approach to infer
a projection matrix from global scene characteristics to saliency
maps. Kienzle et al. [13] presented a non-parametric saliency
model using a support vector machine. Navalpakkam and Itti [14] adopted a learning-based algorithm to pop out the targets and suppress the distractors by maximizing the signal-to-noise ratio. Generally speaking, these approaches can
achieve promising results but still have some drawbacks. Often,
the user data such as eye traces can only provide sparse positive
samples. That is, only a few locations in a scene are labeled
as positive, while most other locations in the scene remain unlabeled. These unlabeled data may contain many positive samples, so it is improper to treat all of them as negative samples (e.g., as in [12] and [13]) or to randomly select negative samples from them (e.g., as in [13]). Moreover, the influence of the pair-wise context (e.g., the competition between targets and distractors [3], [4], or the co-occurrence characteristics of various visual stimuli [15]), which also plays an important role in visual saliency estimation, is not considered in these approaches.
To solve these two problems, we propose a cost-sensitive
rank learning approach on positive and unlabeled data for visual
saliency estimation. In our approach, the influences of local vi-
sual attributes and pair-wise contexts are taken into account si-
multaneously using a pair-wise rank learning framework. More-
over, we avoid the explicit extraction of positive and negative
samples by directly integrating both the positive and unlabeled
data into the optimization objective in a cost-sensitive manner.
Extensive experiments demonstrate that our approach outper-
forms several state-of-the-art bottom-up (e.g., [3]–[5], [7], [8],
[10], [11]) and top-down (e.g., [12]–[14]) approaches in visual
saliency estimation. Moreover, both the cost-sensitive integration of positive and unlabeled data and the rank learning framework are shown to be beneficial for visual saliency estimation.
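As a concrete illustration of the cost-sensitive integration just described, the following Python sketch shows one way to couple positive (fixated) and unlabeled locations in a pair-wise rank loss. It is a minimal sketch under our own illustrative assumptions (a linear ranking function, 8-D visual features, and a single scalar cost neg_prior), not the exact formulation derived in Section II.

import numpy as np

def pu_rank_loss(w, X_pos, X_unl, neg_prior=0.8):
    """Cost-sensitive logistic rank loss over all (positive, unlabeled) pairs.

    neg_prior is the assumed probability that an unlabeled location is a
    true distractor; it down-weights every pair's cost so that unlabeled
    samples that may in fact be salient are not forced far below the
    fixated ones.
    """
    s_pos = X_pos @ w                          # saliency scores of fixated locations
    s_unl = X_unl @ w                          # saliency scores of unlabeled locations
    margins = s_pos[:, None] - s_unl[None, :]  # score gaps for all positive-unlabeled pairs
    return neg_prior * np.log1p(np.exp(-margins)).mean()

# Toy usage: 5 fixated and 200 unlabeled locations with 8-D visual features;
# in practice w would be learned by minimizing this loss, e.g., with gradient descent.
rng = np.random.default_rng(0)
w = rng.normal(size=8)
X_pos, X_unl = rng.normal(size=(5, 8)), rng.normal(size=(200, 8))
print(pu_rank_loss(w, X_pos, X_unl))

Down-weighting all positive-unlabeled pairs by a single prior is only one way to realize the cost-sensitive idea; per-location costs could be substituted without changing the structure of the loss.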
The remainder of this paper is organized as follows.
Section II describes the cost-sensitive rank learning approach