Dual Low-Rank Pursuit: Learning Salient Features for Saliency Detection
Updated on 2024-08-26
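This article summarizes the paper "Dual Low-Rank Pursuit: Learning Salient Features for Saliency Detection", published in IEEE Transactions on Neural Networks and Learning Systems (listed as Vol. XX, MONTH YEAR). The work addresses saliency detection, an important step in machine vision: predicting where people fixate when freely viewing natural images. The authors, Congyan Lang, Jiashi Feng, Songhe Feng, Jingdong Wang (IEEE Member) and Shuicheng Yan (IEEE Senior Member), propose a new algorithm for this problem, named Dual Low Rank Pursuit (DLRP).

The core of DLRP is to use available supervision information to learn saliency-aware feature transformations. It builds on the popular low-rank plus sparsity pursuit framework and constructs discriminative bases, so that human fixations can be detected without the expensive object segmentation step required by earlier work. Because the supervised learning process embeds high-level information, DLRP can predict fixations accurately without segmentation.

The experiments compare DLRP with current mainstream methods across a comprehensive set of tests and report better accuracy and efficiency on the saliency detection task. The paper thus offers both a new algorithmic strategy and a reference baseline for later work on saliency detection.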
IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, VOL. X, NO. XX, MONTH YEAR 3
Fig. 2. Examples of predicted fixation maps for the input image in (a),
from different methods: (c) SVM based [19], (d) MTSP [25] and (e) our
proposed DLRP (panel label: "Our result"). DLRP predicts a more accurate
fixation map. The ground truth is provided in (b) for reference.
instance, a multi-task sparsity pursuit based method, termed
MTSP, was proposed in [25], which combines multiple
features for fixation prediction. That work also suggests that
one non-salient background region can be linearly represented
by other background regions due to the underlying
low-rank structure. Further, a generalized MTSP (GMTSP) was
described in [25] to incorporate the top-down priors in
the supervised setting. Although GMTSP also considers
some high-level information, it treats the bottom-up and
high-level components separately. Recent methods also
consider the sparsity prior in the salient object detection task.
Peng et al. [42] proposed to extract sparse components from
an image as the detected salient regions. In a similar spirit,
Shen et al. [32] applied a robust principal component analysis
(RPCA) method [43] to factorize a segmented image into a few
salient regions plus a low-rank non-salient background.
In addition, Shen et al. proposed to learn a new feature
space, which is more favorable than the original feature space
for saliency detection via RPCA, under the supervision of
provided saliency map annotations.
Our proposed DLRP method is closely related to the
ones proposed in [25], [32]: DLRP also pursues sparse plus
low-rank factorizations of images for fixation prediction, via
appropriate feature transformations. However, the following
critical differences distinguish DLRP from the
existing methods. First, unlike the unsupervised
MTSP, which is purely based on low-level features, our
proposed DLRP method effectively utilizes available high-level
supervision information for learning the prediction models.
Thus, DLRP is better at detecting the high-level concepts (e.g.,
human faces) in the images and can predict more accurate
fixation maps (see the examples in Figure 2). Secondly,
the proposed DLRP only learns to apply transformations on
visual representation bases of non-salient backgrounds and
conducts fixation prediction in the original feature space. This
differs from the method in [32], which transforms all the
images into a new feature space; DLRP can therefore exploit
the powerful original visual features for fixation prediction.
Thirdly, DLRP adapts well to new data sets, as it learns
transformations for the representation bases, which are
intrinsically adaptive to the data. Finally, DLRP does
not require computationally expensive image segmentation as
in [32] and thus is more efficient.
III. PRELIMINARIES
In this work, we perform human fixation prediction at the
image patch level as in [25], [42], which can capture subtle
details within the images and provide more accurate fixation
maps compared with prediction at the image level. For
patch-level fixation prediction, a common pipeline is: first
regularly partition an image into a number of patches, then
estimate the saliency score of each patch based on its visual
feature, and finally generate the overall fixation map by fusing
the estimated saliency scores of all the patches. It is clear that
correctly estimating the saliency score of each patch is critical
for obtaining an accurate fixation map.
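
The three-stage pipeline described above (partition, score, fuse) can be sketched in a few lines of NumPy. This is only an illustrative sketch, not the paper's implementation: the function names, the use of raw pixel intensities as the patch feature, and the `placeholder_scores` scorer (distance from the mean patch feature) are all assumptions made here so that the example runs end to end. Any real method would replace the scorer.

```python
import numpy as np

def extract_patches(image, patch):
    """Regularly partition an H x W image into non-overlapping patch x patch blocks.

    Returns a d x m matrix whose columns are flattened patches (d = patch * patch)
    and the (rows, cols) layout of the patch grid. Border pixels that do not
    fill a whole patch are cropped.
    """
    h, w = image.shape
    rows, cols = h // patch, w // patch
    cropped = image[:rows * patch, :cols * patch]
    blocks = cropped.reshape(rows, patch, cols, patch).swapaxes(1, 2)
    X = blocks.reshape(rows * cols, patch * patch).T
    return X, (rows, cols)

def placeholder_scores(X):
    """Hypothetical stand-in scorer: distance of each patch from the mean patch."""
    return np.linalg.norm(X - X.mean(axis=1, keepdims=True), axis=0)

def fuse_scores(scores, grid, patch):
    """Paint each patch's score back onto its pixels and rescale to [0, 1]."""
    rows, cols = grid
    fmap = np.kron(scores.reshape(rows, cols), np.ones((patch, patch)))
    span = fmap.max() - fmap.min()
    return (fmap - fmap.min()) / span if span > 0 else np.zeros_like(fmap)

# Usage: a dark image with one bright 4 x 4 region.
image = np.zeros((9, 10))
image[:4, :4] = 1.0
X, grid = extract_patches(image, patch=4)
fixation_map = fuse_scores(placeholder_scores(X), grid, patch=4)
```

The matrix `X` built here has the same layout as the patch matrix used in the formulation that follows: one column per patch.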
In this section, we briefly introduce how the previous
sparsity-pursuit based methods [25], [42] estimate saliency
scores of image patches and develop our proposed DLRP
method under a similar framework in the following section.
Suppose each image is divided into $m$ patches, each of
which is described by a visual feature $x_i \in \mathbb{R}^d$ of
dimensionality $d$, for $i = 1, \ldots, m$. The sparsity-pursuit (SP) based
method detects salient patches based on a prior sparsity, in an
unsupervised way. More specifically, let $X = [x_1, \ldots, x_m]$
incorporate all the patch features from an image and SP
factorizes the patch matrix $X$ into a sparse component and
a low-rank one as follows,
$$\min_{Z,E} \; \|Z\|_* + \lambda \|E\|_{2,1}, \quad \text{s.t. } X = XZ + E,$$
where the component $XZ$ represents the non-salient patches
via a linear combination of themselves, and the coefficient
matrix $Z \in \mathbb{R}^{m \times m}$ is low-rank (enforced by the nuclear
norm $\|Z\|_*$) due to the self-similarity in non-salient image
backgrounds. The residue component $E \in \mathbb{R}^{d \times m}$ explains
the salient patches which cannot be explained via the
self-representation $XZ$. SP employs the $\ell_{2,1}$-norm to encourage the
residue $E$ to be column (i.e., patch) sparse, based on the
observations that salient patches are usually fewer than
non-salient ones, and such a column-sparse $E$ directly encodes
the saliency of patches: only the non-zero columns in $E$
correspond to salient patches.
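
To make the SP factorization concrete, the sketch below solves the problem above with a standard ADMM scheme on a small synthetic patch matrix. The excerpt does not say which solver the cited methods use, so the solver itself, the auxiliary variable `J` (a copy of `Z` that carries the nuclear norm), the step-size schedule, the value `lam=0.3`, and the unit-norm column scaling are all assumptions of this example. The two helpers are the usual proximal operators: singular value thresholding for the nuclear norm and column-wise shrinkage for the l2,1 norm.

```python
import numpy as np

def svt(M, tau):
    """Singular value thresholding: proximal operator of tau * (nuclear norm)."""
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    return (U * np.maximum(s - tau, 0.0)) @ Vt

def col_shrink(M, tau):
    """Column-wise shrinkage: proximal operator of tau * (l2,1 norm)."""
    norms = np.linalg.norm(M, axis=0)
    return M * np.maximum(1.0 - tau / np.maximum(norms, 1e-12), 0.0)

def sparsity_pursuit(X, lam, mu=1e-2, rho=1.1, mu_max=1e8, tol=1e-7, max_iter=1000):
    """Approximately solve min ||Z||_* + lam * ||E||_{2,1}  s.t.  X = XZ + E."""
    d, m = X.shape
    Z, J, Y2 = np.zeros((m, m)), np.zeros((m, m)), np.zeros((m, m))
    E, Y1 = np.zeros((d, m)), np.zeros((d, m))
    XtX = X.T @ X
    inv = np.linalg.inv(np.eye(m) + XtX)  # reused by every Z update
    for _ in range(max_iter):
        # J and E depend only on the previous Z, so they form one ADMM block.
        J = svt(Z + Y2 / mu, 1.0 / mu)
        E = col_shrink(X - X @ Z + Y1 / mu, lam / mu)
        # Z minimizes the quadratic part of the augmented Lagrangian.
        Z = inv @ (XtX - X.T @ E + J + (X.T @ Y1 - Y2) / mu)
        r1 = X - X @ Z - E   # residual of X = XZ + E
        r2 = Z - J           # residual of Z = J
        Y1 += mu * r1
        Y2 += mu * r2
        mu = min(rho * mu, mu_max)
        if max(np.abs(r1).max(), np.abs(r2).max()) < tol:
            break
    return Z, E

# Synthetic example: 55 "background" patches in a 3-dimensional subspace
# plus 5 unrelated "salient" patches placed in the first columns.
rng = np.random.default_rng(0)
d, m, rank, n_salient = 20, 60, 3, 5
X = rng.standard_normal((d, rank)) @ rng.standard_normal((rank, m))
X[:, :n_salient] = rng.standard_normal((d, n_salient))
X /= np.linalg.norm(X, axis=0, keepdims=True)  # unit-norm columns (assumption)

Z, E = sparsity_pursuit(X, lam=0.3)
saliency = np.linalg.norm(E, axis=0)  # one score per patch: column norms of E
```

Taking the column norms of `E` as the per-patch score follows directly from the column-sparsity argument in the text: background columns are reproduced by `XZ` and leave (near-)zero residue, while the unrelated columns cannot be.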
The above SP method can be extended to employing $K$ different
features to seek consistently salient patches for multiple
modalities, and yields the following generalized Multi-Task
Sparsity Pursuit (MTSP) method proposed in [25]:
$$\min_{\{Z^k, E^k\}} \; \sum_{k=1}^{K} \|Z^k\|_* + \lambda \|E\|_{2,1}, \tag{1}$$
$$\text{s.t. } X^k = X^k Z^k + E^k, \quad E = [E^1; E^2; \ldots; E^K],$$
where $X^k$ and $E^k$ denote the patch and residue matrices for
the $k$-th feature respectively. The superscript $k$ is the index
of different features. The consistency of saliency prediction
across $K$ features is enforced by the column-sparse
regularization on $E = [E^1; E^2; \ldots; E^K]$: a salient patch will result
in non-zero columns in all the $K$ residue matrices.
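
The coupling across features can be seen in isolation by looking at the shrinkage step that an l2,1 penalty on the stacked residue induces. The sketch below is not a full MTSP solver; it only applies column-wise shrinkage to the vertically stacked matrix, with a hypothetical function name `joint_col_shrink` and made-up toy residues, to show that a patch column is either zeroed in every feature or kept in every feature.

```python
import numpy as np

def joint_col_shrink(residuals, tau):
    """Shrink the columns of the stacked matrix [E^1; ...; E^K] jointly.

    `residuals` is a list of K arrays of shape (d_k, m). Each patch column
    is scaled by one factor shared by all K features, so the zero/non-zero
    pattern over patches is identical across the returned matrices.
    """
    stacked = np.vstack(residuals)
    norms = np.linalg.norm(stacked, axis=0)
    scale = np.maximum(1.0 - tau / np.maximum(norms, 1e-12), 0.0)
    shrunk = stacked * scale
    # Split back into the K per-feature blocks.
    splits = np.cumsum([r.shape[0] for r in residuals])[:-1]
    return np.split(shrunk, splits, axis=0)

# Toy residues for K = 2 features and m = 3 patches.
R1 = np.array([[0.1, 3.0, 0.0],
               [0.0, 4.0, 0.0]])
R2 = np.array([[0.1, 0.1, 0.0],
               [0.0, 0.0, 0.0],
               [0.0, 0.0, 0.0]])
E1, E2 = joint_col_shrink([R1, R2], tau=1.0)

# Patch 0 is weak in both features and is zeroed in both.
# Patch 1 is strong in feature 1 only, yet stays non-zero in both.
support1 = np.linalg.norm(E1, axis=0) > 0
support2 = np.linalg.norm(E2, axis=0) > 0
```

Penalizing each residue matrix separately would instead let patch 1 survive in the first feature and vanish in the second, which is the inconsistency the stacked regularizer is meant to rule out.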
One can observe that both the SP and MTSP methods
are bottom-up based, which predict fixation maps only based
on low-level features extracted from the image. Despite their
efficiency, such methods may fail to produce reliable results
for images with complex contents. In the next section, we
are going to introduce our proposed DLRP method, which
is developed under the framework similar to SP and MTSP,