Dual Low-Rank Pursuit: Learning Salient Features for Saliency Detection


IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, VOL. X, NO. XX, MONTH YEAR 3

Fig. 2. Examples of predicted fixation maps from different methods for the input image shown in (a): (b) ground truth, provided for reference; (c) SVM-based method [19]; (d) MTSP [25]; (e) our proposed DLRP. DLRP predicts a more accurate fixation map.

instance, a multi-task sparsity pursuit based method, termed MTSP, was proposed in [25]; it combines multiple features for fixation prediction. That work also suggests that a non-salient background region can be linearly represented by other background regions because of the underlying low-rank structure. Further, a generalized MTSP (GMTSP) was described in [25] to incorporate top-down priors in the supervised setting. Although GMTSP also considers some high-level information, the bottom-up and high-level components are treated separately. Recent methods also consider the sparsity prior in the salient object detection task. Peng et al. [42] proposed to extract sparse components from an image as the detected salient regions. In a similar spirit, Shen et al. [32] applied robust principal component analysis (RPCA) [43] to factorize a segmented image into a few salient regions plus a low-rank non-salient background. In addition, Shen et al. proposed to learn a new feature space, which is more favorable than the original feature space for saliency detection via RPCA, under the supervision of provided saliency map annotations.

Our proposed DLRP method is closely related to those proposed in [25], [32]: DLRP also pursues sparse-plus-low-rank factorizations of images for fixation prediction, via appropriate feature transformations. However, several critical differences distinguish DLRP from the existing methods. First, unlike the unsupervised MTSP, which is based purely on low-level features, DLRP uses the available high-level supervision information to learn the prediction models. Thus, DLRP is better at detecting high-level concepts (e.g., human faces) in images and can predict more accurate fixation maps (see the examples in Fig. 2). Second, DLRP only learns to apply transformations to the visual representation bases of non-salient backgrounds and conducts fixation prediction in the original feature space. This differs from the method in [32], which transforms all the images into a new feature space; DLRP can therefore exploit the powerful original visual features for fixation prediction. Third, DLRP adapts well to new data sets, because it learns transformations for representation bases that are intrinsically adaptive to the data. Finally, DLRP does not require the computationally expensive image segmentation used in [32] and is thus more efficient.

III. PRELIMINARIES

In this work, we perform human fixation prediction at the image patch level as in [25], [42], which captures subtle details within the images and provides more accurate fixation maps than prediction at the image level. A common pipeline for patch-level fixation prediction is to first partition an image regularly into a number of patches, then estimate the saliency score of each patch from its visual feature, and finally generate the overall fixation map by fusing the estimated saliency scores of all the patches. Correctly estimating the saliency score of each patch is therefore critical for obtaining an accurate fixation map.
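The three pipeline steps can be sketched as follows. This is a minimal illustration, not the implementation used in [25] or [42]: the function names `partition_into_patches` and `fuse_scores`, the non-overlapping grid, the cropping of incomplete border patches, and the variance-based placeholder score are all assumptions made for the example.

```python
import numpy as np

def partition_into_patches(image, patch_size):
    """Split an H x W (x C) image into non-overlapping square patches on a regular grid.

    Border rows/columns that do not fill a whole patch are cropped
    (a simplifying assumption of this sketch).
    Returns (patches, grid_shape); patches has shape (m, patch_size, patch_size, ...).
    """
    h, w = image.shape[:2]
    rows, cols = h // patch_size, w // patch_size
    cropped = image[:rows * patch_size, :cols * patch_size]
    patches = [
        cropped[r * patch_size:(r + 1) * patch_size,
                c * patch_size:(c + 1) * patch_size]
        for r in range(rows) for c in range(cols)
    ]
    return np.stack(patches), (rows, cols)

def fuse_scores(scores, grid_shape, patch_size):
    """Fuse m per-patch scores into a pixel-level map by filling each patch with its score."""
    grid = np.asarray(scores, dtype=float).reshape(grid_shape)
    fixation_map = np.kron(grid, np.ones((patch_size, patch_size)))
    peak = fixation_map.max()
    return fixation_map / peak if peak > 0 else fixation_map  # normalize to [0, 1]

# Usage on a synthetic image (random data, for illustration only).
image = np.random.default_rng(0).random((64, 96, 3))
patches, grid = partition_into_patches(image, patch_size=16)
# Placeholder score: per-patch intensity variance, a hypothetical stand-in
# for a real saliency estimator.
scores = patches.reshape(len(patches), -1).var(axis=1)
fixation_map = fuse_scores(scores, grid, patch_size=16)
```

Only the middle step, the per-patch score, differs between methods; the partition and fusion steps stay the same whichever estimator is plugged in.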

In this section, we briefly introduce how the previous sparsity-pursuit based methods [25], [42] estimate saliency scores of image patches; our proposed DLRP method is developed under a similar framework in the following section.

Suppose each image is divided into $m$ patches, each of which is described by a visual feature $x_i \in \mathbb{R}^d$ of dimensionality $d$, for $i = 1, \ldots, m$. The sparsity-pursuit (SP) based method detects salient patches based on a sparsity prior, in an unsupervised way. More specifically, let $X = [x_1, \ldots, x_m] \in \mathbb{R}^{d \times m}$ collect all the patch features from an image. SP factorizes the patch matrix $X$ into a sparse component and a low-rank one as follows,

$$\min_{Z,E} \; \|Z\|_* + \lambda \|E\|_{2,1} \quad \text{s.t.} \quad X = XZ + E,$$

where the component $XZ$ represents the non-salient patches via a linear combination of themselves, and the coefficient matrix $Z \in \mathbb{R}^{m \times m}$ is low-rank (enforced by the nuclear norm $\|Z\|_*$) due to the self-similarity in non-salient image backgrounds. The residue component $E \in \mathbb{R}^{d \times m}$ accounts for the salient patches, which cannot be explained by the self-representation $XZ$. SP employs the $\ell_{2,1}$-norm to encourage the residue $E$ to be column-sparse (i.e., patch-sparse), based on the observation that salient patches are usually fewer than non-salient ones. Such a column-sparse $E$ directly encodes the saliency of patches: only the non-zero columns of $E$ correspond to salient patches.
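As an illustration of how such a factorization can be computed, the sketch below solves the SP problem with a standard inexact augmented Lagrangian scheme of the kind commonly used for low-rank representation problems. The text above does not specify the solver used in [25] or [42], so the algorithm choice, the function names, the auxiliary variable `J`, and all hyper-parameter defaults are assumptions. The saliency score of patch $i$ is taken here as the $\ell_2$ norm of the $i$-th column of $E$, recalling that $\|E\|_{2,1}$ is the sum of the column-wise $\ell_2$ norms.

```python
import numpy as np

def svt(M, tau):
    """Singular value thresholding: proximal operator of tau * nuclear norm."""
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    return (U * np.maximum(s - tau, 0.0)) @ Vt

def column_shrink(M, tau):
    """Proximal operator of tau * l2,1 norm: shrink each column's l2 norm by tau."""
    norms = np.linalg.norm(M, axis=0)
    return M * np.maximum(1.0 - tau / np.maximum(norms, 1e-12), 0.0)

def sparsity_pursuit(X, lam, mu=1e-2, rho=1.1, mu_max=1e6, tol=1e-6, max_iter=1000):
    """Approximately solve  min ||Z||_* + lam * ||E||_{2,1}  s.t.  X = XZ + E.

    An auxiliary variable J (with constraint Z = J) decouples the nuclear norm
    from the linear constraint; Y1 and Y2 are the Lagrange multipliers.
    All default hyper-parameters are illustrative assumptions.
    """
    d, m = X.shape
    Z, J, Y2 = np.zeros((m, m)), np.zeros((m, m)), np.zeros((m, m))
    E, Y1 = np.zeros((d, m)), np.zeros((d, m))
    XtX = X.T @ X
    inv = np.linalg.inv(np.eye(m) + XtX)            # reused in every Z update
    for _ in range(max_iter):
        J = svt(Z + Y2 / mu, 1.0 / mu)              # low-rank step
        Z = inv @ (XtX - X.T @ E + J + (X.T @ Y1 - Y2) / mu)
        E = column_shrink(X - X @ Z + Y1 / mu, lam / mu)  # column-sparse step
        r1, r2 = X - X @ Z - E, Z - J               # constraint residuals
        Y1 += mu * r1
        Y2 += mu * r2
        mu = min(rho * mu, mu_max)
        if max(np.abs(r1).max(), np.abs(r2).max()) < tol:
            break
    return Z, E

# Toy data (synthetic, for illustration only): 57 "background" patches drawn
# from a 3-dimensional subspace plus 3 random "salient" patches.
rng = np.random.default_rng(0)
d, m, outlier_idx = 30, 60, np.array([5, 20, 41])
X = rng.standard_normal((d, 3)) @ rng.standard_normal((3, m))
X[:, outlier_idx] = rng.standard_normal((d, 3))
X /= np.linalg.norm(X, axis=0)                      # unit-norm patch features

Z, E = sparsity_pursuit(X, lam=0.5)
saliency = np.linalg.norm(E, axis=0)                # one score per patch
```

In this toy setting the background columns can be reconstructed from one another, so their residue columns are expected to be close to zero, while the three random columns should carry most of the energy of `E`. The value `lam=0.5` was chosen for unit-norm features and would need tuning on real data.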

The above SP method can be extended to employ $K$ different features and seek patches that are consistently salient across multiple modalities, which generalizes SP to the Multi-Task Sparsity Pursuit (MTSP) method proposed in [25]:

$$\min_{\{Z^k, E^k\}_{k=1}^{K}} \; \sum_{k=1}^{K} \|Z^k\|_* + \lambda \|E\|_{2,1}, \tag{1}$$
$$\text{s.t.} \quad X^k = X^k Z^k + E^k, \quad k = 1, \ldots, K, \qquad E = [E^1; E^2; \ldots; E^K],$$

where $X^k$ and $E^k$ denote the patch and residue matrices for the $k$-th feature, respectively, and the superscript $k$ indexes the different features. The consistency of saliency prediction across the $K$ features is enforced by the column-sparse regularization on $E = [E^1; E^2; \ldots; E^K]$: a salient patch results in non-zero columns in all $K$ residue matrices.
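The coupling across features can be made concrete with a small sketch of the column-wise shrinkage associated with the $\ell_{2,1}$ term, applied to the vertically stacked residue $E$. This shows only the joint-sparsity step, not a full MTSP solver, and the function names and toy values are assumptions made for the example. Because a column is scaled by a factor computed from its norm over all $K$ blocks, a patch is either kept in every residue matrix or zeroed in every one.

```python
import numpy as np

def joint_column_shrink(residuals, tau):
    """Shrink the columns of E = [E^1; ...; E^K] jointly, then split back per feature.

    residuals: list of K arrays, the k-th of shape (d_k, m).
    """
    E = np.vstack(residuals)                           # shape (sum_k d_k, m)
    norms = np.linalg.norm(E, axis=0)                  # one l2 norm per patch
    scale = np.maximum(1.0 - tau / np.maximum(norms, 1e-12), 0.0)
    E = E * scale
    splits = np.cumsum([R.shape[0] for R in residuals])[:-1]
    return np.split(E, splits, axis=0)

def patch_saliency(residuals):
    """Per-patch score: l2 norm of the stacked residue column."""
    return np.linalg.norm(np.vstack(residuals), axis=0)

# Toy residues for K = 2 features and m = 4 patches (made-up values).
E1 = np.array([[3.0, 0.0, 0.1, 0.0],
               [0.0, 0.0, 0.0, 0.0]])
E2 = np.array([[4.0, 0.0, 0.0, 0.0],
               [0.0, 0.0, 0.1, 0.0],
               [0.0, 0.0, 0.0, 0.0]])

S1, S2 = joint_column_shrink([E1, E2], tau=1.0)
scores = patch_saliency([S1, S2])
```

Patch 0 has a stacked norm of 5 and survives in both blocks, scaled by 0.8. Patch 2 has a stacked norm of about 0.14, below the threshold, so it is removed from both blocks at once.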

One can observe that both the SP and MTSP methods are bottom-up: they predict fixation maps based only on low-level features extracted from the image. Despite their efficiency, such methods may fail to produce reliable results for images with complex contents. In the next section, we introduce our proposed DLRP method, which is developed under a framework similar to SP and MTSP,
