
• To remove noisy and redundant attributes, we propose three attribute selection criteria and formulate the attribute selection problem as a submodular optimization problem. A greedy optimization algorithm is presented, and its solution is guaranteed to be at least a (1 − 1/e)-approximation to the optimum.
• We conduct experiments on four public datasets for both object and action recognition, and demonstrate that the proposed attribute-based representations yield performance comparable to that of most existing visual recognition algorithms.
This paper is organized as follows: Section 2 describes the extraction of both human-labeled and data-driven attributes. Section 3 introduces the concept of submodularity and presents the proposed submodular attribute selection approach. Section 4 gives implementation details, and Section 5 provides experimental results and analysis on four public datasets. Section 6 concludes the paper.
2 HUMAN-LABELED ATTRIBUTE AND DATA-DRIVEN ATTRIBUTE EXTRACTION
Visual classes can be characterized by a collection of human-labeled attributes. For example, the action "long-jump" in the Olympic Sports Dataset [29] is associated with both motion attributes (e.g., jump forward, motion in the air) and scene attributes (e.g., outdoor, track). Given an instance x ∈ R^d, an attribute classifier f_a : x → {0, 1} predicts the confidence score for the presence of an attribute a in the image or video. The classifier f_a is learned by treating the training samples of all classes that have this attribute as positives and the rest as negatives. Given a set of P attribute classifiers {f_{a_p}(x)}_{p=1}^P, an instance x is mapped to the semantic space O:

h(x) : R^d → O = [0, 1]^P    (1)

where h(x) = [h_1(x), ..., h_P(x)]^T is a P-dimensional attribute score vector.
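As an illustration, the mapping in Eq. (1) can be realized with off-the-shelf binary classifiers. The snippet below is a minimal sketch, assuming scikit-learn's LogisticRegression as a stand-in attribute classifier (the paper does not prescribe a specific model) and a binary class-attribute matrix of the kind shown in Table 1; the function names are hypothetical.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def train_attribute_classifiers(X, y, class_attr):
    """X: (N, d) features; y: (N,) integer class labels;
    class_attr: (K, P) binary matrix, class_attr[k, p] = 1 iff class k
    has attribute a_p (a hypothetical stand-in for Table 1)."""
    classifiers = []
    for p in range(class_attr.shape[1]):
        # Positives: samples whose class has attribute a_p; rest negative.
        labels = class_attr[y, p]
        classifiers.append(LogisticRegression(max_iter=1000).fit(X, labels))
    return classifiers

def h(x, classifiers):
    # Eq. (1): map x in R^d to the semantic space O = [0, 1]^P.
    return np.array([clf.predict_proba(x[None, :])[0, 1]
                     for clf in classifiers])
```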
For the extraction of data-driven attributes, we propose to discover a large set of data-driven attributes using a dictionary learning method. The low-level features of N training samples are denoted as X = [x_1, x_2, ..., x_N] ∈ R^{d×N}. Assuming that we have K classes, the features of the training samples can also be expressed as X = [X_1, X_2, ..., X_K], where X_k ∈ R^{d×N_k} denotes the N_k samples from class k. For each class k, we first learn a class-specific dictionary D_k of size M_k using the K-SVD algorithm [2]. We then initialize a total dictionary D from these class-specific dictionaries as D = [D_1, D_2, ..., D_K], and learn it by minimizing the reconstruction error over all the training samples. The class-specific dictionary D_k is learned by solving the following problem:

arg min_{D_k, Z_k} ||X_k − D_k Z_k||_F^2   s.t. ∀i, ||z_i^k||_0 ≤ T    (2)
where D_k = [d_1^k, ..., d_{M_k}^k] ∈ R^{d×M_k} and Z_k = [z_1^k, ..., z_{N_k}^k] ∈ R^{M_k×N_k} are the sparse codes of X_k. The sparsity constraint ||z_i^k||_0 ≤ T specifies that the sample x_i^k uses at most T dictionary atoms from D_k in its decomposition. After we have obtained the class-specific dictionaries, we concatenate them to initialize a total dictionary D = [D_1, D_2, ..., D_K] of size M, where M = Σ_{k=1}^{K} M_k. The dictionary D can also be denoted as D = [d_1, d_2, ..., d_M] ∈ R^{d×M} and is learned by solving the following problem:

arg min_{D, Z} ||X − DZ||_F^2   s.t. ∀i, ||z_i||_0 ≤ T    (3)
where D is the learned attribute dictionary of size M, and Z = [z_1, ..., z_N] ∈ R^{M×N} are the sparse codes of X. The sparsity constraint ||z_i||_0 ≤ T specifies that the sample x_i uses at most T dictionary atoms from D in its decomposition, i.e., each vector z_i is sparse with at most T non-zero entries. The value of the j-th entry z_{ij} of the coefficient vector z_i indicates whether the dictionary atom d_j is used in the decomposition of sample x_i. Thus, each dictionary atom d_j is treated as a data-driven attribute, and z_i is treated as an M-dimensional attribute score vector.
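For concreteness, this pipeline can be sketched as follows. K-SVD itself is not part of scikit-learn, so the sketch substitutes sklearn's DictionaryLearning, which optimizes an ℓ1-penalized objective during fitting rather than the exact ℓ0-constrained problems of Eqs. (2) and (3); the OMP-based coding step does enforce the at-most-T non-zeros constraint. All function names and parameters here are illustrative assumptions.

```python
import numpy as np
from sklearn.decomposition import DictionaryLearning, sparse_encode

def learn_attribute_dictionary(X_per_class, M_k):
    """X_per_class: list of (N_k, d) arrays (rows are samples, i.e. the
    transpose of the paper's d x N_k layout); returns D as (M, d)."""
    atoms = []
    for X_k in X_per_class:
        # Class-specific dictionary D_k (stand-in for K-SVD, Eq. (2)).
        dl = DictionaryLearning(n_components=M_k, random_state=0).fit(X_k)
        atoms.append(dl.components_)
    # Concatenate class-specific dictionaries into the total D, M = sum M_k.
    return np.vstack(atoms)

def attribute_scores(X, D, T):
    # Sparse codes Z via OMP with at most T non-zero entries per z_i;
    # each z_i serves as an M-dimensional data-driven attribute vector.
    return sparse_encode(X, D, algorithm='omp', n_nonzero_coefs=T)
```

The refinement of the concatenated dictionary in Eq. (3) is not shown; with this stand-in, it could be approximated by passing the stacked atoms as dict_init to a second DictionaryLearning fit over all training samples.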
3 SUBMODULAR ATTRIBUTE SELECTION
Since we formulate the attribute selection problem as a submodular optimization problem, we first briefly review the definition and optimization of submodular functions. Then we propose three attribute selection criteria for selecting a discriminative and compact subset of attributes. Finally, we formulate the attribute selection problem as the optimization of a submodular function, which is a linear combination of the entropy rate of a random walk and a weighted maximum coverage function.
3.1 Submodularity
Submodular functions are a class of set functions that have the property of diminishing returns [28]. Given a ground set E, a set function F : 2^E → R is submodular if F(A ∪ {v}) − F(A) ≥ F(B ∪ {v}) − F(B) holds for all A ⊆ B ⊆ E and v ∈ E \ B. The diminishing-returns property means that the marginal gain of adding an element v decreases when v is added at a later stage, i.e., to a larger set. Recently, submodular functions have been widely exploited in various applications, such as sensor placement [17], superpixel segmentation [26], document summarization [24], object detection and recognition [15], [51], and feature selection [8], [27]. The method of [27] performs submodular feature selection for acoustic score spaces based on existing facility location and saturated coverage functions. Different from these applications, we define a novel submodular objective function for attribute selection. Although we only evaluate our approach on object and action recognition, it can be applied to other recognition tasks that use attributes.
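To make the greedy optimization mentioned in Section 1 concrete, the following is an illustrative sketch, not the paper's exact algorithm: greedy maximization of a monotone submodular function under a cardinality constraint, which carries the classic (1 − 1/e) approximation guarantee.

```python
def greedy_maximize(F, ground_set, budget):
    """F: callable mapping a frozenset to a real value; budget: max |S|."""
    selected = frozenset()
    for _ in range(budget):
        # Pick the element with the largest marginal gain F(S + v) - F(S).
        gains = {v: F(selected | {v}) - F(selected)
                 for v in ground_set - selected}
        best = max(gains, key=gains.get)
        if gains[best] <= 0:          # no element improves the objective
            break
        selected = selected | {best}
    return selected

# Toy usage: weighted set coverage, a canonical submodular function.
cover = {1: {'a', 'b'}, 2: {'b', 'c'}, 3: {'c'}}
F = lambda S: len(set().union(*(cover[v] for v in S))) if S else 0
print(greedy_maximize(F, set(cover), budget=2))   # e.g. frozenset({1, 2})
```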
3.2 Attribute Selection Criterion
Motivation: We first present an example to illustrate the motivation of the proposed attribute selection criteria. Assume that we have four classes, denoted c_k, k = 1, ..., 4, and three attributes, denoted a_m, m = 1, ..., 3. Each class c_k may or may not have each attribute. An example attribute assignment for each class is shown in Table 1. Here we consider the discrimination capability of attribute a_m for distinguishing class c_i from class c_j. If exactly one of the two classes has attribute a_m, then a_m is discriminative for distinguishing c_i from c_j. However, if both classes have, or both lack, this attribute, then a_m is not useful for distinguishing the two classes. In this way, we evaluate the discrimination capability of each attribute for distinguishing each pair of classes in Table 2. Let us further consider the task of selecting two attributes out of the three.
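Since Table 1 is not reproduced here, the snippet below uses a hypothetical class-attribute assignment to show how a pairwise discrimination table of the kind in Table 2 is derived: attribute a_m distinguishes the pair (c_i, c_j) exactly when the two classes differ on a_m.

```python
import itertools
import numpy as np

# Hypothetical stand-in for Table 1: rows are classes c_1..c_4,
# columns are attributes a_1..a_3.
A = np.array([[1, 0, 1],
              [0, 1, 1],
              [1, 1, 0],
              [0, 0, 1]])

# a_m distinguishes c_i from c_j iff exactly one of them has it (XOR).
for i, j in itertools.combinations(range(A.shape[0]), 2):
    discriminative = np.flatnonzero(A[i] != A[j]) + 1
    print(f"(c_{i+1}, c_{j+1}): attributes {discriminative.tolist()}")
```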