WANG AND GU: CROSS-LABEL SUPPRESSION: DISCRIMINATIVE AND FAST DICTIONARY LEARNING 3861
• Finally, two simple classifiers are developed to cooperate with the learnt dictionary for image recognition, and they often yield promising results.
The rest of this paper is organized as follows: In Sections II and III, we review the related work and briefly introduce the preliminaries on dictionary learning and the graph Laplacian, respectively. In Section IV, we describe our cross-label suppression dictionary learning approach with the group regularization in detail, including the formulation, optimization, classifiers and initialization. Finally, we conduct extensive experiments to evaluate the proposed algorithm in Section V and conclude our work in Section VI.
II. RELATED WORK
A. Supervised Dictionary Learning
In brief, the supervised dictionary learning algorithms for
pattern recognition can be classified into three main categories.
The first category of developed dictionary learning algo-
rithms learns a universal dictionary for all classes and imposes
discriminative terms in the objective function to improve
classification performance, including [10], [11], [16], [19],
[22]–[26]. Specifically, the Fisher criterion is employed for enhancement in [22], and a softmax discriminative term is incorporated into the cost function in [10] and [23]. Additionally,
a classifier is introduced for joint learning with the dictionary during training in [10], [11], [16], [19], [24], and [25], where the hinge loss function [24], [25], the logistic loss function [10], and the linear prediction cost [11], [16], [19] are adopted for training the classifier, respectively. On top of the linear classifier adopted in [11], reference [16] additionally proposes a label consistency constraint in the objective function to strengthen the discriminative power, and achieves impressive results in
multiple recognition tasks such as face recognition, object
categorization, and sports action recognition.
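For concreteness, the linear prediction cost adopted in works such as [11], [16], and [19] can be sketched as a ridge-regression classifier trained on precomputed sparse codes. The function names and toy data below are hypothetical, and the closed-form least-squares solution stands in for the full joint dictionary–classifier learning:

```python
import numpy as np

def train_linear_classifier(X, H, lam=1e-2):
    """Ridge solution of the linear prediction cost ||H - W X||_F^2 + lam ||W||_F^2.

    X : (k, n) sparse codes of n training samples over k atoms (assumed given).
    H : (c, n) one-hot label matrix for c classes.
    Returns W : (c, k), the linear classifier on sparse codes.
    """
    k = X.shape[0]
    return H @ X.T @ np.linalg.inv(X @ X.T + lam * np.eye(k))

def predict(W, x):
    """Assign the label with the largest linear response W x."""
    return int(np.argmax(W @ x))
```

In the joint formulations cited above, this cost is minimized together with the reconstruction term, so the dictionary is driven toward codes that a linear classifier can separate.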
The second strategy for promoting discriminability learns various kinds of structured dictionaries, including a set of class-
specific dictionaries [5], [12], [27], [28], one universal dictio-
nary with each atom labeled like training signals [16], and
a set of class-specific dictionaries combined with a universal
dictionary [14], [15]. Reference [5] introduces the softmax
term among multiple class-specific dictionaries based on the
K-SVD model [1], and applies them to texture segmentation and scene analysis. Reference [12] learns a class-specific dictionary for each class with sparse coding, and imposes mutual incoherence among these dictionaries, attaining excellent performance in digit and audio classification. Building on [12], reference [28] additionally introduces a self-dictionary incoherence term for fine-grained image categorization. Furthermore,
inspired by the application of the shared sub-dictionary for
clustering [3], reference [14] employs a common sub-dictionary shared by all the classes in addition to class-specific dictionaries for classification. This strategy is also used in [27] and [28]. References [14] and [15] employ a joint strategy of learning a global dictionary and class-specific dictionaries at the same time, expecting both the global dictionary and each class-specific dictionary to reconstruct the corresponding class samples well.
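As a concrete illustration of how such class-specific sub-dictionaries are typically used at test time, the sketch below assigns a sample to the class whose sub-dictionary reconstructs it with the smallest residual. The function names are hypothetical, and plain least squares stands in for sparse coding to keep the sketch short:

```python
import numpy as np

def classify_by_residual(y, class_dicts):
    """Assign y to the class whose sub-dictionary reconstructs it best.

    y           : (d,) test signal.
    class_dicts : list of (d, k_c) arrays, one learnt sub-dictionary per class.
    Returns the index of the class with minimum reconstruction residual.
    """
    residuals = []
    for D in class_dicts:
        # Least-squares coding over this class's atoms only.
        alpha, *_ = np.linalg.lstsq(D, y, rcond=None)
        residuals.append(np.linalg.norm(y - D @ alpha))
    return int(np.argmin(residuals))
```

The same residual rule underlies the shared-plus-specific designs: the common sub-dictionary absorbs class-independent structure so that residuals over the class-specific parts remain discriminative.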
Different from the above two categories of supervised
dictionary learning, the third type of learning a discriminative
dictionary assumes that all the samples correspond to another space whose dimension differs from the original one, including kernel-based methods [17], [29]–[33] and manifold-based algorithms [17], [33], [34]. Instead of directly constructing linear representations in the original space, these algorithms first map both signals and atoms into the new space and then linearly represent the signals with the dictionary, which often helps address nonlinear problems. In kernel-
based dictionary learning, multiple kernels have been jointly
employed for better results in [32], unlike [29]–[31] with just
one single kernel. Besides, Riemannian manifolds are applied
in [33] and [34] and Grassmann manifolds are employed
in [17].
To make representations discriminative, we employ a structured dictionary in a more flexible way. Explicitly, we propose cross-label suppression, which penalizes large coefficients on atoms associated with other labels rather than on a sample's closely associated ones. Unlike approaches based on multiple class-specific dictionaries such as [5], [12], [14], and [27], this label constraint does not fully cut off the collaboration among atoms with different labels when reconstructing samples during the learning process. Besides, we do not need to predefine discriminative sparse codes to exploit the dictionary structure as in [16]. In [16], because all the nonzero coefficients in the predefined discriminative sparse codes for each class are identically set to 1, the nonzero coefficients of each learnt sparse code are forced to be roughly equal, which is not very convincing.
B. Related Work on the Graph Laplacian
The graph Laplacian, as a very flexible tool for representing and processing signals, is applied in many domains, including dimensionality reduction [35], classification and
clustering [36]–[39], and image smoothing [40]. Reference [35] exploits the geometric structure incorporating neighborhood information of the data set and proposes Laplacian eigenmaps for dimensionality reduction and data representation, which possess locality-preserving properties. Based on the k (k ∈ N)
largest eigenvectors of a normalized Laplacian, [36] proposes
a classical spectrum-based approach for clustering. In semi-
supervised learning, [37] imposes a smoothness constraint
on the classifying function through the Laplacian of the
intrinsic structure revealed by known labeled and unlabeled
data points, attaining encouraging results for handwritten digit
recognition and text classification. Reference [38] presents
graph-regularized sparse coding with respect to an unsupervised dictionary for image representation, using the Laplacian as a smoothing operator, and validates its effectiveness on both
classification and clustering. Reference [39] introduces two
adaptive Laplacians for dictionary learning and sparse coding,
respectively, and applies them to single-label recognition and multi-label classification. Considering image intensity diffusion, [40] accomplishes image smoothing by convolving
original images with the heat kernel governed by the Laplacian
of the graph, which is constructed by pixel lattices.
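To ground the discussion, the sketch below builds the unnormalized Laplacian L = D − W of a k-nearest-neighbor similarity graph and extracts a Laplacian-eigenmaps-style embedding from its low eigenvectors; the function names and parameter choices are illustrative, not those of any cited work:

```python
import numpy as np

def knn_graph_laplacian(X, k=2, sigma=1.0):
    """Unnormalized graph Laplacian L = D - W of a k-NN similarity graph.

    X : (n, d) data points. Edges carry Gaussian weights
    exp(-||xi - xj||^2 / (2 sigma^2)), and the k-NN relation is
    symmetrized, as is common in spectral methods.
    """
    n = X.shape[0]
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)  # pairwise squared distances
    W = np.zeros((n, n))
    for i in range(n):
        nbrs = np.argsort(d2[i])[1:k + 1]                # skip the point itself
        W[i, nbrs] = np.exp(-d2[i, nbrs] / (2 * sigma ** 2))
    W = np.maximum(W, W.T)                               # symmetrize the graph
    return np.diag(W.sum(1)) - W

def laplacian_embedding(L, dim=1):
    """Laplacian-eigenmaps-style embedding: the eigenvectors of the
    smallest nonzero eigenvalues of L vary smoothly over graph edges,
    which is what yields locality preservation."""
    vals, vecs = np.linalg.eigh(L)
    return vecs[:, 1:1 + dim]                            # drop the constant eigenvector
```

The quadratic form fᵀLf = Σ_{ij} w_ij (f_i − f_j)² is the smoothness penalty that [37]–[39] impose on classifying functions and sparse codes: it is small exactly when f changes little across strongly connected vertices.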