WANG AND GU: CROSS-LABEL SUPPRESSION: DISCRIMINATIVE AND FAST DICTIONARY LEARNING 3861
• Finally, two simple classifiers are developed to cooperate with the learnt dictionary for image recognition, and they often yield promising results.
The rest of this paper is organized as follows: In Sections II and III, we review the related work and briefly introduce the preliminaries on dictionary learning and the graph Laplacian, respectively. In Section IV, we describe our cross-label suppression dictionary learning approach with the group regularization in detail, including the formulation, optimization, classifiers and initialization. Finally, we conduct extensive experiments to evaluate the proposed algorithm in Section V and conclude our work in Section VI.
II. RELATED WORK
A. Supervised Dictionary Learning
In brief, the supervised dictionary learning algorithms for
pattern recognition can be classified into three main categories.
The first category of developed dictionary learning algo-
rithms learns a universal dictionary for all classes and imposes
discriminative terms in the objective function to improve
classification performance, including [10], [11], [16], [19],
[22]–[26]. Specifically, the Fisher criterion is employed for enhancement in [22], and a softmax discriminative term is incorporated into the cost function in [10] and [23]. Additionally,
a classifier is introduced for joint learning with the dictionary during training in [10], [11], [16], [19], [24], and [25], where the hinge loss function [24], [25], the logistic loss function [10], and the linear prediction cost [11], [16], [19] are adopted for training the classifier, respectively. On top of the linear classifier adopted in [11], reference [16] additionally proposes a label consistency constraint in the objective function to strengthen the discriminative power, and achieves impressive results in
multiple recognition tasks such as face recognition, object
categorization, and sports action recognition.
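For concreteness, the linear prediction cost adopted in works such as [11], [16], and [19] can be sketched as a ridge-regression classifier trained on precomputed sparse codes. The function names and toy data below are hypothetical, and the closed-form least-squares solution stands in for the full joint dictionary–classifier learning:

```python
import numpy as np

def train_linear_classifier(X, H, lam=1e-2):
    """Ridge solution of the linear prediction cost ||H - W X||_F^2 + lam ||W||_F^2.

    X : (k, n) sparse codes of n training samples over k atoms (assumed given).
    H : (c, n) one-hot label matrix for c classes.
    Returns W : (c, k), the linear classifier on sparse codes.
    """
    k = X.shape[0]
    return H @ X.T @ np.linalg.inv(X @ X.T + lam * np.eye(k))

def predict(W, x):
    """Assign the label with the largest linear response W x."""
    return int(np.argmax(W @ x))
```

In the joint formulations cited above, this cost is minimized together with the reconstruction term, so the dictionary is driven toward codes that a linear classifier can separate.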
The second strategy for promoting discriminability learns various kinds of structured dictionaries, including a set of class-
specific dictionaries [5], [12], [27], [28], one universal dictio-
nary with each atom labeled like training signals [16], and
a set of class-specific dictionaries combined with a universal
dictionary [14], [15]. Reference [5] introduces the softmax
term among multiple class-specific dictionaries based on the
K-SVD model [1], and applies them to texture segmentation and scene analysis. Reference [12] learns a class-specific dictionary for each class with sparse coding, and imposes mutual incoherence among these dictionaries, attaining excellent performance in digit and audio classification. Building on [12], reference [28] additionally introduces a self-dictionary incoherence term for fine-grained image categorization. Furthermore,
inspired by the application of the shared sub-dictionary for
clustering [3], reference [14] employs a common sub-dictionary shared by all the classes in addition to class-specific dictionaries for classification. This strategy is also used in [27] and [28]. References [14] and [15] employ a joint strategy of learning a global dictionary and class-specific dictionaries at the same time, expecting both the global dictionary and each class-specific dictionary to reconstruct the corresponding class samples well.
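As a concrete illustration of how such class-specific sub-dictionaries are typically used at test time, the sketch below assigns a sample to the class whose sub-dictionary reconstructs it with the smallest residual. The function names are hypothetical, and plain least squares stands in for sparse coding to keep the sketch short:

```python
import numpy as np

def classify_by_residual(y, class_dicts):
    """Assign y to the class whose sub-dictionary reconstructs it best.

    y           : (d,) test signal.
    class_dicts : list of (d, k_c) arrays, one learnt sub-dictionary per class.
    Returns the index of the class with minimum reconstruction residual.
    """
    residuals = []
    for D in class_dicts:
        # Least-squares coding over this class's atoms only.
        alpha, *_ = np.linalg.lstsq(D, y, rcond=None)
        residuals.append(np.linalg.norm(y - D @ alpha))
    return int(np.argmin(residuals))
```

The same residual rule underlies the shared-plus-specific designs: the common sub-dictionary absorbs class-independent structure so that residuals over the class-specific parts remain discriminative.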
Different from the above two categories of supervised
dictionary learning, the third type of learning a discriminative
dictionary assumes that all the samples correspond to another space whose dimension differs from the original one, including kernel-based methods [17], [29]–[33] and manifold-based algorithms [17], [33], [34]. Instead of directly constructing linear representations in the original space, these algorithms first map both signals and atoms into the new space and then linearly represent the signals with the dictionary, which often helps address nonlinear problems. In kernel-
based dictionary learning, multiple kernels have been jointly
employed for better results in [32], unlike [29]–[31] with just
one single kernel. Besides, Riemannian manifolds are applied
in [33] and [34] and Grassmann manifolds are employed
in [17].
To make representations discriminative, we employ a structured dictionary in a more flexible way. Explicitly, we propose cross-label suppression, which penalizes large coefficients on atoms associated with other labels rather than on a sample's closely associated ones. Unlike approaches based on multiple class-specific dictionaries such as [5], [12], [14], and [27], this label constraint does not fully cut off the collaboration among atoms with different labels when reconstructing samples during the learning process. Besides, we do not need to predefine discriminative sparse codes to exploit the dictionary structure as in [16]. In [16], because all the nonzero coefficients in the predefined discriminative sparse codes for each class are identically set to 1, the nonzero coefficients of each learnt sparse code are forced to be roughly equal, which is not very convincing.
B. Related Work on the Graph Laplacian
The graph Laplacian, as a very flexible tool for representing and processing signals, is applied in many domains, including dimensionality reduction [35], classification and
clustering [36]–[39], and image smoothing [40]. Reference [35] exploits the geometric structure incorporating neighborhood information of the data set and proposes Laplacian eigenmaps for dimensionality reduction and data representation, which possess locality-preserving properties. Based on the k (k ∈ N)
largest eigenvectors of a normalized Laplacian, [36] proposes
a classical spectrum-based approach for clustering. In semi-
supervised learning, [37] imposes a smoothness constraint
on the classifying function through the Laplacian of the
intrinsic structure revealed by known labeled and unlabeled
data points, attaining encouraging results for handwritten digit
recognition and text classification. Reference [38] presents
graph-regularized sparse coding with respect to an unsupervised dictionary for image representation, using the Laplacian as a smoothing operator, and validates its effectiveness on both
classification and clustering. Reference [39] introduces two
adaptive Laplacians for dictionary learning and sparse coding,
respectively, and applies them to single-label recognition and multi-label classification. Considering image intensity diffusion, [40] accomplishes image smoothing by convolving
original images with the heat kernel governed by the Laplacian
of the graph, which is constructed by pixel lattices.
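To ground the discussion, the sketch below builds the unnormalized Laplacian L = D − W of a k-nearest-neighbor similarity graph and extracts a Laplacian-eigenmaps-style embedding from its low eigenvectors; the function names and parameter choices are illustrative, not those of any cited work:

```python
import numpy as np

def knn_graph_laplacian(X, k=2, sigma=1.0):
    """Unnormalized graph Laplacian L = D - W of a k-NN similarity graph.

    X : (n, d) data points. Edges carry Gaussian weights
    exp(-||xi - xj||^2 / (2 sigma^2)), and the k-NN relation is
    symmetrized, as is common in spectral methods.
    """
    n = X.shape[0]
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)  # pairwise squared distances
    W = np.zeros((n, n))
    for i in range(n):
        nbrs = np.argsort(d2[i])[1:k + 1]                # skip the point itself
        W[i, nbrs] = np.exp(-d2[i, nbrs] / (2 * sigma ** 2))
    W = np.maximum(W, W.T)                               # symmetrize the graph
    return np.diag(W.sum(1)) - W

def laplacian_embedding(L, dim=1):
    """Laplacian-eigenmaps-style embedding: the eigenvectors of the
    smallest nonzero eigenvalues of L vary smoothly over graph edges,
    which is what yields locality preservation."""
    vals, vecs = np.linalg.eigh(L)
    return vecs[:, 1:1 + dim]                            # drop the constant eigenvector
```

The quadratic form fᵀLf = Σ_{ij} w_ij (f_i − f_j)² is the smoothness penalty that [37]–[39] impose on classifying functions and sparse codes: it is small exactly when f changes little across strongly connected vertices.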