where H is the label matrix with binary entries (0 or 1), and α and β are positive scalars.
$Q = [q_1, q_2, \dots, q_N] \in \mathbb{R}^{K \times N}$ are the "discriminative" sparse codes for X, where $q_i$ is an ideal "discriminative" sparse code for $x_i$ if the nonzero values of $q_i$ occur at those indices where $x_i$ and atom $d_k$ share the same label [9]. Suppose that $X = [x_1, \dots, x_9]$ has 3 classes, where $x_1, \dots, x_4$ and atoms $d_1, d_2$ are from class 1, $x_5, x_6$ and $d_3, d_4$ are from class 2, and the rest from class 3; the "discriminative" sparse codes matrix Q is then defined as

$$Q=\begin{bmatrix}1&1&1&1&0&0&0&0&0\\1&1&1&1&0&0&0&0&0\\0&0&0&0&1&1&0&0&0\\0&0&0&0&1&1&0&0&0\\0&0&0&0&0&0&1&1&1\\0&0&0&0&0&0&1&1&1\end{bmatrix}.\quad (3)$$
Thus, the term ||Q − AS||_F^2 is the discriminative sparse-code error, where A converts the original sparse codes in S to be more discriminative in a feature subspace. This term can encourage label consistency in the resulting codes, but Q is arbitrarily defined, so it can neither preserve local information nor inherit the structure information of the samples. Setting all nonzero entries to ones is also too hard, since the coefficients in S are essentially soft, i.e., a large value s_{i,j} means that the contribution of atom d_i to reconstructing x_j is large, and a small value means it is small.
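For concreteness, here is a minimal NumPy sketch of how such a Q can be built from sample and atom label vectors; the labels are the hypothetical ones of the example above, and the sketch reproduces the matrix in Eq. (3) exactly:

```python
import numpy as np

# Hypothetical labels matching the example of Eq. (3):
# 9 samples and 6 atoms drawn from 3 classes.
sample_labels = np.array([1, 1, 1, 1, 2, 2, 3, 3, 3])  # labels of x_1,...,x_9
atom_labels = np.array([1, 1, 2, 2, 3, 3])             # labels of d_1,...,d_6

# Q[k, i] = 1 iff atom d_k and sample x_i share the same label.
Q = (atom_labels[:, None] == sample_labels[None, :]).astype(float)
print(Q)  # the 6 x 9 binary matrix of Eq. (3)
```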
Note that D-KSVD and LC-KSVD can be equivalent under uniform atom allocation [36], provided that the initialization conditions and optimization methods are identical. By this equivalence, we can reformulate LC-KSVD as

$$\left(D^{*},S^{*},W^{*}\right)=\arg\min_{D,S,W}\left\|\begin{bmatrix}X\\H\end{bmatrix}-\begin{bmatrix}D\\W\end{bmatrix}S\right\|_{F}^{2},\;\; s.t.\;\left\|s_{i}\right\|_{0}\le T,\; i=1,2,\dots,N,\quad (4)$$

which is just the problem of D-KSVD with the same number of dictionary atoms allocated to each class. The analysis in [36] also shows that D-KSVD is preferable to the LC-KSVD algorithm because of its simplicity and efficiency.
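To make the reformulation concrete, the stacking in Eq. (4) can be sketched as follows; the shapes and random initialization are hypothetical, and any off-the-shelf K-SVD routine is assumed for the actual optimization:

```python
import numpy as np

d, K, N, c = 64, 6, 9, 3                       # hypothetical sizes
X = np.random.randn(d, N)                      # training data
H = np.eye(c)[:, np.random.randint(0, c, N)]   # binary label matrix (c x N)
D = np.random.randn(d, K)                      # initial dictionary
W = np.random.randn(c, K)                      # initial classifier

# Eq. (4): D-KSVD is plain K-SVD run on the augmented pair below.
X_aug = np.vstack([X, H])                      # (d + c) x N
D_aug = np.vstack([D, W])                      # (d + c) x K
D_aug /= np.linalg.norm(D_aug, axis=0)         # K-SVD expects unit-norm atoms

# After K-SVD converges on (X_aug, D_aug), the top d rows of the learned
# D_aug give D and the bottom c rows give W (up to column renormalization).
```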
B. Review of LCLE-DL
LCLE-DL is another related DL method, so we also briefly revisit it. LCLE-DL calculates a discriminative dictionary D by jointly combining the label embedding of atoms and the locality constraint of atoms. The problem of LCLE-DL is defined as
$$\min_{D,S,V}\left\|X-DS\right\|_{F}^{2}+\alpha\,\mathrm{tr}\!\left(S^{T}LS\right)+\left\|X-DV\right\|_{F}^{2}+\beta\,\mathrm{tr}\!\left(V^{T}UV\right)+\lambda\left\|S-V\right\|_{F}^{2},\;\; s.t.\;\left\|d_{i}\right\|_{2}=1,\; i=1,\dots,K,\quad (5)$$
where S and V denote the coefficient matrices, ||X − DS||_F^2 and ||X − DV||_F^2 denote the reconstruction errors, and λ||S − V||_F^2 is the regularization used to transfer the label constraint tr(V^T UV) to/from the locality constraint tr(S^T LS). U is the scaled label matrix constructed from the labels of the dictionary atoms. α, β and λ are parameters. L is a graph Laplacian matrix defined as
$$L=G-M,\;\;\text{where}\;\; G=\mathrm{diag}\left(g_{1},g_{2},\dots,g_{K}\right),\;\; g_{i}=\sum_{j=1}^{K}M_{i,j},\quad (6)$$
where the nearest neighbor graph is weighted by M as
$$M_{i,j}=\begin{cases}\exp\!\left(-\left\|d_{i}-d_{j}\right\|_{2}^{2}/\sigma\right), & \text{if}\; d_{j}\in kNN\!\left(d_{i}\right)\\0, & \text{else}\end{cases},\quad (7)$$
which is defined by the Gaussian function, where σ is the kernel width, kNN(d_i) is the k-nearest-neighbor set of atom d_i, G is a diagonal matrix, and M_{i,j} encodes the similarity between atoms d_i and d_j. By the above definitions, ||X − DS||_F^2 + α tr(S^T LS) encodes the reconstruction error with the locality constraint, where tr(S^T LS) inherits the manifold structure of the training set. ||X − DV||_F^2 + β tr(V^T UV) encodes the reconstruction error with label embedding, where tr(V^T UV) forces the intra-class atoms in D to have similar profiles. The term λ||S − V||_F^2 is a regularization over the coefficients, which ensures the mutual transformation between the label embedding and the locality constraint.
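As an illustration of Eqs. (6)-(7), here is a minimal NumPy sketch that builds the atom graph Laplacian; the symmetrization step is an added assumption, since the raw kNN weights of Eq. (7) need not be symmetric:

```python
import numpy as np

def atom_laplacian(D, k=3, sigma=1.0):
    """Graph Laplacian L = G - M over the atoms (columns of D), per Eqs. (6)-(7)."""
    K = D.shape[1]
    # Pairwise squared distances ||d_i - d_j||^2 between atoms.
    dist2 = np.sum((D[:, :, None] - D[:, None, :]) ** 2, axis=0)
    M = np.zeros((K, K))
    for i in range(K):
        nbrs = np.argsort(dist2[i])[1:k + 1]          # kNN(d_i), excluding d_i
        M[i, nbrs] = np.exp(-dist2[i, nbrs] / sigma)  # Gaussian weights, Eq. (7)
    M = np.maximum(M, M.T)       # symmetrize the kNN graph (added assumption)
    G = np.diag(M.sum(axis=1))   # G = diag(g_1,...,g_K), g_i = sum_j M_{i,j}
    return G - M                 # Eq. (6)
```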
III. ROBUST FLEXIBLE DISCRIMINATIVE DICTIONARY
LEARNING (RFDDL)
A. Problem Formulation
The presented RFDDL model improves the robustness of discriminative DL to noise and sparse errors in two ways. First, it calculates the robust discriminative dictionary and codes in the recovered clean data and atom subspaces. Specifically, RFDDL decomposes the original X and dictionary D in each iteration to recover the underlying clean data X_new and clean dictionary D_new, and models the errors E and E_D at the same time in terms of X = X_new + E and D = D_new + E_D, where the L_{2,1}-norm is imposed on E and E_D so that the sparse errors in data and atoms can be corrected jointly. Then, RFDDL performs the discriminative DL over the clean X_new and D_new for accurate representations. Second, our RFDDL encodes the locality and defines a Laplacian matrix by computing discrimination-promoting reconstruction weights over the recovered clean atoms, which can encourage intra-class samples to have similar sparse codes and inter-class samples to have different ones, as detailed in the next subsection. By combining the joint subspace recovery with the Laplacian-regularized reconstruction error, the discriminative sparse-code error and the data classification error, the initial problem of RFDDL is given as
$$\begin{aligned}\min_{D_{new},\,S,\,L,\,E,\,E_{D},\,W}\;&\left\|X_{new}-D_{new}S\right\|_{F}^{2}+\alpha\left(\left\|E\right\|_{2,1}+\left\|E_{D}\right\|_{2,1}\right)+\beta\left\|Q-LS\right\|_{F}^{2}\\&+\lambda\left\|SH_{e}\right\|_{F}^{2}+\gamma\left(\left\|H-WLS\right\|_{F}^{2}+\left\|W^{T}\right\|_{2,1}\right),\\ s.t.\;\;& X=X_{new}+E,\;\; D=D_{new}+E_{D},\end{aligned}\quad (8)$$
where ||E||_{2,1} + ||E_D||_{2,1} is the L_{2,1}-norm based error, and α, β, γ and λ are the parameters. Note that the L_{2,1}-norm can ensure that the regularized matrix is sparse in rows, and the L_{2,1}-norm based metric is robust to noise and outliers in data and atoms [2], [7], [27]. The L_{2,1}-norm based classifier regularization ||W^T||_{2,1} can force the columns of W to be sparse so that the discriminative soft labels can be predicted in the latent sparse subspace. Q and H are defined similarly to LC-KSVD, ||SH_e||_F^2 is the Frobenius-norm based coefficients penalty, and H_e = I − (1/N)ee^T is the "centering matrix", that is, SH_e can be considered as the normalized coding coefficients. It is clear that the discriminative Laplacian matrix L is associated with the learning of both the codes and the classifier, which can potentially obtain more accurate codes, a discriminative dictionary and a powerful discriminative classifier jointly.
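To clarify the ingredients specific to Eq. (8), here is a small NumPy sketch of the L_{2,1}-norm, of the row-wise shrinkage commonly used to optimize such terms, and of the centering matrix; the shapes are hypothetical, and H_e = I − ee^T/N is the standard centering matrix assumed from the "normalized coding coefficients" remark:

```python
import numpy as np

def l21_norm(A):
    # L_{2,1}-norm: the sum of the L2 norms of the rows of A,
    # which is what makes the penalized matrix sparse in rows.
    return np.sum(np.linalg.norm(A, axis=1))

def l21_prox(A, lam):
    # Row-wise soft-thresholding, the proximal operator of lam * ||.||_{2,1};
    # one common way terms like ||E||_{2,1} and ||E_D||_{2,1} are optimized.
    norms = np.linalg.norm(A, axis=1, keepdims=True)
    return A * np.maximum(0.0, 1.0 - lam / np.maximum(norms, 1e-12))

K, N, c = 6, 9, 3                  # hypothetical sizes
e = np.ones((N, 1))
H_e = np.eye(N) - (e @ e.T) / N    # centering matrix H_e = I - ee^T / N

S = np.random.randn(K, N)          # coding coefficients
S_norm = S @ H_e                   # SH_e: each row of S is shifted to zero mean
assert np.allclose(S_norm.mean(axis=1), 0.0)

W = np.random.randn(c, K)          # classifier
penalty = l21_norm(W.T)            # ||W^T||_{2,1}: promotes sparse columns of W
```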
Please note that the linear reconstruction X_new = D_new S may overfit, especially when the number of training data is limited. To enable DL to handle data sampled from a nonlinear manifold and to avoid overfitting, our RFDDL proposes a flexible reconstruction residue motivated