2.4. D$^2$L$^2$R$^2$
By learning a sub-dictionary for each class separately, enforcing a low-rank constraint on each sub-dictionary, and incorporating the Fisher criterion into the model, S. Li et al. proposed the D$^2$L$^2$R$^2$ method, which can be formulated as follows:
$$\min_{D,Z}\ \sum_{i=1}^{C}\Big(\|X_i - DZ_i\|_F^2 + \|X_i - D_i Z_i^i\|_F^2 + \sum_{j=1,\,j\neq i}^{C}\|D_j Z_i^j\|_F^2\Big) + \lambda_1\|Z\|_1 + \lambda_2 F(Z) + \alpha\sum_{i=1}^{C}\|D_i\|_* \tag{4}$$
where $F(Z) = \operatorname{tr}(S_W(Z)) - \operatorname{tr}(S_B(Z)) + \eta\|Z\|_F^2$, $S_W(Z) = \sum_{i=1}^{C}\sum_{z_k \in Z_i}(z_k - \hat{z}_i)(z_k - \hat{z}_i)^T$ is the within-class scatter matrix of $Z$, and $S_B(Z) = \sum_{i=1}^{C} n_i\,(\hat{z}_i - \hat{z})(\hat{z}_i - \hat{z})^T$ is the between-class scatter matrix of $Z$.
$\hat{z}_i$ and $\hat{z}$ denote the mean samples of $Z_i$ and $Z$, respectively, and $n_i$ is the number of samples in the $i$th class.
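To make the Fisher term concrete, the following NumPy sketch (the helper name `fisher_criterion` is ours, not from the paper) evaluates $F(Z) = \operatorname{tr}(S_W(Z)) - \operatorname{tr}(S_B(Z)) + \eta\|Z\|_F^2$ directly from a coefficient matrix and its class labels:

```python
import numpy as np

def fisher_criterion(Z, labels, eta=1.0):
    """F(Z) = tr(S_W(Z)) - tr(S_B(Z)) + eta * ||Z||_F^2.

    Z      : (k, n) coefficient matrix, one column per sample.
    labels : (n,) integer class labels.
    """
    labels = np.asarray(labels)
    z_mean = Z.mean(axis=1, keepdims=True)            # global mean of Z
    tr_sw = tr_sb = 0.0
    for c in np.unique(labels):
        Zc = Z[:, labels == c]                        # class-c coefficients Z_i
        zc_mean = Zc.mean(axis=1, keepdims=True)      # class mean \hat{z}_i
        tr_sw += ((Zc - zc_mean) ** 2).sum()          # tr(S_W) contribution
        tr_sb += Zc.shape[1] * ((zc_mean - z_mean) ** 2).sum()  # tr(S_B)
    return tr_sw - tr_sb + eta * (Z ** 2).sum()
```

The traces are computed without forming the scatter matrices, since $\operatorname{tr}\big((z - \hat{z})(z - \hat{z})^T\big) = \|z - \hat{z}\|_2^2$.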
2.5. SCLRDL
Y. Liu et al. imposed both structure and low-rank restrictions on the coefficient matrix and proposed the SCLRDL method. The objective function can be written as follows:
$$\min_{D,Z_i,E_i}\ \sum_{i=1}^{C}\alpha\|Z_i\|_* + \sum_{i=1}^{C}\beta\|E_i\|_l \quad \text{s.t.}\quad X_i = DZ_i + E_i,\ \ \pi_i(Z_i) = 0 \tag{5}$$
where $\|E_i\|_l$ can be $\|\cdot\|_F^2$ for Gaussian noise or $\|\cdot\|_1$ for random corruptions. The constraint $\pi_i(Z_i) = 0$ forces all rows of $Z_i$ to be zero except those corresponding to the $i$th category.
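As an illustration of the constraint $\pi_i(Z_i) = 0$, the sketch below (the helper name and the atom-to-class bookkeeping are our assumptions) zeroes every row of $Z_i$ whose dictionary atom does not belong to class $i$:

```python
import numpy as np

def project_structure(Z_i, atom_labels, i):
    """Enforce pi_i(Z_i) = 0: zero all rows of Z_i except those
    whose dictionary atoms belong to class i.

    Z_i         : (k, n_i) coefficients of the class-i samples.
    atom_labels : (k,) class index of each dictionary atom.
    """
    atom_labels = np.asarray(atom_labels)
    Z_proj = Z_i.copy()
    Z_proj[atom_labels != i, :] = 0.0   # rows of foreign classes vanish
    return Z_proj
```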
2.6. RDLRR
By exploiting the low-rankness of both the data representation and each occlusion-induced error image simultaneously, G. Gao et al. proposed RDLRR, which decomposes the data matrix $X$ into two parts, $DZ$ and $E$, where $Z$ is a low-rank matrix in the vector representation space, while $E$ contains a series of low-rank noise images in the original image space. The objective function is formulated as follows:
$$\min_{D,Z,W,E}\ \|Z\|_* + \lambda\sum_{i=1}^{n}\|E_i\|_* + \alpha\|Z - Q\|_F^2 \quad \text{s.t.}\quad X = DZ + E,\ \ H = WZ \tag{6}$$
where $H = [h_1, h_2, \ldots, h_n] \in \mathbb{R}^{C \times n}$ and $W$ represent the class label matrix and the classifier parameters, respectively. $h_i = [0, 0, \ldots, 1, \ldots, 0, 0]^T$ is the label vector of sample $x_i$, where the position of the element 1 indicates the class of $x_i$. $Q$ is the same as that in DSLR, and $n$ is the number of samples.
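Building $H$ amounts to one-hot encoding of the labels; a minimal sketch (the function name is ours):

```python
import numpy as np

def label_matrix(labels, C):
    """Build H = [h_1, ..., h_n] in R^{C x n}; h_i is the one-hot
    label vector of sample x_i."""
    labels = np.asarray(labels)
    H = np.zeros((C, labels.size))
    H[labels, np.arange(labels.size)] = 1.0  # row of the 1 encodes the class
    return H
```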
3. Structure-constrained discriminative dictionary learning based on Schatten p-norm for face recognition
In this section, a novel dictionary-learning approach based on the Schatten p-norm model with a structure constraint and a discrimination constraint is proposed.
3.1. SDDLS$_p$
In face recognition with occluded data, variations between images of the same person due to noise are often larger than those due to the change of identity. To reduce the role of large variations and enhance the role of small variations, we propose applying the Schatten $p$-norm ($0 < p < 1$) to approximate the nonconvex rank minimization problem. Compared with the widely used nuclear norm, the Schatten $p$-norm ($0 < p < 1$) treats each singular value differently, which is beneficial for achieving our goal. According to the above discussion, we formulate the model of SDDLS$_p$ as follows:
$$\min_{D,Z,E,W}\ \|Z\|_{S_p}^p + \beta\|Z\|_1 + \lambda\|E\|_1 + \alpha\, r(Z) + \gamma\, g(Z) \quad \text{s.t.}\quad X = DZ + E \tag{7}$$
where $X$, $D$, $Z$ and $E$ are the data matrix, the dictionary, the coefficient matrix and the error matrix, respectively. $r(Z)$ and $g(Z)$ are regularization terms on $Z$; $\alpha$, $\beta$, $\lambda$ and $\gamma$ are tradeoff parameters, and $p \in (0, 1)$.
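To illustrate how the Schatten $p$-norm weights singular values, the sketch below (the helper name is ours) evaluates $\|Z\|_{S_p}^p = \sum_i \sigma_i^p$; at $p = 1$ it reduces to the nuclear norm:

```python
import numpy as np

def schatten_p(Z, p):
    """||Z||_{S_p}^p = sum of the p-th powers of the singular values of Z."""
    sigma = np.linalg.svd(Z, compute_uv=False)
    return (sigma ** p).sum()
```

For $0 < p < 1$ the map $\sigma \mapsto \sigma^p$ grows sublinearly, so large (identity-carrying) singular values are penalized relatively less than under the nuclear norm, while small, noise-driven singular values are still suppressed.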
3.2. Structure-constraint term for coefficients
Given a class-specific dictionary $D = [D_1, \ldots, D_C]$, the ideal coefficient matrix should have a block-diagonal structure. Therefore, we construct a regularization term $r(Z) = \|P \odot Z\|_F^2$ on the representation $Z$ to achieve this, where $\odot$ is the elementwise multiplication operator and the element in the $i$th row and $j$th column of $P$ is defined as
$$P_{i,j} = \begin{cases} 0, & \text{if } d_i \text{ and } x_j \text{ belong to the same class} \\ 1, & \text{otherwise} \end{cases}$$
$P = [p_1, \ldots, p_n]$ is a weight matrix, where $p_i$ has the form $[1, \ldots, 1, 0, \ldots, 0, 1, \ldots, 1]^T$. Suppose that $x_i$ belongs to class $c$; then, all elements in $p_i$ corresponding to $D_c$ are 0s, whereas all others are 1s.
The term $r(Z)$ encourages $Z_i^j$ ($i \neq j$) to take small values and $Z_i^i$ to take large values, which makes $\sum_{j \neq i} \|D_j Z_i^j\|_F^2 \approx 0$ and $X_i \approx D_i Z_i^i$.
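A sketch of the weight matrix $P$ and the structure term $r(Z) = \|P \odot Z\|_F^2$ (the helper and argument names are our choices):

```python
import numpy as np

def structure_penalty(Z, atom_labels, sample_labels):
    """r(Z) = ||P (elementwise*) Z||_F^2, where P_{ij} = 0 if atom d_i
    and sample x_j share a class, and P_{ij} = 1 otherwise."""
    atom_labels = np.asarray(atom_labels)
    sample_labels = np.asarray(sample_labels)
    P = (atom_labels[:, None] != sample_labels[None, :]).astype(float)
    return ((P * Z) ** 2).sum()   # only cross-class entries are penalized
```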
Note that $r(Z)$ is different from the last term in [14]. In [14], the minimization of $\sum_{j=1,\,j \neq i}^{C} \|D_j Z_i^j\|_F^2$ cannot ensure that the values in $Z_i^j$ ($i \neq j$) are small. In addition, [14] attempts to learn a structured dictionary by minimizing the rank of each subdictionary, which reduces the diversity in the subdictionary and weakens the representation ability of the dictionary. The last term in [15] forces $Z$ to be close to $Q$, which implies that the representations of samples from the same class should be identical; this may adversely affect the representation ability. This drawback is overcome in our proposed approach by the term $r(Z)$, since the regularization acts only on $Z_i^j$ ($i \neq j$) and encourages the generation of a coefficient matrix with a block-diagonal structure.
3.3. Discriminative term for coefficients
To make the dictionary optimal for face recognition, we propose to incorporate the classification error as a term in the objective function for dictionary learning. Here, we adopt a simple linear classifier $W$. Given a label matrix $Y$, we construct a classification error term $g(Z) = \|Y - WZ\|_F^2$ to make the coding coefficients discriminative by projecting the $c$th-class coding coefficients only onto the $c$th dimension of the label space.
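The classification-error term is a plain least-squares fit of a linear classifier; a minimal sketch follows (the ridge-regularized closed form for $W$ with $Z$ fixed is our assumption, not necessarily the update used in the paper):

```python
import numpy as np

def classification_error(Z, Y, W):
    """g(Z) = ||Y - W Z||_F^2 for a linear classifier W and a
    one-hot label matrix Y of size C x n."""
    return ((Y - W @ Z) ** 2).sum()

def fit_classifier(Z, Y, eps=1e-6):
    """Closed-form ridge solution W = Y Z^T (Z Z^T + eps I)^{-1}."""
    k = Z.shape[0]
    return Y @ Z.T @ np.linalg.inv(Z @ Z.T + eps * np.eye(k))
```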
3.4. Optimization
To make problem (7) separable, we first introduce two auxiliary variables, $J$ and $L$. Then, problem (7) can be rewritten as
$$\min_{D,Z,E,W,J,L}\ \|J\|_{S_p}^p + \beta\|L\|_1 + \lambda\|E\|_1 + \alpha\|P \odot Z\|_F^2 + \gamma\|Y - WZ\|_F^2 \quad \text{s.t.}\quad X = DZ + E,\ \ Z = J,\ \ Z = L \tag{8}$$
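Under this splitting, the $J$-subproblem takes the form $\min_J \|J\|_{S_p}^p + \frac{\mu}{2}\|J - T\|_F^2$, which is typically solved by shrinking the singular values of $T$. The sketch below uses a simple fixed-point generalized soft-thresholding; it is our illustration of the idea, not necessarily the exact update rule derived in the paper:

```python
import numpy as np

def prox_schatten_p(T, p, mu, iters=10):
    """Approximately solve min_J ||J||_{S_p}^p + (mu/2)||J - T||_F^2
    by fixed-point shrinkage of the singular values of T (illustrative)."""
    U, sigma, Vt = np.linalg.svd(T, full_matrices=False)
    s = sigma.copy()
    for _ in range(iters):
        # stationarity condition per singular value: s = sigma - (p/mu) * s^(p-1)
        s = np.maximum(sigma - (p / mu) * np.maximum(s, 1e-12) ** (p - 1), 0.0)
    return U @ np.diag(s) @ Vt
```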