proved to be very effective in practical applications, such as
principal component analysis (PCA) and linear discriminant
analysis (LDA) [25]. Recently, Jiang et al. [26] developed a subspace method for facial eigenfeature regularization and extraction (ERE), in which the eigenspace of the within-class scatter matrix is decomposed into three subspaces: a reliable subspace spanned mainly by variation, an unstable subspace caused by noise and limited training data, and a null subspace. This decomposition alleviates the problems of instability, overfitting, and poor generalization. As we know, PCA is an unsupervised method
and does not require label information. To take label information into account, an asymmetric PCA (APCA) approach [27] was proposed, which utilizes class covariance matrices and removes the unreliable dimensions of the principal components. Whereas APCA is designed for the two-class problem, supervised PCA (SPCA) [28] handles the multiple-class problem. Unlike APCA, SPCA imposes different weights on the covariance matrices so as to capture class-specific information of the data set. In summary, the above methods can produce reasonable subspaces, but unlike our approach they can neither explicitly yield low-rank subspaces nor separate out an error matrix; e.g., they cannot recover the clean component of a corrupted image. To some extent, this hinders the potentially wider application of these methods.
It is worth noting that there exist two works closely related to ours. One is the supervised regularization-based robust subspace (SRRS) method [29], which integrates subspace learning and data recovery in a unified framework to jointly learn a discriminative subspace and a low-rank representation from the data. It differs from our method in several aspects.
1) It adopts the Fisher criterion to capture the discriminant structure, while ours utilizes both the Laplacian regularizer and the least squares regularizer under the guidance of the supervised information.
2) It requires solving a generalized eigen-decomposition problem to obtain the projecting subspace, while ours admits a closed-form solution and avoids solving the expensive Sylvester equation.
3) Our method can be used for regression tasks directly, while SRRS cannot.
The other is robust regression (RR) [30], which leverages the rank regularizer and the sparse error term but regards the underlying data structure as a single low-rank subspace, which might lead to inaccurate recovery. By contrast, we assume the data is distributed over a mixture of multiple subspaces to guarantee correct recovery. Moreover, the subspace obtained from RR does not possess the desired informative properties, e.g., the locality-preserving ability, whereas our method easily achieves this via the adaptive regularizer. Details of our method are elaborated in the following sections.
III. PROBLEM SETTING
In this paper, we define the constrained low-rank learning problem as follows. Given a collection of data points $\{x_1, x_2, \ldots, x_n\}$ and their labels $\{y_1, y_2, \ldots, y_n\}$ distributed in $k$ classes, we assume they are samples approximately drawn from a mixture of several subspaces [2]. The principal goal is to seek the discriminant lowest rank representation $Z$ as well as the robust projecting subspace $P$. More specifically, we denote the training data by $X \in \mathbb{R}^{d \times n}$ with each data point stacked in a column, and the data matrix can be decomposed into a clean component $\tilde{X} = AZ$ and an error component $E \in \mathbb{R}^{d \times n}$, where $A \in \mathbb{R}^{d \times m}$ is treated as the dictionary linearly spanning the data space while $Z \in \mathbb{R}^{m \times n}$ reveals the underlying subspace structure of the data.
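To make this data model concrete, the following NumPy sketch generates a toy data matrix that satisfies the mixture-of-subspaces assumption with sparse column-wise corruptions; all sizes and the synthetic generation procedure are assumptions chosen purely for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed toy sizes: ambient dimension d, k classes, samples per class, subspace dimension r.
d, k, n_per_class, r = 50, 3, 20, 4
n = k * n_per_class

# Each class is drawn from its own low-dimensional subspace (mixture-of-subspaces model).
blocks, labels = [], []
for c in range(k):
    basis = np.linalg.qr(rng.standard_normal((d, r)))[0]        # orthonormal basis of class c
    blocks.append(basis @ rng.standard_normal((r, n_per_class)))
    labels += [c] * n_per_class
X_clean = np.hstack(blocks)                                      # clean component, shape (d, n)

# Sparse column-wise corruption plays the role of the error component E.
E = np.zeros((d, n))
corrupted = rng.choice(n, size=n // 10, replace=False)
E[:, corrupted] = rng.standard_normal((d, corrupted.size))

X = X_clean + E                                                  # observed data matrix, d x n
```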
More importantly, we argue that the recovered data can be mapped onto a low-dimensional data space by the robust projecting subspace $P \in \mathbb{R}^{d \times k}$ (the reduced dimension $k$ is set to the number of classes), i.e., $V = P^{T} A Z \in \mathbb{R}^{k \times n}$. On one hand, the low-dimensional data representation $V$ is expected to be closely correlated to the label indicator matrix $Y \in \mathbb{R}^{k \times n}$, while it acts as the estimated output given the input data. The matrix $Y$ takes discrete values for classification and continuous values for regression, respectively; e.g., for classification, the entry in each column of $Y$ is set to 1 if the sample belongs to the corresponding class and 0 otherwise. On the other hand, it is easy to endow the low-dimensional representation $P^{T} X$, derived from the original data space, with several appealing properties, such as the locality-preserving ability, by the constraint matrix $L \in \mathbb{R}^{n \times n}$. Usually, this matrix should be positive semi-definite to make the imposed regularizer convex.
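As an illustration of these two ingredients, the sketch below builds a one-hot label indicator Y and one common positive semi-definite choice of the constraint matrix, the unnormalized Laplacian of a k-nearest-neighbor graph. The text only requires L to be positive semi-definite, so this particular construction, and all names in the sketch, are assumptions for illustration.

```python
import numpy as np

def one_hot_labels(labels, k):
    """Label indicator Y in R^{k x n}: Y[c, i] = 1 iff sample i belongs to class c, else 0."""
    labels = np.asarray(labels)
    Y = np.zeros((k, labels.size))
    Y[labels, np.arange(labels.size)] = 1.0
    return Y

def knn_graph_laplacian(X, n_neighbors=5, sigma=1.0):
    """Unnormalized Laplacian L = D - W of a symmetrized k-NN heat-kernel graph.

    L is positive semi-definite, so Tr(P^T X L X^T P) is a convex regularizer in P
    that encourages nearby samples to remain close after projection.
    """
    n = X.shape[1]
    sq = np.sum(X ** 2, axis=0)
    dist2 = sq[:, None] + sq[None, :] - 2.0 * (X.T @ X)     # pairwise squared distances
    np.fill_diagonal(dist2, np.inf)                          # exclude self-neighbors
    W = np.zeros((n, n))
    for i in range(n):
        nbrs = np.argsort(dist2[i])[:n_neighbors]
        W[i, nbrs] = np.exp(-dist2[i, nbrs] / (2.0 * sigma ** 2))
    W = np.maximum(W, W.T)                                   # symmetrize the affinity
    return np.diag(W.sum(axis=1)) - W

# Usage with the synthetic data above (shapes: Y is k x n, L is n x n):
# Y = one_hot_labels(labels, k)
# L = knn_graph_laplacian(X)
```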
By tradition, the lowest rank representation is employed to construct an affinity matrix for subspace segmentation in unsupervised learning. Here, we mainly use it for recovering the clean data by $AZ$, where $Z$ plays a dominant role. Under such circumstances, both the recovered training data and testing data are robust to noise or corruptions, and this also makes it possible to discriminate the samples from different categories.
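Assuming a representation Z and a projection P have already been obtained by the optimization described in the next section, recovery and class prediction reduce to a few matrix products, as in the sketch below. The arg-max decision rule over the estimated output V is an illustrative assumption of this sketch, not a prescription taken from the paper.

```python
import numpy as np

def recover_and_classify(X, Z, P):
    """Recover the clean component AZ (with the dictionary A = X) and predict labels.

    Z and P are assumed to come from the learning procedure; taking the arg max over
    each column of V is an illustrative decision rule, since each column of the
    label indicator Y is one-hot.
    """
    X_clean = X @ Z                      # recovered clean component AZ
    V = P.T @ X_clean                    # estimated low-dimensional output, shape (k, n)
    predictions = np.argmax(V, axis=0)   # predicted class index per sample
    return X_clean, predictions
```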
IV. OUR METHOD
This section elaborates on the proposed method, including the formulation, the optimization framework, and the algorithmic procedures.
A. Formulation
As mentioned earlier, our goal is to jointly seek the discriminant lowest-rank representation $Z \in \mathbb{R}^{m \times n}$ and the robust projecting subspace $P \in \mathbb{R}^{d \times k}$ in a supervised manner. Essentially, we have to minimize $\operatorname{rank}(Z)$, which is, however, difficult to optimize directly due to its discrete nature. As a common practice in low-rank methods [2], [5], we use the nuclear norm as its convex surrogate. In this paper, the dictionary $A$ is set to $X$.
Hence, our objective function can be formulated as
\begin{equation}
\min_{Z,E,P} \; \|Z\|_{*} + \lambda \|E\|_{2,1} + \alpha \operatorname{Tr}\!\left(P^{T} X L X^{T} P\right) + \beta \|V - Y\|_{F}^{2}
\quad \text{s.t.} \;\; X = XZ + E, \;\; V = P^{T} X Z, \;\; \mathbf{1}_{n}^{T} Z = \mathbf{1}_{n}^{T}
\tag{1}
\end{equation}
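For intuition about the roles of the four terms in (1), the following NumPy sketch evaluates them for given matrices; the toy shapes, random values, and variable names are assumptions for illustration only and this is not the solver developed in this paper. The notation is summarized right after the sketch.

```python
import numpy as np

def objective_value(X, Z, E, P, L, Y, lam, alpha, beta):
    """Evaluate the objective of (1); the affine constraint on Z is not enforced here."""
    nuclear = np.linalg.norm(Z, ord='nuc')            # ||Z||_*: sum of singular values
    l21 = np.sum(np.linalg.norm(E, axis=0))           # ||E||_{2,1}: sum of column l2-norms
    V = P.T @ X @ Z                                    # low-dimensional output V = P^T X Z
    smooth = np.trace(P.T @ X @ L @ X.T @ P)           # Tr(P^T X L X^T P)
    fit = np.linalg.norm(V - Y, 'fro') ** 2            # ||V - Y||_F^2
    return nuclear + lam * l21 + alpha * smooth + beta * fit

# Toy inputs with the dictionary A set to X, so Z is n x n.
rng = np.random.default_rng(0)
d, n, k = 6, 10, 3
X = rng.standard_normal((d, n))
Z = rng.standard_normal((n, n))
E = X - X @ Z                                          # makes X = XZ + E hold by construction
P = rng.standard_normal((d, k))
L = np.eye(n)                                          # any positive semi-definite matrix
Y = np.zeros((k, n))
Y[rng.integers(0, k, size=n), np.arange(n)] = 1.0      # one-hot label indicator
print(objective_value(X, Z, E, P, L, Y, lam=1.0, alpha=0.1, beta=0.1))
```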
In (1), the nuclear norm $\|\cdot\|_{*}$ is the sum of the singular values of a matrix; the group sparse norm $\|\cdot\|_{2,1}$ computes the sum of the $\ell_2$-norms of the column vectors of a matrix, e.g., $\sum_j \|E_j\|_2$ for $E$; $\|\cdot\|_F$ denotes the Frobenius norm of a matrix; and $\mathbf{1}$ is a column vector of all ones. The parameter $\alpha > 0$ balances the contribution of the constraint to the objective, $\beta > 0$ controls the fitting of the least squares term, and