II. RELATED WORK
In order to facilitate the following description, some notations are introduced. Let $R \in \mathbb{R}^{X \times Y \times L}$ denote the hyperspectral image cube that contains two spatial dimensions $X$ and $Y$ (pixels) and one spectral dimension $L$ (wavelength). $R_{xy}$ is the reflectance spectral signature at spatial location $(x, y)$, which provides the ability to identify various materials in the ground scene. $S = \{R_{uv}\} \in \mathbb{R}^{L \times \omega^2}$, with $\max(|u - x|, |v - y|) \le \lfloor \omega/2 \rfloor$, is the spatial neighborhood of $R_{xy}$ (including $R_{xy}$ itself), formed using boxes of predefined spatial window size $\omega \times \omega$ pixels (here $\omega$ is an odd number greater than or equal to 3, and $\lfloor \omega/2 \rfloor$ rounds $\omega/2$ to the nearest integer less than or equal to $\omega/2$). Considering supervised hyperspectral classification, denote by $A_i = [r_1, r_2, \ldots, r_{n_i}] \in \mathbb{R}^{L \times n_i}$ the training set of the $i$th class, where each column $r_j$ $(j = 1, 2, \ldots, n_i)$ is an $L$-dimensional spectral signature. Suppose that there are $K$ classes present in the scene, and let $A = [A_1, A_2, \ldots, A_K] \in \mathbb{R}^{L \times n}$ be the concatenation of the $n$ training samples from all classes, where $n = n_1 + n_2 + \cdots + n_K$.
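To make the neighborhood construction concrete, the following NumPy sketch gathers the $\omega \times \omega$ window around a pixel into an $L \times \omega^2$ matrix $S$. The function name, the clipping of border pixels, and the array layout are illustrative assumptions, not details fixed by the paper.

```python
import numpy as np

def spatial_neighborhood(cube, x, y, omega=3):
    """Collect the omega x omega neighborhood of pixel (x, y) as an L x omega^2 matrix S.

    cube : ndarray of shape (X, Y, L), the hyperspectral image R.
    Border pixels are handled by clipping indices to the image extent,
    one of several possible conventions (the paper does not specify one).
    """
    X, Y, L = cube.shape
    half = omega // 2                      # floor(omega / 2)
    cols = []
    for u in range(x - half, x + half + 1):
        for v in range(y - half, y + half + 1):
            uu = min(max(u, 0), X - 1)     # clip to valid spatial range
            vv = min(max(v, 0), Y - 1)
            cols.append(cube[uu, vv, :])   # one L-dimensional spectral signature
    return np.stack(cols, axis=1)          # shape (L, omega**2)

# Example: a random 50-band cube, 5 x 5 neighborhood of pixel (10, 20)
R = np.random.rand(100, 120, 50)
S = spatial_neighborhood(R, 10, 20, omega=5)
print(S.shape)                             # (50, 25)
```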
A. Low-Rank Representation
As the spatial filtering window enlarges, it is natural to expect that more pixels from different classes will be included in the spatial neighborhood $S$. Owing to the high degree of similarity among spectral signatures within the same class, LRR can be applied to exploit the spatial neighborhood information more precisely. That is, a given spatial neighborhood matrix $S \in \mathbb{R}^{L \times \omega^2}$ can be decomposed into a low-rank matrix $D$ and an error matrix $E$, which can be mathematically described as follows:
$$\min_{D,E}\ \operatorname{rank}(D) + \lambda \|E\|_{\ell} \quad \text{s.t.} \quad S = D + E \tag{1}$$
where $\operatorname{rank}(D)$ represents the rank of matrix $D$, $\|\cdot\|_{\ell}$ is a certain regularization strategy for the error, and $\lambda$ is a trade-off parameter. In particular, it has been pointed out that a "dictionary" matrix should be introduced to better handle the multiple subspaces of the data (the data itself can be directly taken as the "dictionary" matrix, and the coefficient matrix is denoted as $Z$) [19], and $\|E\|_{2,0} = \#\{i : \|[E]_{:,i}\|_2 \neq 0\}$ is usually adopted to characterize the phenomenon that the data vectors are mildly or heavily corrupted [37]; hence, a more reasonable objective function might be
$$\min_{Z,E}\ \operatorname{rank}(Z) + \lambda \|E\|_{2,0} \quad \text{s.t.} \quad S = SZ + E. \tag{2}$$
However, the above optimization problem is difficult to solve due to the discrete nature of the rank function. Let $\sigma_1, \sigma_2, \ldots, \sigma_k$ be all the sorted singular values of $Z$; the nuclear norm $\|Z\|_* = \sum_i \sigma_i$ is a convex surrogate of $\operatorname{rank}(Z)$, and the $\ell_{2,1}$ norm is a good relaxation of the $\ell_{2,0}$ norm. So, a low-rank recovery of $S$ can be obtained by solving the following convex optimization problem:
$$\min_{Z,E}\ \|Z\|_* + \lambda \|E\|_{2,1} \quad \text{s.t.} \quad S = SZ + E \tag{3}$$
where $\|\cdot\|_*$ denotes the nuclear norm of a matrix, i.e., the sum of the singular values of the matrix. After obtaining an optimal solution $(Z^*, E^*)$, the data can be recovered using $SZ^*$ (or $S - E^*$). Since $\operatorname{rank}(SZ^*) \le \operatorname{rank}(Z^*)$, $SZ^*$ is also a low-rank recovery of the original data $S$.
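Typical solvers of (3), e.g., inexact augmented Lagrange multiplier (ALM) schemes, repeatedly apply the proximal operator of the nuclear norm, i.e., soft-thresholding of the singular values. The NumPy sketch below illustrates only that single building block under hypothetical names; it is not the full LRR solver used in the paper.

```python
import numpy as np

def singular_value_threshold(M, tau):
    """Proximal operator of tau * nuclear norm: shrink the singular values of M by tau."""
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    s_shrunk = np.maximum(s - tau, 0.0)           # soft-threshold the singular values
    return U @ np.diag(s_shrunk) @ Vt

# Toy check: thresholding reduces (or at least preserves) the numerical rank
M = np.random.rand(50, 9) @ np.random.rand(9, 9)  # rank <= 9
D = singular_value_threshold(M, tau=0.5)
print(np.linalg.matrix_rank(M), np.linalg.matrix_rank(D, tol=1e-8))
```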
Unfortunately, although solving (3) can result in a low-rank solution of the considered problem, it requires some strict conditions (i.e., the restricted isometry property, RIP for short) and may yield a matrix with a much higher rank than the real one [38], [39].
B. SR-Based Classification (SRC)
For a test sample $y \in \mathbb{R}^L$, it can be assumed that the training samples from a single class lie on a subspace, so that $y$ from a certain class will approximately lie in the linear span of the training samples of that same class. As a generalization of the nearest neighbor (NN) and nearest subspace (NS) classifiers, SR searches for the most compact representation of a test sample using only a few basic elements of the training set [30]. According to the theory of compressed sensing [40], the SR of $y$ can be obtained by solving the following optimization problem [41]:
$$(\ell_0): \quad \hat{\alpha} = \arg\min \|\alpha\|_0 \quad \text{subject to} \quad A\alpha = y \tag{4}$$
where $\|\cdot\|_0$ denotes the vector $\ell_0$-norm (the number of nonzero elements in a vector), which is the best sparsity measure available. Solving problem (4) turns out to be nonconvex and NP-hard, and is computationally intractable [42]. On the other hand, the theory of compressed sensing has shown that if the solution $\alpha$ is sparse enough, the $\ell_0$-norm minimization problem can be relaxed and solved by optimizing the following objective function [41], [43]:
$$(\ell_1): \quad \hat{\alpha} = \arg\min_{\alpha \in \mathbb{R}^n} \left\{ \|y - A\alpha\|_2^2 + \tau \|\alpha\|_1 \right\} \tag{5}$$
where $\|\cdot\|_1$ denotes the $\ell_1$-norm, i.e., the sum of the absolute values of the vector entries, and $\tau$ is a scalar weight that quantifies the relative importance of minimizing $\|\alpha\|_1$. Unlike the $\ell_0$-norm, the $\ell_1$-norm is convex and the minimization can be cast as a linear program. After that, the test sample $y$ is assigned to class $i$ if the distance between $y$ and $A_i\hat{\alpha}_i$ (the approximation of $y$ using only the coefficients associated with the $i$th class) is the smallest among all classes, i.e.,

$$\operatorname{Class}(y) = \arg\min_{i \in \{1,2,\ldots,K\}} \|y - A_i\hat{\alpha}_i\|_2^2. \tag{6}$$
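As a concrete illustration of (5) and (6), the following NumPy sketch solves the $\ell_1$-regularized least-squares problem with a plain ISTA iteration and then applies the class-wise residual rule. The solver choice, function names, and parameter values are illustrative assumptions, not the implementation used in the paper.

```python
import numpy as np

def ista_lasso(A, y, tau, n_iter=500):
    """Minimize ||y - A a||_2^2 + tau * ||a||_1 by iterative soft thresholding (ISTA)."""
    L_const = np.linalg.norm(A, 2) ** 2           # largest singular value squared
    alpha = np.zeros(A.shape[1])
    for _ in range(n_iter):
        grad = A.T @ (A @ alpha - y)              # (half the) gradient of the quadratic term
        z = alpha - grad / L_const                # gradient step with step size 1 / (2 L)
        alpha = np.sign(z) * np.maximum(np.abs(z) - tau / (2 * L_const), 0.0)
    return alpha

def src_classify(A_list, y, tau=0.1):
    """Rule (6): pick the class whose coefficients reconstruct y with the smallest residual."""
    A = np.hstack(A_list)                         # concatenated training dictionary
    alpha = ista_lasso(A, y, tau)
    residuals, start = [], 0
    for A_i in A_list:
        n_i = A_i.shape[1]
        alpha_i = alpha[start:start + n_i]        # coefficients of the ith class only
        residuals.append(np.sum((y - A_i @ alpha_i) ** 2))
        start += n_i
    return int(np.argmin(residuals))

# Toy usage with two random classes of training spectra (L = 30 bands)
rng = np.random.default_rng(0)
A_list = [rng.random((30, 8)), rng.random((30, 8))]
y = A_list[1] @ rng.random(8)                     # sample generated from class 1
print(src_classify(A_list, y, tau=0.05))          # most likely prints 1
```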
III. $\ell_{1/2}$ REGULARIZED LOW-RANK REPRESENTATION (LHALFLRR)
A. $\ell_{1/2}$ Regularization-Based Relaxation of Rank Minimization Problem
Compared with the rank and nuclear norm measures in (2) and (3), respectively, it can be noticed that $\operatorname{rank}(Z) = \sum_{i=1,\sigma_i \neq 0}^{\omega^2} \sigma_i^0 = \|\sigma\|_0$ is the number of nonzero singular values of $Z$, whereas $\|Z\|_* = \sum_{i=1,\sigma_i \neq 0}^{\omega^2} \sigma_i^1 = \|\sigma\|_1$ is the sum of the singular values. Because $\|\sigma\|_1$ is the convex envelope of $\|\sigma\|_0$, (2) can be relaxed as (3) and $\operatorname{rank}(Z)$ can be replaced by $\|Z\|_*$. It is obvious that model (3)