variant of NMF), which requires the obtained basis vectors to be
close to the original data, thereby guaranteeing the real sparse
representation of the data. Their method solves the problem that
for the $\ell_1$ and $\ell_{1/2}$ constraints, the new representations of some data samples are still highly dense, although, on average, the
new representation is very sparse. In addition, label constraints
[16] are added to the NMF framework to introduce the label
information from the training data samples. Regarding the cost functions that quantify the quality of the approximation, the Earth Mover's distance between the data and the matrix product [17] has been proposed to quantify the error in image or histogram matching more effectively. New tools, such as multiple kernel
learning, are also applied to NMF [18], which leads to better
performance.
Among the regularizations, the manifold regularization, which
considers the geometry of the data space, has recently been incorporated into the original NMF framework, resulting in a novel low-rank matrix factorization [19] named GNMF, where a p-nearest
neighbor simple graph is constructed to encode the geometrical
information. This modified NMF framework outperforms the
original NMF method. The minimization problem is formulated as
$$\min_{B,F}\ \|X-BF\|_F^2+\alpha\,\mathrm{Tr}(FLF^{T})\quad \text{s.t.}\ B\ge 0,\ F\ge 0. \qquad (5)$$
An alternating iterative algorithm with the following multiplicative updating rules,
$$B_{ik} \leftarrow B_{ik}\,\frac{(XF^{T})_{ik}}{(BFF^{T})_{ik}}, \qquad F_{kj} \leftarrow F_{kj}\,\frac{(B^{T}X+\alpha FS)_{kj}}{(B^{T}BF+\alpha FD)_{kj}},$$
is applied to solve the optimization problem (5), where $S$ is the graph weight matrix, $D$ is the diagonal degree matrix with $D_{jj}=\sum_{l}S_{jl}$, and $L=D-S$.
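A minimal NumPy sketch of these multiplicative updates is given below; the function name, the random initialization, the iteration count, and the small constant added to the denominators are illustrative assumptions rather than details taken from [19].

```python
import numpy as np

def gnmf_updates(X, r, S, alpha=0.1, n_iter=200, eps=1e-10):
    """Illustrative multiplicative updates for problem (5).

    X : (m, n) nonnegative data matrix
    S : (n, n) symmetric nonnegative graph weight matrix
    r : target rank; alpha : regularization weight.
    """
    m, n = X.shape
    D = np.diag(S.sum(axis=1))      # diagonal degree matrix of the graph
    rng = np.random.default_rng(0)
    B = rng.random((m, r))          # nonnegative basis matrix
    F = rng.random((r, n))          # nonnegative coefficient (coding) matrix
    for _ in range(n_iter):
        # B_ik <- B_ik (X F^T)_ik / (B F F^T)_ik
        B *= (X @ F.T) / (B @ F @ F.T + eps)
        # F_kj <- F_kj (B^T X + alpha F S)_kj / (B^T B F + alpha F D)_kj
        F *= (B.T @ X + alpha * F @ S) / (B.T @ B @ F + alpha * F @ D + eps)
    return B, F
```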
However, the performance of Eq. (5) is sensitive to the construction of the graph. The performance differs for various graphs with different weighting schemes and parameter selections. Because an infinite number of graphs are embodied within the intrinsic manifold, an exhaustive search for the optimal graph for a specific task (such as matrix factorization) is very time consuming, expensive, difficult or even impossible to carry out by hand, and easily prone to overfitting the training set. To address this problem, a multiple graph regularized NMF [12] is proposed, in which graph selection and nonnegative matrix factorization are completed jointly. Tests on the datasets demonstrate that it outperforms the GNMF method.
As for the TSVD framework, many improvements have been
applied to face recognition [20– 22], document classification [23],
etc. However, the improvements made to the TSVD framework are
meager compared with those made to NMF in the data analysis field. Recently, Zhang and Zhao [13] proposed a novel low-rank matrix factorization method (named MMF) that differs
from the GNMF method in that their graph regularization is
incorporated into the original TSVD framework. Their optimization
problem is formulated as
$$\min_{B,F}\ \|X-BF\|_F^2+\alpha\,\mathrm{Tr}(F\Phi F^{T})\quad \text{s.t.}\ B^{T}B=I_r, \qquad (6)$$
where $\Phi$ is a symmetric, positive semi-definite matrix that characterizes the data manifold; the most widely used choice is the graph Laplacian matrix. A direct algorithm and an alternating iterative algorithm are proposed for solving this minimization problem. Testing demonstrates that the MMF method is superior to GNMF, which adds the manifold regularization term to the original NMF framework.
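For illustration, one plausible alternating scheme for (6) updates $F$ in closed form (using $B^{T}B=I_r$) and updates $B$ by an orthogonal Procrustes step; the sketch below follows this idea and is not claimed to be the exact direct or iterative algorithm of [13].

```python
import numpy as np

def mmf_alternating(X, r, Phi, alpha=0.1, n_iter=50):
    """Illustrative alternating scheme for problem (6).

    X   : (m, n) data matrix
    Phi : (n, n) symmetric matrix characterizing the manifold (e.g., a graph Laplacian)
    r   : target rank; alpha : regularization weight.
    """
    n = X.shape[1]
    # Initialize B with orthonormal columns from a thin SVD of X.
    U, _, _ = np.linalg.svd(X, full_matrices=False)
    B = U[:, :r]
    A = np.eye(n) + alpha * Phi              # fixed matrix of the F-step system
    for _ in range(n_iter):
        # F-step: with B^T B = I_r, setting the gradient to zero gives
        # F (I + alpha * Phi) = B^T X, a linear system solved for F.
        F = np.linalg.solve(A, (B.T @ X).T).T    # A is symmetric, so A^T = A
        # B-step: orthogonal Procrustes, B = U V^T with X F^T = U S V^T.
        U, _, Vt = np.linalg.svd(X @ F.T, full_matrices=False)
        B = U @ Vt
    return B, F
```

Each B-step keeps the factor exactly on the constraint set $B^{T}B=I_r$, while the F-step only requires solving one linear system with a fixed coefficient matrix.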
2.2. Hypergraph learning
Different from the simple graph, in which an edge connects two vertices, a hyperedge in the Hypergraph connects more than two vertices and indicates a high-order relationship among data samples. Suppose that the Hypergraph $G=(V,E,W)$ is composed of the vertex set $V$, the hyperedge set $E$, where each hyperedge $e$ is a subset of $V$, and the hyperedge weights $W$. Denote the weight corresponding to the hyperedge $e$ as $w(e)$. If we use the incidence matrix $H$ to represent the Hypergraph $G$, then $H(v,e)=1$ if $v\in e$, and $H(v,e)=0$ otherwise.
Based on the matrix $H$, the degree of each vertex is defined as $d(v)=\sum_{e\in E}w(e)H(v,e)$. The degree of a hyperedge $e$ is defined as $\delta(e)=\sum_{v\in V}H(v,e)$. The weight matrix $W$, with $W_{ij}=\sum_{e\in E}\sum_{(i,j)\in e}(w(e)/\delta(e))$, gives the weight between any two vertices in the Hypergraph. If we use $D_v$, $D_e$ and $W_e$ to denote the diagonal matrices of the vertex degrees, the hyperedge degrees, and the hyperedge weights, respectively, then the un-normalized Hypergraph Laplacian matrix is formulated as $L_{Hyper}=D_v-S$, where $S=HW_eD_e^{-1}H^{T}$.
Hypergraph learning has already been applied to many tasks
such as classification, clustering, retrieval and embedding. For
instance, Agarwal et al. [24] used a Hypergraph for clustering by using a clique average to transform a Hypergraph into a simple graph. Zass and Shashua [25] adopted the Hypergraph for image matching by using convex optimization. Hypergraphs have also been applied to the problems of multi-label learning [26] and video
segmentation [27]. Yu et al. [28] proposed a novel Hypergraph-
based semi-supervised learning method for image classification, in
which the weights of hyperedges were adaptively coordinated.
Huang et al. [29] formulated the image clustering task as a
problem of Hypergraph partition. In [30], a Hypergraph-based
image retrieval approach is proposed. This approach builds a
Hypergraph by generating a hyperedge from each sample and its
neighbors; Hypergraph-based ranking is then performed. Gao
et al. [31] integrated Hypergraph learning into the framework of
sparse coding. The Hypergraph manifold enhanced the geometric
structure of the sparse codes. The aim of this paper is to integrate
the Hypergraph manifold into low-rank factorization.
3. The proposed method
3.1. Hypergraph regularization term
The local invariance assumption states that nearby data samples should be embedded similarly. That is, if the feature vectors of two data samples are close in the intrinsic manifold of the data distribution, then the corresponding coding vectors with respect to the new basis are also close, and vice versa. Owing to the high-order information captured by the Hypergraph, the regularization term that measures the smoothness of the low-dimensional coding vectors in F can be formulated as
$$\mathcal{O}_{HGr}(F)=\frac{1}{2}\sum_{i,j}W_{ij}\,\|F_i-F_j\|^{2}=\frac{1}{2}\sum_{e\in E}\sum_{(i,j)\in e}\frac{w(e)}{\delta(e)}\,\|F_i-F_j\|^{2}=\mathrm{Tr}(FL_{Hyper}F^{T}), \qquad (7)$$
where $F=[F_i]$ and $F_i$ is the $i$th coding vector, and $W_{ij}=\sum_{e\in E}\sum_{(i,j)\in e}(w(e)/\delta(e))$ is the weight between two vertices in the Hypergraph.
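The equality between the pairwise form and the trace form in (7) can be checked numerically; the short NumPy snippet below does so for a small, arbitrary Hypergraph (the incidence matrix, weights, and coding vectors are all illustrative):

```python
import numpy as np

# A small Hypergraph with 6 vertices and 3 hyperedges (incidence matrix H).
H = np.array([[1, 0, 1],
              [1, 1, 0],
              [0, 1, 0],
              [1, 0, 1],
              [0, 1, 1],
              [0, 0, 1]], dtype=float)
w = np.array([0.5, 1.0, 2.0])          # hyperedge weights w(e)
rng = np.random.default_rng(0)
F = rng.random((2, 6))                 # coding vectors F_i as columns of F

delta = H.sum(axis=0)                  # hyperedge degrees
S = H @ np.diag(w) @ np.diag(1.0 / delta) @ H.T   # pairwise weights W_ij
L_hyper = np.diag(H @ w) - S           # L_Hyper = D_v - S

# Pairwise form: (1/2) sum_ij W_ij ||F_i - F_j||^2
pairwise = 0.5 * sum(S[i, j] * np.sum((F[:, i] - F[:, j]) ** 2)
                     for i in range(6) for j in range(6))
# Trace form: Tr(F L_Hyper F^T)
trace_form = np.trace(F @ L_hyper @ F.T)
print(np.isclose(pairwise, trace_form))   # expected: True
```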
By minimizing (7), the coding vectors corresponding to data points within the same hyperedge are forced to be close to each other whenever the data points themselves are close; consequently, the locality information among the data points within the same hyperedge is preserved. We emphasize that the Hypergraph regularization differs from the graph regularization framework [12], which guarantees only the smoothness of the low-dimensional data representation through pair-wise relationships between two data samples.