feature samples' manifold structure is also considered.
Although graph-based NMF algorithms incorporate prior
knowledge of the manifold structure into the NMF framework,
overemphasizing local structure preservation can sometimes
degrade their performance. A structure preserving nonnega-
tive matrix factorization (SPNMF) is proposed in [10], in
which the intrinsic manifold is approximated with local
and distant graphs. Experimental results demonstrate that
SPNMF outperforms GNMF and its variants.
All the abovementioned NMF variants are based on the
premise that the input data are noise-free; this is impractical
for real-world data samples, which are often corrupted by
noise. To boost the robustness of NMF to outliers, an
$\ell_{2,1}$-norm based NMF reconstruction function is applied in [41, 42]. In
[43], the $\ell_{2,1}$-norm and $\ell_1$-norm are used to formulate the graph
embedding and data reconstruction functions, which endows
the algorithm with robustness to unreliable graphs and noisy
labels.
Among the existing methods, SPNMF is the most closely related
to our method and performs better on some specific datasets.
There are still some differences between our proposed algo-
rithm and SPNMF. Firstly, the Laplacian matrix of the distant
repulsion graph in SPNMF needs to be updated in each
iteration, which is time-consuming. Additionally, to learn a
better parts-based representation, the basis vectors of SPNMF
are required to be as orthogonal as possible. Finally, SPNMF is
sensitive to noisy data points.
3. The Proposed Method
Our goal is to perform matrix factorization on the hidden
semantic subspace. To this end, we impose local affinity
and distant repulsion structure preservation on the new
robust NMF framework. In this section, we will first explain
the structure preservation and robust data reconstruction
terms of RSPNMF, followed by its optimization algorithm.
Finally, the convergence of the proposed algorithm is proved.
3.1. Problem Formulation. Despite the diverse motivations of
various dimensionality reduction algorithms, most of them
can be explained within a graph embedding framework.
Let $G = \{X, W\}$ be an undirected weighted graph, where
the vertex set $X$ corresponds to a dataset and $W \in \mathbb{R}^{N \times N}_{+}$ is
an affinity matrix whose elements measure the similarity of
each pair of vertices. In graph embedding algorithms, the graph $G$
characterizes the prior knowledge of the geometric structure
of the data distribution.
If we use the $k_1$-nearest neighboring graph to character-
ize the local structure of the data distribution, the weight matrix
$W = [w_{ij}]_{N \times N}$ can be defined as follows:
$$
w_{ij} =
\begin{cases}
e^{-\|x_i - x_j\|^2 / \sigma^2}, & \text{if } x_i \in N_{k_1}(x_j) \text{ or } x_j \in N_{k_1}(x_i), \\
0, & \text{otherwise},
\end{cases}
\tag{4}
$$
where $\sigma$ is the bandwidth parameter and $N_{k_1}(x_i)$ denotes the
set of $k_1$ nearest neighbors of $x_i$.
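As a concrete illustration, the following minimal sketch (in Python with NumPy; it is not from the original paper, and the function name and interface are our own) shows how the heat-kernel weight matrix in (4) could be constructed:

```python
import numpy as np

def knn_affinity(X, k1, sigma):
    """Heat-kernel affinity over a symmetrized k1-NN graph, as in (4).

    X     : (N, d) array, one data sample per row
    k1    : number of nearest neighbors
    sigma : bandwidth parameter of the heat kernel
    """
    # Pairwise squared Euclidean distances.
    sq = (X ** 2).sum(axis=1)
    D2 = sq[:, None] + sq[None, :] - 2.0 * X @ X.T
    np.fill_diagonal(D2, np.inf)          # a sample is not its own neighbor

    N = X.shape[0]
    W = np.zeros((N, N))
    nn = np.argsort(D2, axis=1)[:, :k1]   # k1 nearest neighbors per sample
    for i in range(N):
        for j in nn[i]:
            w = np.exp(-D2[i, j] / sigma ** 2)
            W[i, j] = W[j, i] = w         # the "or" condition symmetrizes W
    return W
```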
The local invariance assumption, which encourages neigh-
boring data pairs in the original space to remain close in the
low-dimensional embedding subspace, can be formulated as
$$
\min_{V} \frac{1}{2} \sum_{i,j=1}^{N} \|v_i - v_j\|^2 w_{ij},
\tag{5}
$$
where $v_i$ is the corresponding low-dimensional representa-
tion of $x_i$.
By defining the diagonal matrix $D$ and the Laplacian matrix $L$
of graph $G$ as
$$
L = D - W, \qquad D_{ii} = \sum_{j \neq i} w_{ij}, \quad \forall i,
\tag{6}
$$
we can rewrite (5) as
$$
\begin{aligned}
\min_{V} \frac{1}{2} \sum_{i,j=1}^{N} \|v_i - v_j\|^2 w_{ij}
&= \min_{V} \left( \sum_{i=1}^{N} v_i^{T} v_i D_{ii} - \sum_{i,j=1}^{N} v_i^{T} v_j w_{ij} \right) \\
&= \min_{V} \left( \operatorname{Tr}(V^{T} D V) - \operatorname{Tr}(V^{T} W V) \right) \\
&= \min_{V} \operatorname{Tr}(V^{T} L V),
\end{aligned}
\tag{7}
$$
where $\operatorname{Tr}(\cdot)$ denotes the trace of a matrix.
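The identity in (7) is easy to verify numerically. The following short check (a sketch we add for illustration; it assumes $V$ stores one representation $v_i$ per row) confirms that the pairwise form equals the trace form:

```python
import numpy as np

rng = np.random.default_rng(0)
N, K = 6, 3
V = rng.random((N, K))                     # row i is v_i
W = rng.random((N, N))
W = (W + W.T) / 2                          # symmetric affinity matrix
np.fill_diagonal(W, 0.0)

D = np.diag(W.sum(axis=1))                 # degree matrix, D_ii = sum_j w_ij
L = D - W                                  # graph Laplacian, as in (6)

lhs = 0.5 * sum(W[i, j] * np.sum((V[i] - V[j]) ** 2)
                for i in range(N) for j in range(N))
rhs = np.trace(V.T @ L @ V)
assert np.isclose(lhs, rhs)                # the identity in (7) holds
```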
Equation (7) essentially enforces the smoothness of the
dimensionality reduction process. It is based on two vital
assumptions: firstly, neighboring data samples in the high-
dimensional space are semantically similar; secondly,
preserving the affinity structure plays a great role in learning the
low-dimensional representation.
The local invariance assumption essentially exploits the
favorable relationship among similar data samples in the unsu-
pervised setting; however, it ignores the unfavorable relation-
ship between divergent data pairs. In this paper, we conjecture
that distant data pairs are always semantically different. A
new distant neighboring graph $G^{C} = \{X, W^{C}\}$, which describes
the repulsion relationship between dissimilar data
pairs, is also constructed. The corresponding weight matrix
$W^{C} = [w^{c}_{ij}]_{N \times N}$ is defined as follows:
$$
w^{c}_{ij} =
\begin{cases}
e^{-\|x_i - x_j\|^2 / \sigma^2}, & \text{if } x_i \in N_{k_2}(x_j) \text{ or } x_j \in N_{k_2}(x_i), \\
1, & \text{otherwise},
\end{cases}
\tag{8}
$$
where $N_{k_2}(x_i)$ denotes the $k_2$ remotest data samples of $x_i$ in
the given dataset.
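Mirroring the earlier $k_1$-NN construction, here is a minimal sketch of how $W^{C}$ in (8) could be built (again our own illustrative code, not from the paper):

```python
import numpy as np

def distant_repulsion_weights(X, k2, sigma):
    """Weight matrix W^C of the distant repulsion graph, as in (8).

    Pairs lying in each other's k2 *remotest* sets receive a heat-kernel
    weight (small when far apart); all remaining pairs receive weight 1.
    """
    sq = (X ** 2).sum(axis=1)
    D2 = sq[:, None] + sq[None, :] - 2.0 * X @ X.T

    N = X.shape[0]
    Wc = np.ones((N, N))
    far = np.argsort(D2, axis=1)[:, -k2:]  # k2 remotest samples per point
    for i in range(N):
        for j in far[i]:
            w = np.exp(-D2[i, j] / sigma ** 2)
            Wc[i, j] = Wc[j, i] = w        # the "or" condition symmetrizes W^C
    return Wc
```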
From (8), we can find that the larger the distance
between $x_i$ and $x_j$, the smaller the value of $w^{c}_{ij}$. If we want
the corresponding low-dimensional representations $v_i$ and $v_j$