Graph Regularized Non-negative Matrix Factorization By Maximizing Correntropy

Le Li^1, Jianjun Yang^2, Kaili Zhao^3, Yang Xu^3, Honggang Zhang^3, Zhuoyi Fan^4

^1 School of Computer Science, University of Waterloo, Ontario N2L3G1, Canada
^2 Department of Computer Science, University of North Georgia, Oakwood, GA 30566, USA
^3 PRIS Lab, Beijing University of Posts and Telecommunications, Beijing 100876, P.R. China
^4 SEEE, Huazhong University of Science and Technology, Hubei 430074, P.R. China

Email: l248li@uwaterloo.ca, jianjun.yang@ung.edu, {xj992adolphxy, kailizhao1989}@gmail.com, zhhg@bupt.edu.cn, fanzhuoyi@hust.edu.cn
Abstract— Non-negative matrix factorization (NMF) has proved effective in many clustering and classification tasks. The classic ways to measure the error between the original and the reconstructed matrix are the $\ell_2$ distance and the Kullback-Leibler (KL) divergence. However, these error measures do not properly handle nonlinear cases. As a consequence, alternative measures based on nonlinear kernels, such as correntropy, have been proposed. However, current correntropy-based NMF only targets the low-level features without considering the intrinsic geometric distribution of the data. In this paper, we propose a new NMF algorithm that preserves local invariance by adding graph regularization to the process of max-correntropy-based matrix factorization. Meanwhile, each feature can learn a corresponding kernel from the data. Experimental results on Caltech101 and Caltech256 show the benefits of this combination over other NMF algorithms for unsupervised image clustering.
I. INTRODUCTION
Given a collection of images, clustering algorithms attempt to group the dataset into multiple clusters such that images in the same cluster are semantically similar to each other. In this process, a good feature extraction method is vital to clustering performance. Essentially, clustering algorithms (e.g. k-means) and classification algorithms (e.g. support vector machines [1]–[4]) map low-level image features to semantic information. These algorithms have a variety of applications in different areas [5]–[19]. If the extracted image features reflect the latent semantic concepts, we expect them to boost clustering/classification performance.
Recently, non-negative matrix factorization (NMF) has proven to be a powerful matrix factorization tool for data representation. Matrix factorization decomposes the original matrix X into multiple low-rank matrices such that their product approximates X. In NMF, X is decomposed into two non-negative matrices, H (the basis matrix) and W (the coefficient matrix). The vectors in H span a latent semantic space in which each basis vector defines a semantic
topic. Unlike other matrix factorization methods, such as singular value decomposition, that interpret the data as both additive and subtractive combinations of semantics, NMF allows only additive relationships. This constraint has proved to be closer to the way humans perceive and understand data [20]–[22]. Based on this methodology, we can map low-level image features into additive combinations of latent semantics for clustering.
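For reference, the conventional $\ell_2$-based NMF objective described above can be written down explicitly. This is standard background from the NMF literature, not the objective proposed in this paper, and the dimension convention (columns of $\mathbf{X}$ as samples) is assumed here for illustration:

\[
\min_{\mathbf{H} \ge 0,\ \mathbf{W} \ge 0} \ \left\| \mathbf{X} - \mathbf{H}\mathbf{W} \right\|_F^2 ,
\]

where $\mathbf{X} \in \mathbb{R}_{+}^{m \times n}$ stacks the $n$ data points as columns, $\mathbf{H} \in \mathbb{R}_{+}^{m \times k}$ holds the $k$ non-negative basis vectors, and $\mathbf{W} \in \mathbb{R}_{+}^{k \times n}$ gives the non-negative coefficients of each data point in that basis.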
Extensive work has been done to investigate NMF algorithms for different clustering tasks. Generally, the matrix decomposition is performed by minimizing the error between the original and the reconstructed matrix using the $\ell_2$ distance or the KL divergence [23]–[25]. One concern is that these are linear similarity measures, which may not be suitable for data with nonlinear structure, such as images [26]. A possible solution is to use a nonlinear similarity measure to model the error (e.g. kernelized nonlinear mapping [27], [28]). Among these nonlinear methods, we are especially interested in NMF based on the maximum correntropy criterion (MCC). Correntropy is a generalized nonlinear similarity measure between two variables. MCC-based methods have proved effective in many areas, e.g. cancer clustering [29], face recognition [30] and software defect prediction [31]. Another approach is to preserve the geometric structure of the data based on the manifold assumption about the data distribution. To be more specific, the authors in [32], [33] exploit local invariance and encode such geometric information by constructing a nearest neighbor graph. Thus, at least two factors are included in modeling this process: the distance between X and HW based on the $\ell_2$ distance (or KL divergence) in the low-level feature space, and the graph regularization. Furthermore, the authors in [34] incorporate such intrinsic geometric information on multiple manifolds.
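To make these two ingredients concrete, the sketch below (in Python with NumPy; the variable names, the Gaussian kernel width sigma, and the neighborhood size k are illustrative assumptions, not this paper's notation) computes an empirical Gaussian-kernel correntropy between a data matrix and its reconstruction, and a graph regularizer of the form Tr(W L W^T) built from a symmetrized k-nearest-neighbor graph over the samples:

import numpy as np

def correntropy(X, R, sigma=1.0):
    # Empirical correntropy between X and its reconstruction R:
    # average Gaussian kernel value over all matrix entries.
    diff = X - R
    return np.mean(np.exp(-(diff ** 2) / (2.0 * sigma ** 2)))

def knn_graph_laplacian(X, k=5):
    # Unnormalized Laplacian L = D - A of a symmetrized k-NN graph
    # built over the columns of X (one column per sample).
    n = X.shape[1]
    sq = np.sum(X ** 2, axis=0)
    D2 = sq[:, None] + sq[None, :] - 2.0 * (X.T @ X)   # pairwise squared distances
    A = np.zeros((n, n))
    for i in range(n):
        nbrs = np.argsort(D2[i])[1:k + 1]               # skip the point itself
        A[i, nbrs] = 1.0
    A = np.maximum(A, A.T)                              # symmetrize the adjacency
    return np.diag(A.sum(axis=1)) - A

def graph_regularizer(W, L):
    # Smoothness penalty Tr(W L W^T): nearby samples should get
    # nearby coefficient vectors (columns of W).
    return np.trace(W @ L @ W.T)

# Toy usage with random non-negative data and a random factorization.
rng = np.random.default_rng(0)
X = rng.random((100, 60))    # 100 low-level features, 60 images
H = rng.random((100, 10))    # basis matrix
W = rng.random((10, 60))     # coefficient matrix
L = knn_graph_laplacian(X, k=5)
print(correntropy(X, H @ W, sigma=0.5), graph_regularizer(W, L))

In graph regularized NMF of the kind discussed in [32], [33], a penalty of this Tr(W L W^T) form is added to the reconstruction objective; the sketch only illustrates the quantities involved, not the specific update rules derived later in this paper.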
In this paper, we propose a graph regularized NMF algorithm based on the maximum correntropy criterion for unsupervised image clustering. We leverage MCC to properly model the errors in the low-level feature space. Furthermore, the graph regularization preserves the intrinsic geometric information during the factorization process. To our
knowledge, this is the first work that combines MCC and