ORIGINAL ARTICLE
Nonlinear discriminant clustering based on spectral regularization

Yubin Zhan · Jianping Yin · Xinwang Liu
Received: 20 April 2011 / Accepted: 24 March 2012 / Published online: 19 April 2012
© Springer-Verlag London Limited 2012
Abstract Owing to data sparseness, directly clustering high-dimensional data remains a challenging problem. Obtaining a low-dimensional compact representation through dimensionality reduction is therefore an effective strategy for clustering high-dimensional data. Most existing dimensionality reduction methods, however, were originally developed for classification (such as Linear Discriminant Analysis) or for recovering the geometric structure (known as the manifold) of high-dimensional data (such as Locally Linear Embedding), rather than for clustering. Hence, a novel nonlinear discriminant clustering method based on dimensionality reduction with spectral regularization is proposed. The contributions of the proposed method are twofold: (1) it obtains a nonlinear low-dimensional representation that recovers the intrinsic manifold structure and enhances the cluster structure of the original high-dimensional data; (2) the clustering result is obtained directly in the dimensionality reduction procedure. First, the desired low-dimensional coordinates are represented as linear combinations of predefined vectors that are smooth with respect to the data manifold, which is characterized by a weighted graph. Then, the optimal combination coefficients and the optimal cluster assignment matrix are computed simultaneously by maximizing the ratio between the between-cluster scatter and the total scatter while preserving the smoothness of the cluster assignment matrix with respect to the data manifold. Finally, the optimization problem is solved by an iterative procedure, which is proved to be convergent. Experiments on UCI data sets and real-world data sets demonstrate the effectiveness of the proposed method for both clustering and visualizing high-dimensional data.
Keywords Dimensionality reduction · Laplacian graph · Spectral regularization · Cluster structure
1 Introduction
Real applications in many domains such as pattern recognition, computer vision, and data mining often involve very high-dimensional data. For example, in appearance-based face recognition, a face image with 64 × 64 pixels is often represented as a 4096-dimensional vector. Due to the sparsity of data in high-dimensional space, clustering high-dimensional data directly is still a challenging problem. A natural solution is to project the data into a low-dimensional compact space through a dimensionality reduction method such as PCA before clustering [7], as sketched below. However, because of the intrinsic gap between clustering and existing dimensionality reduction methods, which were not originally designed for clustering, the clustering structure of the original data cannot be well preserved and may even be destroyed in the transformed low-dimensional space.
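To make this baseline concrete, the following is a minimal sketch of the "reduce, then cluster" pipeline using scikit-learn; the data sizes, the target dimensionality, and the number of clusters are illustrative assumptions rather than settings from this paper.

# Baseline sketch: PCA for dimensionality reduction followed by k-means.
# All sizes below (200 samples, 4096 features, 10 dimensions, 5 clusters)
# are illustrative choices, not values used in the paper.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.cluster import KMeans

rng = np.random.RandomState(0)
X = rng.randn(200, 4096)                       # e.g. 200 face images of 64 x 64 pixels

X_low = PCA(n_components=10).fit_transform(X)  # compact low-dimensional representation
labels = KMeans(n_clusters=5, n_init=10, random_state=0).fit_predict(X_low)

The weakness of this pipeline, as noted above, is that PCA is unaware of any cluster structure, so the projection it selects need not preserve that structure.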
In the machine learning and data mining communities, dimensionality reduction is an effective and widely used approach to dealing with high-dimensional data, owing to its potential for mitigating the so-called "curse of dimensionality". Up to now, many dimensionality reduction methods have been developed. Since no supervised information is provided in the classical clustering task, we focus here only on unsupervised dimensionality reduction methods.
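As a concrete illustration of the kind of "predefined smooth vectors with respect to the data manifold" mentioned in the abstract, the sketch below builds a weighted k-nearest-neighbor graph and takes the eigenvectors of its graph Laplacian associated with the smallest eigenvalues, which vary slowly over the graph. The graph construction (neighborhood size k, Gaussian weights with bandwidth sigma) and the helper name smooth_basis are common choices assumed here, not necessarily the exact construction used by the proposed method.

# Sketch: smooth basis vectors from the graph Laplacian of a k-NN graph.
# The parameters (k, sigma, n_vectors) and the dense eigendecomposition are
# illustrative assumptions; the paper's own graph construction may differ.
import numpy as np
from sklearn.neighbors import kneighbors_graph
from scipy.sparse.csgraph import laplacian

def smooth_basis(X, n_vectors=10, k=10, sigma=1.0):
    # Symmetric k-NN graph with Gaussian (heat-kernel) edge weights
    W = kneighbors_graph(X, n_neighbors=k, mode='distance')
    W.data = np.exp(-W.data ** 2 / (2.0 * sigma ** 2))
    W = 0.5 * (W + W.T)
    # Normalized graph Laplacian; its eigenvectors with the smallest
    # eigenvalues are the smoothest functions on the graph.
    L = laplacian(W, normed=True).toarray()
    eigvals, eigvecs = np.linalg.eigh(L)       # eigenvalues in ascending order
    return eigvecs[:, :n_vectors]              # columns: smooth basis vectors

Low-dimensional coordinates can then be sought as linear combinations of these columns, which is the role the predefined smooth vectors play in the method summarized in the abstract.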
Y. Zhan (✉) · J. Yin · X. Liu
School of Computer, National University of Defense Technology, Changsha 410073, Hunan, People's Republic of China
e-mail: yubinzhan@nudt.edu.cn

J. Yin
e-mail: JPYin@nudt.edu.cn

X. Liu
e-mail: XWLiu@nudt.edu.cn
Neural Comput & Applic (2013) 22:1599–1608
DOI 10.1007/s00521-012-0929-y