Unified Spectral Clustering with Optimal Graph
Zhao Kang¹, Chong Peng², Qiang Cheng³, Zenglin Xu¹∗
¹School of Computer Science and Engineering, University of Electronic Science and Technology of China, China
²Department of Computer Science, Southern Illinois University, Carbondale, USA
³Institute of Biomedical Informatics and Department of Computer Science, University of Kentucky, Lexington, USA
zkang@uestc.edu.cn, pchong@siu.edu, qiang.cheng@uky.edu, zlxu@uestc.edu.cn
∗Corresponding author.
Abstract
Spectral clustering has found extensive use in many areas. Most traditional spectral clustering algorithms work in three separate steps: similarity graph construction; continuous label learning; and discretization of the learned labels by k-means clustering. This common practice has two potential flaws, which may lead to severe information loss and performance degradation. First, a predefined similarity graph might not be optimal for subsequent clustering, and it is well accepted that the similarity graph strongly affects the clustering results. To this end, we propose to automatically learn similarity information from data, and we simultaneously enforce the constraint that the similarity matrix has exactly c connected components if there are c clusters. Second, the discrete solution may deviate from the spectral solution, since the k-means method is well known to be sensitive to the initialization of cluster centers. In this work, we transform the candidate solution into a new one that better approximates the discrete one. Finally, these three subtasks are integrated into a unified framework, with each subtask iteratively boosted by the results of the others toward an overall optimal solution. It is known that the performance of a kernel method is largely determined by the choice of kernel. To tackle the practical problem of selecting the most suitable kernel for a particular data set, we further extend our model with a multiple kernel learning capability. Extensive experiments demonstrate the superiority of the proposed method over existing clustering approaches.
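For reference, the "exactly c connected components" requirement above is expressible as an optimization constraint thanks to a standard fact from spectral graph theory (well known, not a contribution of this paper): for a nonnegative symmetric similarity matrix W over n points, the multiplicity of the zero eigenvalue of the graph Laplacian equals the number of connected components of the graph, i.e.,

\[
L = D - W, \qquad D_{ii} = \sum\nolimits_j W_{ij}, \qquad \mathrm{rank}(L) = n - c \;\Longleftrightarrow\; \text{the graph of } W \text{ has exactly } c \text{ connected components}.
\]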
Introduction
Clustering is a fundamental technique in machine learning, pattern recognition, and data mining (Huang et al. 2017). Over the past decades, a variety of clustering algorithms have been developed, such as k-means clustering and spectral clustering.
With the benefits of simplicity and effectiveness, the k-means clustering algorithm is often adopted in various real-world problems. To deal with the nonlinear structure of many practical data sets, the kernel k-means (KKM) algorithm has been developed (Schölkopf, Smola, and Müller 1998), where data points are mapped through a nonlinear transformation into a higher-dimensional feature space in which they
are linearly separable. KKM usually achieves better performance than standard k-means. To cope with noise and outliers, the robust kernel k-means (RKKM) algorithm (Du et al. 2015) has been proposed, in which the squared $\ell_2$-norm of the error reconstruction term is replaced by the $\ell_{2,1}$-norm. RKKM demonstrates superior performance on a number of benchmark data sets. However, the performance of such model-based methods heavily depends on how well the data fit the model, and in most cases we do not know the distribution of the data in advance. To some extent, this problem is alleviated by multiple kernel learning.
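To make the kernel trick above concrete, the following is a minimal sketch of kernel k-means in Python (our illustration, assuming an RBF kernel and Lloyd-style updates; it is not the RKKM algorithm, which additionally replaces the squared error with the $\ell_{2,1}$-norm):

import numpy as np

def rbf_kernel(X, gamma=1.0):
    # K[i, j] = exp(-gamma * ||x_i - x_j||^2)
    sq = np.sum(X ** 2, axis=1)
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * X @ X.T, 0.0)
    return np.exp(-gamma * d2)

def kernel_kmeans(K, c, n_iter=100, seed=0):
    # Lloyd-style kernel k-means that works directly on the kernel matrix K,
    # so cluster centroids in feature space are never formed explicitly.
    n = K.shape[0]
    labels = np.random.default_rng(seed).integers(0, c, size=n)
    diag = np.diag(K)
    for _ in range(n_iter):
        dist = np.empty((n, c))
        for j in range(c):
            mask = labels == j
            nj = max(int(mask.sum()), 1)
            # ||phi(x_i) - mu_j||^2 expanded via the kernel trick:
            # K_ii - (2/n_j) sum_{l in C_j} K_il + (1/n_j^2) sum_{l,m in C_j} K_lm
            dist[:, j] = (diag
                          - 2.0 * K[:, mask].sum(axis=1) / nj
                          + K[np.ix_(mask, mask)].sum() / nj ** 2)
        new_labels = dist.argmin(axis=1)
        if np.array_equal(new_labels, labels):
            break
        labels = new_labels
    return labels

# Usage: labels = kernel_kmeans(rbf_kernel(X, gamma=0.5), c=3)

Note that only inner products in the feature space are ever needed, which is precisely what lets KKM capture nonlinear structure without forming the mapping explicitly.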
Spectral clustering is another widely used clustering method (Kumar, Rai, and Daume 2011). It enjoys the advantage of exploring the intrinsic data structure by exploiting different similarity graphs of the data points (Yang et al. 2015). There are three common strategies for constructing the similarity graph: the k-nearest-neighbor (knn) graph, the ε-neighborhood graph, and the fully connected graph. Here, several open issues arise (Huang, Nie, and Huang 2015): 1) how to choose a proper neighbor number k or radius ε; 2) how to select an appropriate similarity metric to measure the similarity among data points; 3) how to counteract the adverse effects of noise and outliers; 4) how to tackle data with structures at different scales of size and density. Unfortunately, all of these issues heavily influence the clustering results (Zelnik-Manor and Perona 2004). Nowadays, data are often high-dimensional, heterogeneous, and without prior knowledge, so it is a fundamental challenge to define a pairwise similarity graph for effective spectral clustering.
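A minimal sketch of the three constructions follows (our illustration; σ, k, and ε are the hand-picked hyperparameters that issues 1) and 2) above refer to, and the ε-neighborhood rule is expressed here as a similarity threshold rather than a distance radius):

import numpy as np

def gaussian_affinity(X, sigma=1.0):
    # Fully connected graph: W[i, j] = exp(-||x_i - x_j||^2 / (2 * sigma^2))
    sq = np.sum(X ** 2, axis=1)
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * X @ X.T, 0.0)
    return np.exp(-d2 / (2.0 * sigma ** 2))

def knn_graph(W, k=10):
    # knn graph: keep each point's k strongest similarities, then symmetrize
    # so the result is a valid undirected affinity matrix.
    n = W.shape[0]
    A = np.zeros_like(W)
    neighbors = np.argsort(-W, axis=1)[:, 1:k + 1]  # column 0 is the point itself
    rows = np.repeat(np.arange(n), neighbors.shape[1])
    A[rows, neighbors.ravel()] = W[rows, neighbors.ravel()]
    return np.maximum(A, A.T)

def eps_graph(W, eps=0.5):
    # ε-neighborhood graph: keep only edges whose similarity exceeds eps.
    A = np.where(W > eps, W, 0.0)
    np.fill_diagonal(A, 0.0)
    return A

# Usage: A = knn_graph(gaussian_affinity(X, sigma=1.0), k=10)

Each construction commits to fixed hyperparameters before clustering begins, which is exactly the limitation that motivates learning the similarity graph from data.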
Recently, Zhu, Change Loy, and Gong (2014) constructed robust affinity graphs for spectral clustering by identifying discriminative features. Their method adopts a random forest approach, motivated by the observation that tree leaf nodes contain discriminative data partitions, which can be exploited to capture subtle and weak data affinities. This approach shows better performance than other state-of-the-art methods, including Euclidean-distance-based knn (Wang et al. 2008), dominant neighbourhoods (Pavan and Pelillo 2007), consensus of knn (Premachandran and Kakarala 2013), and non-metric unsupervised manifold forests (Pei, Kim, and Zha 2013).
The second step of spectral clustering is to use the spectrum of the similarity graph to reveal the cluster structure of the data. Due to the discrete constraint on the cluster labels,