Knowledge-Based Systems 163 (2019) 510–517
Short communication
Low-rank kernel learning for graph-based clustering
Zhao Kang, Liangjian Wen, Wenyu Chen∗, Zenglin Xu∗
School of Computer Science and Engineering, University of Electronic Science and Technology of China, Chengdu, Sichuan, 611731, China
∗ Corresponding authors. E-mail addresses: cwy@uestc.edu.cn (W. Chen), zlxu@uestc.edu.cn (Z. Xu).
Article info
Article history:
Received 9 May 2018
Received in revised form 4 September 2018
Accepted 7 September 2018
Available online 26 September 2018
Keywords:
Low-rank kernel matrix
Graph construction
Multiple kernel learning
Clustering
Noise
Abstract
Constructing the adjacency graph is fundamental to graph-based clustering. Graph learning in kernel
space has shown impressive performance on a number of benchmark data sets. However, its performance
is largely determined by the chosen kernel matrix. To address this issue, multiple kernel learning
algorithms have been applied to learn an optimal kernel from a group of predefined kernels. However,
this approach can be sensitive to noise and limits the representation ability of the consensus kernel.
In contrast to existing methods, we propose to learn a low-rank kernel matrix that exploits the
similarity structure of the kernel matrix and seeks an optimal kernel in the neighborhood of the
candidate kernels. By formulating graph construction and kernel learning in a unified framework,
the graph and the consensus kernel can iteratively enhance each other. Extensive experimental results validate the efficacy of the proposed
method.
© 2018 Elsevier B.V. All rights reserved.
1. Introduction
Clustering is a fundamental and important technique in ma-
chine learning, data mining, and pattern recognition [1–3]. It aims
to divide data samples into certain clusters such that similar ob-
jects lie in the same group. It has been utilized in various do-
mains, such as image segmentation [4], gene expression analy-
sis [5], motion segmentation [6], image clustering [7], heteroge-
neous data analysis [8], document clustering [9,10], social media
analysis [11], and subspace learning [12,13]. Over the past decades,
clustering has been extensively studied and many clustering meth-
ods have been developed, such as K-means clustering [14,15],
spectral clustering [16,17], subspace clustering [18,19], hierarchi-
cal clustering [20], matrix factorization-based algorithms [21–23],
graph-based clustering [24,25], and kernel-based clustering [26].
Among them, K-means and spectral clustering are especially
popular and have been widely applied in practice.
Basically, the K-means method iteratively assigns data points
to their closest clusters and updates cluster centers. Nonetheless,
it cannot partition arbitrarily shaped clusters and is notorious for
its sensitivity to the initialization of cluster centers [27]. Later,
kernel K-means (KKM) was proposed to capture the nonlinear
structure of the data [28]. However, the user has to specify a
kernel matrix as input; that is, the user must assume a certain
shape for the data distribution, which is generally unknown.
Consequently, the performance of KKM is highly dependent on
the choice of the kernel matrix, which is a stumbling block for
the practical use of kernel methods in real applications. This issue is partially
alleviated by the multiple kernel learning (MKL) technique, which
lets an algorithm pick or combine kernels from a set of candidates
[29,30]. However, since the kernels might be corrupted by noise
and outliers contaminating the original data, the induced kernel
might still not be optimal [31]. Moreover, forcing the optimal
kernel to be a linear combination of the base kernels can limit its
representation ability. Indeed, the MKL approach sometimes
performs worse than a single kernel method [32].
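To make the kernel dependence concrete, the following minimal Python sketch (a generic baseline, not the method proposed in this paper) runs kernel K-means on a precomputed kernel matrix and forms a fixed linear combination of base kernels in the spirit of standard MKL. The weights `w` would normally be learned; here they are illustrative assumptions.

```python
import numpy as np

def kernel_kmeans(K, n_clusters, n_iter=100, seed=0):
    """Lloyd-style kernel K-means on a precomputed kernel matrix K (n x n)."""
    rng = np.random.default_rng(seed)
    n = K.shape[0]
    labels = rng.integers(n_clusters, size=n)
    for _ in range(n_iter):
        dist = np.zeros((n, n_clusters))
        for c in range(n_clusters):
            idx = np.flatnonzero(labels == c)
            if idx.size == 0:            # re-seed an empty cluster
                idx = rng.integers(n, size=1)
            # ||phi(x_i) - mu_c||^2 = K_ii - 2*mean_j K_ij + mean_jl K_jl
            dist[:, c] = (np.diag(K)
                          - 2 * K[:, idx].mean(axis=1)
                          + K[np.ix_(idx, idx)].mean())
        new_labels = dist.argmin(axis=1)
        if np.array_equal(new_labels, labels):
            break
        labels = new_labels
    return labels

def combine_kernels(kernels, w):
    """Fixed linear combination of base kernels, as in standard MKL."""
    return sum(wi * Ki for wi, Ki in zip(w, kernels))
```

For example, `kernel_kmeans(combine_kernels([K1, K2], [0.5, 0.5]), n_clusters=3)` clusters with an equally weighted consensus of two base kernels; a different choice of kernels or weights can change the result substantially, which is exactly the sensitivity discussed above.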
Spectral clustering, another classic method, is more capable
of detecting complex data structures than many other clustering
methods [33,34]. It works by embedding the data points into a
vector space spanned by the spectrum of the affinity matrix (i.e.,
the data similarity matrix). Therefore, the quality of the similarity
graph is crucial to the performance of the spectral clustering
algorithm. Traditionally, the Gaussian kernel function is employed
to build the graph matrix. Unfortunately, how to select a proper
Gaussian parameter is an open problem [35]. Moreover, the
Gaussian kernel function is sensitive to noise and outliers.
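As a point of reference, here is a minimal sketch of plain normalized spectral clustering with a Gaussian affinity. The width `sigma` is exactly the Gaussian parameter whose selection the text describes as an open problem; this is a textbook baseline, not the graph construction proposed here.

```python
import numpy as np
from scipy.linalg import eigh
from sklearn.cluster import KMeans

def spectral_clustering(X, n_clusters, sigma=1.0):
    """Normalized spectral clustering with a Gaussian (RBF) affinity graph."""
    # Pairwise squared distances and Gaussian affinity
    sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    W = np.exp(-sq / (2 * sigma ** 2))
    np.fill_diagonal(W, 0)
    # Symmetric normalized Laplacian: L = I - D^{-1/2} W D^{-1/2}
    d = W.sum(axis=1)
    D_inv_sqrt = np.diag(1.0 / np.sqrt(d))
    L = np.eye(len(X)) - D_inv_sqrt @ W @ D_inv_sqrt
    # Embed points into the space spanned by the bottom eigenvectors
    _, vecs = eigh(L)
    U = vecs[:, :n_clusters]
    U /= np.linalg.norm(U, axis=1, keepdims=True) + 1e-12
    return KMeans(n_clusters, n_init=10).fit_predict(U)
```

Rerunning this with different values of `sigma` typically yields different partitions, which illustrates why a learned graph is preferable to a hand-tuned Gaussian one.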
Recently, some advanced techniques have been developed to
construct better similarity graphs. For instance, Zhu et al. [36]
used a random forest-based method to identify discriminative fea-
tures, so that subtle and weak data affinities can be captured. More
importantly, the adaptive neighbors method [37] and the self-expression
approach [38] have been proposed to learn a graph automatically
from the data. This automatic strategy can handle data whose
structures vary in scale and density, and it often yields a
high-quality graph, as demonstrated in clustering [37,39], semi-
supervised classification [40,41], and many other tasks.
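The self-expression idea can be illustrated with a short sketch: each sample is reconstructed as a combination of the other samples, and the reconstruction coefficients serve as the affinity graph. The ridge (Frobenius-norm) penalty and the parameter `lam` below are simplifying assumptions chosen to admit a closed-form solution; the cited methods use other regularizers (e.g., sparse or low-rank penalties).

```python
import numpy as np

def self_expressive_graph(X, lam=1.0):
    """Learn an affinity graph by self-expression:
    min_Z ||X - X Z||_F^2 + lam * ||Z||_F^2, solved in closed form."""
    n = X.shape[1]                 # X has shape (features, samples)
    G = X.T @ X                    # Gram matrix of the samples
    # Closed-form ridge solution: Z = (X^T X + lam I)^{-1} X^T X
    Z = np.linalg.solve(G + lam * np.eye(n), G)
    np.fill_diagonal(Z, 0)         # heuristic: a sample should not explain itself
    # Symmetrize to obtain an undirected affinity matrix
    return 0.5 * (np.abs(Z) + np.abs(Z.T))
```

The resulting matrix can be fed directly to a spectral clustering routine in place of a hand-built Gaussian graph, which is the sense in which such graphs are learned "automatically from the data".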
In this paper, we learn the graph in kernel space. To address
the kernel dependence issue, we develop a novel method to learn