Multiple Kernel k-Means Clustering with Matrix-Induced Regularization
Xinwang Liu, Yong Dou, Jianping Yin
School of Computer, National University of Defense Technology, Changsha, China, 410073
Lei Wang
School of Computer Science and Software Engineering, University of Wollongong, NSW, Australia, 2522
En Zhu
School of Computer, National University of Defense Technology, Changsha, China, 410073
Abstract
Multiple kernel k-means (MKKM) clustering aims to optimally combine a group of pre-specified kernels to improve clustering performance. However, we observe that existing MKKM algorithms do not sufficiently consider the correlation among these kernels. This can result in the selection of mutually redundant kernels and reduce the diversity of the information sources utilized for clustering, which ultimately hurts clustering performance. To address this issue, this paper proposes an MKKM clustering algorithm with a novel, effective matrix-induced regularization that reduces such redundancy and enhances the diversity of the selected kernels. We theoretically justify this matrix-induced regularization by revealing its connection with the commonly used kernel alignment criterion. Furthermore, this justification shows that maximizing the kernel alignment for clustering can be viewed as a special case of our approach, and it indicates the extendability of the proposed matrix-induced regularization for designing better clustering algorithms. As experimentally demonstrated on five challenging MKL benchmark data sets, our algorithm significantly improves on existing MKKM and consistently outperforms the state-of-the-art algorithms in the literature, verifying the effectiveness and advantages of incorporating the proposed matrix-induced regularization.
Introduction
Clustering algorithms aim to partition a group of samples into k clusters, where the similarity between samples within a cluster should be greater than that between samples from different clusters (Hartigan 1975). As one of the classical clustering algorithms, k-means provides an intuitive and effective way to perform clustering. Specifically, k-means clustering is composed of (i) calculating k prototypes (i.e., the centres of the k clusters) given an assignment of samples to clusters, and (ii) updating the assignment matrix by minimizing the sum-of-squares cost given the prototypes. These two steps are performed alternately until convergence.
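For concreteness, the following is a minimal sketch of these two alternating steps in Python with NumPy; the function and variable names are illustrative, not taken from any of the cited works.

```python
import numpy as np

def kmeans(X, k, n_iter=100, seed=0):
    """Plain k-means on an (n, d) data matrix X: alternate steps (i) and (ii)."""
    rng = np.random.default_rng(seed)
    # Initialize the k prototypes by sampling k distinct data points.
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        # Step (ii): assign every sample to its nearest prototype,
        # which minimizes the sum-of-squares cost for fixed prototypes.
        dist = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dist.argmin(axis=1)
        # Step (i): recompute each prototype as the mean of its cluster;
        # an emptied cluster simply keeps its previous prototype.
        new_centers = np.array([X[labels == c].mean(axis=0)
                                if np.any(labels == c) else centers[c]
                                for c in range(k)])
        if np.allclose(new_centers, centers):
            break  # prototypes stopped moving: converged
        centers = new_centers
    return labels, centers
```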
Due to its conceptual simplicity, ease of implementation and high efficiency, k-means clustering has been intensively studied and extended (Yu et al. 2012; Gönen and Margolin 2014; Cai, Nie, and Huang 2013; Du et al. 2015). As an important extension, kernel k-means
first maps data onto a high-dimensional space through a fea-
ture mapping and then conducts a standard k-means clustering in that space (Schölkopf, Smola, and Müller 1998). This enables kernel k-means to handle the linearly non-separable problem in an input space that k-means suffers from.
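The kernel trick makes this feasible without ever forming the feature map explicitly: the squared distance from a mapped sample to a cluster centre can be expressed purely in terms of kernel entries, ||φ(x_i) − μ_c||² = K_ii − (2/|C|) Σ_{j∈C} K_ij + (1/|C|²) Σ_{j,l∈C} K_jl. Below is a minimal sketch under that identity; names are illustrative, not a cited implementation.

```python
import numpy as np

def kernel_kmeans(K, k, n_iter=100, seed=0):
    """k-means in feature space, using only the (n, n) kernel matrix K."""
    n = K.shape[0]
    rng = np.random.default_rng(seed)
    labels = rng.integers(k, size=n)  # random initial assignment
    for _ in range(n_iter):
        dist = np.zeros((n, k))
        for c in range(k):
            idx = np.flatnonzero(labels == c)
            if idx.size == 0:
                dist[:, c] = np.inf  # an empty cluster can never be chosen
                continue
            # Kernel-trick distance to centre c; the K_ii term is identical
            # for every c, so it can be dropped from the argmin.
            dist[:, c] = (-2.0 * K[:, idx].mean(axis=1)
                          + K[np.ix_(idx, idx)].mean())
        new_labels = dist.argmin(axis=1)
        if np.array_equal(new_labels, labels):
            break  # assignments stabilized: converged
        labels = new_labels
    return labels
```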
In many practical applications of clustering, samples are
represented by multiple groups of features extracted from
different information sources. For example, three kinds of feature representations¹, namely colour, shape and texture, are extracted to distinguish one flower from another (Nilsback and Zisserman 2006). These different sources usually provide complementary information, and it is desirable for learning algorithms to combine them optimally in order to obtain excellent clustering. This line of research is known as multiple kernel (view) clustering in the literature.
Many efforts have been devoted to improving multiple kernel clustering from various aspects (Zhao, Kwok, and Zhang 2009; Lu et al. 2014; Xia et al. 2014; Zhou et al. 2015; Kumar and Daumé 2011). In this paper, we explore a better way to combine a set of pre-specified kernels for clustering. The existing research on this aspect can roughly be grouped into two categories. The first category learns a consensus matrix via low-rank optimization (Xia et al. 2014; Zhou et al. 2015; Kumar and Daumé 2011). In (Xia et al. 2014), a transition probability matrix is constructed from each single view, and these matrices are then used to recover a shared low-rank transition probability matrix that serves as a crucial input to the standard Markov chain method for clustering. The work in (Zhou et al. 2015) proposes to capture the structure of the noise in each kernel and to integrate these structures into a robust consensus framework for learning a low-rank matrix. The algorithm in (Kumar and Daumé 2011) learns the clustering in one view and uses it to “label” the data in the other views so as to modify their similarity matrices.
modify a similarity matrix. By following multiple kernel
learning (MKL) framework, the other category optimizes a
group of kernel coefficients, and uses the combined kernel
for clustering (Yu et al. 2012; G
¨
onen and Margolin 2014;
Du et al. 2015; Lu et al. 2014). The work in (Yu et al. 2012)
proposes a multiple kernel k-means clustering algorithm.
Similar work has also been done in (G
¨
onen and Margolin
2014), with the difference that the kernels are combined in a
localized way to better capture the sample-adaptive charac-
In contrast, by replacing the squared error of k-means with an ℓ2,1-norm based one, the work in (Du et al. 2015) develops a robust multiple kernel k-means algorithm.
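To make this second category concrete, the sketch below follows one widely used MKKM formulation: minimize Tr(K_gamma (I − H Hᵀ)) over H and gamma, with the combined kernel K_gamma = Σ_p gamma_p² K_p, Σ_p gamma_p = 1 and gamma_p ≥ 0. This is an illustrative instantiation under those assumptions, not the exact objective of every cited work; H is the relaxed (spectral) cluster-indicator matrix, and the gamma-step has the closed form gamma_p ∝ 1/Tr(K_p (I − H Hᵀ)).

```python
import numpy as np

def mkkm(kernels, k, n_iter=20):
    """Alternate the combined-kernel k-means step (H) and the
    kernel-coefficient step (gamma) over a list of (n, n) base kernels."""
    m = len(kernels)
    gamma = np.full(m, 1.0 / m)  # start from uniform kernel weights
    for _ in range(n_iter):
        # Combined kernel with squared coefficients.
        K = sum(g ** 2 * Kp for g, Kp in zip(gamma, kernels))
        # H-step: relaxed indicator matrix = top-k eigenvectors of K,
        # maximizing Tr(H^T K H) subject to H^T H = I.
        _, V = np.linalg.eigh(K)  # eigenvalues returned in ascending order
        H = V[:, -k:]
        # gamma-step: minimize sum_p gamma_p^2 * f_p over the simplex,
        # where f_p = Tr(K_p (I - H H^T)); closed form: gamma_p ~ 1 / f_p.
        f = np.array([np.trace(Kp) - np.trace(H.T @ Kp @ H)
                      for Kp in kernels])
        gamma = (1.0 / f) / np.sum(1.0 / f)
    # Final discrete labels are typically obtained by running k-means
    # on the rows of H.
    return H, gamma
```

Note that nothing in the gamma-step above discourages placing large weights on highly correlated base kernels; this is exactly the redundancy issue, described in the abstract, that the proposed matrix-induced regularization is designed to address.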
¹ In the literature, each representation is also termed a view.