A Fast Incremental Spectral Clustering Algorithm for Image Segmentation
Xiaochun Wang, Chenyu Chang
School of Software Engineering
Xi’an Jiaotong University
Xi’an, China
{xiaocchunwang@mail,chenyu_chang@stu}.xjtu.edu.cn
Xia Li Wang
School of Information Engineering
Chang'an University
Xi’an, China
xlwang@chd.edu.cn
Abstract—Clustering aims at grouping a given set of data
points into a number of clusters without resorting to any a
priori knowledge. Due to its important applications in data
mining, many techniques have been developed for clustering.
Being one of the most popular modern clustering algorithms,
spectral clustering is simple to implement, can be solved
efficiently by standard linear algebra software, and very often
outperforms traditional clustering algorithms such as the k-
means algorithm. However, it does not scale well to
modern large datasets, which typically contain millions of items.
To partially circumvent this drawback, in this paper, we
propose an integration-based fast incremental spectral
clustering algorithm which is particularly designed for image
segmentation tasks. The algorithm first divides a given large
dataset into several smaller partitions, next applies spectral
clustering to each partition, and finally integrates the results
using a BIRCH tree. Experiments performed on image data
demonstrate the efficacy of our method.
Keywords-spectral clustering; BIRCH tree; image
segmentation
I. INTRODUCTION
Clustering, which separates data points into groups according
to their similarities, is one of the most widely used
techniques for exploratory data analysis, with applications
ranging from statistics and computer science to biology, the
social sciences, and psychology [1-3]. Spectral clustering is a
competitive clustering method grounded in spectral graph theory.
Compared with traditional clustering algorithms, spectral
clustering has several advantages. First, unlike the k-means
algorithm, spectral clustering operates on the graph Laplacian
built from pairwise similarities of the feature vectors, and can
therefore identify non-convex (non-spherical) clusters. Second,
the computational complexity of spectral clustering depends
mainly on the number of data points rather than on their
dimensionality, so it can handle datasets of high
dimensionality. Finally, spectral clustering is easy to
implement, and can rely on standard linear algebra routines for
fast solutions.
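As an illustration of this pipeline, the following minimal sketch (our own example of an unnormalized spectral bipartition, not the algorithm proposed in this paper) builds a Gaussian affinity matrix W, forms the unnormalized Laplacian L = D - W, and splits the data by the sign of the Fiedler vector, i.e., the eigenvector associated with the second-smallest eigenvalue:

```python
import numpy as np

def spectral_bipartition(X, sigma=1.0):
    """Split points into two clusters by the sign of the Fiedler vector
    of the unnormalized graph Laplacian L = D - W."""
    # Gaussian (RBF) affinity matrix from pairwise squared distances
    sq = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    W = np.exp(-sq / (2.0 * sigma ** 2))
    np.fill_diagonal(W, 0.0)
    D = np.diag(W.sum(axis=1))
    L = D - W
    # eigh returns eigenpairs of the symmetric L in ascending order
    _, vecs = np.linalg.eigh(L)
    fiedler = vecs[:, 1]            # second-smallest eigenvector
    return (fiedler > 0).astype(int)

# two well-separated 2-D blobs of five points each
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 0.1, (5, 2)),
               rng.normal(5.0, 0.1, (5, 2))])
labels = spectral_bipartition(X)
```

Because the two blobs are nearly disconnected in the affinity graph, the Fiedler vector is approximately piecewise constant with opposite signs on the two groups, so the sign split recovers them.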
However, spectral clustering has its own problems.
First, there is no universal way to determine the
relevant parameters and the similarity matrix for spectral
clustering algorithms [4]. Second, although Euclidean
distance is usually used as the underlying measure,
constructing a similarity matrix that accurately reflects the
relationships between data points, in the sense that the
similarity is high between similar data points and low
between dissimilar ones, is a problem that every spectral
clustering algorithm must solve. Third, the cost of
computing the similarity matrix and the corresponding
eigenvalues and eigenvectors is usually O(N³), which
prevents spectral clustering from being practically
applicable to datasets larger than a few thousand points [5].
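To make the scaling problem concrete, even storing a dense N x N affinity matrix becomes prohibitive long before the O(N³) eigen-decomposition starts. A quick back-of-the-envelope calculation (our own illustration, not from the paper):

```python
# memory needed for a dense float64 N x N affinity matrix
for n in (1_000, 10_000, 100_000, 1_000_000):
    gib = n * n * 8 / 2**30          # 8 bytes per float64 entry
    print(f"N = {n:>9,}: {gib:10.2f} GiB")
```

At N = 10,000 the matrix already takes roughly 0.75 GiB, and at N = 1,000,000 it would require thousands of GiB, which is why the dataset must be processed in smaller pieces.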
With the development of artificial intelligence, machine
learning, and pattern recognition, it becomes increasingly
important to cluster large-scale datasets quickly and
efficiently. To apply this very competitive clustering
methodology to large-scale data, in this paper we propose an
incremental spectral clustering approach which, when
combined with a BIRCH tree, proves both effective and
efficient.
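A hedged sketch of the three-stage idea (partition the data, spectral-cluster each partition, merge with a BIRCH tree) can be assembled from scikit-learn's off-the-shelf SpectralClustering and Birch estimators. The interleaved partitioning, the centroid-based merging step, and all parameter values below are our illustrative assumptions, not the exact procedure proposed later in this paper:

```python
import numpy as np
from sklearn.cluster import SpectralClustering, Birch

# three well-separated 2-D blobs, 60 points each
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(c, 0.2, (60, 2)) for c in (0.0, 4.0, 8.0)])

# 1) divide the data into smaller partitions (interleaved for the demo)
partitions = [X[i::3] for i in range(3)]

# 2) run spectral clustering on each partition; keep cluster centroids
centroids = []
for part in partitions:
    labels = SpectralClustering(n_clusters=3, affinity="rbf",
                                random_state=0).fit_predict(part)
    for k in np.unique(labels):
        centroids.append(part[labels == k].mean(axis=0))
centroids = np.asarray(centroids)

# 3) merge the per-partition centroids with a BIRCH tree, then label
#    the full dataset by its nearest merged subcluster
birch = Birch(n_clusters=3, threshold=0.5).fit(centroids)
final_labels = birch.predict(X)
```

Only the small set of per-partition centroids is fed to BIRCH, so the expensive spectral step never sees the full dataset at once.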
The rest of the paper is organized as follows. In Section
2, we present some related work of spectral clustering and
basic knowledge of BIRCH tree algorithm. In Section 3, an
integration-based fast incremental spectral clustering
algorithm is introduced. In Section 4, we present the results
of experiments conducted to evaluate the performance of the
proposed algorithm. Finally, conclusions are drawn in
Section 5.
II. RELATED WORK
Work related to the methods presented in this paper falls
into two main categories, spectral clustering and BIRCH
tree.
A. Spectral Clustering
Spectral clustering goes back to Donath and Hoffman
(1973), who first suggested computing graph partitions
based on eigenvectors of the adjacency matrix [6]. In the
machine learning community, spectral clustering was
popularized by the works of Shi and Malik (2000) [7], Ng et al.
(2002) [8], Meila and Shi (2001) [9], and Ding (2004) [10].
A huge number of papers have subsequently been published,
dealing with various extensions, new applications, and
theoretical results on spectral clustering [11].