Knowledge-Based Systems 91 (2016) 179–188
Contents lists available at ScienceDirect
Knowledge-Based Systems
journal homepage: www.elsevier.com/locate/knosys
Hierarchical cluster ensemble model based on knowledge granulation
Jie Huᵃ, Tianrui Liᵃ,∗, Hongjun Wangᵃ, Hamido Fujitaᵇ
ᵃ School of Information Science and Technology, Southwest Jiaotong University, Chengdu 611756, China
ᵇ Faculty of Software and Information Science, Iwate Prefectural University, 020-0693, Iwate, Japan
article info
Article history:
Received 26 July 2015
Revised 14 September 2015
Accepted 3 October 2015
Available online 16 October 2015
Keywords:
Cluster ensemble
Granular computing
Rough sets
abstract
Cluster ensembles have been shown to be very effective in unsupervised classification learning: a large pool of
different clustering solutions is generated and then combined into a final decision. However, the task becomes
more difficult because of the inherent complexities among base clustering results, such as uncertainty, vagueness
and overlapping. Granular computing is one of the fastest growing information-processing paradigms in the domain
of computational intelligence and human-centric systems. As a core part of granular computing, rough set theory,
which deals with inexact, uncertain, or vague information, has been widely applied in machine learning and
knowledge discovery in recent years. From these perspectives, this paper proposes a hierarchical cluster ensemble
model based on knowledge granulation, in an attempt to provide a new way to deal with the cluster ensemble problem
and to apply knowledge granulation to ensemble learning. A novel rough distance is introduced to measure the
dissimilarity between base partitions, and the notion of knowledge granulation is improved to measure the
agglomeration degree of a given granule. Furthermore, a novel objective function for cluster ensembles is defined
and the corresponding inferences are made. A hierarchical cluster ensemble algorithm based on knowledge granulation
is designed. Experimental results on real-world data sets demonstrate that the proposed method is effective in
producing better cluster ensembles.
© 2015 Elsevier B.V. All rights reserved.
1. Introduction
Clustering is an important unsupervised classification technique,
which has been extensively researched in different fields such as
statistics, pattern recognition, machine learning, and data mining
[1–3]. By following a clustering criterion and a similarity measure,
clustering reveals the underlying structure of a data set: similar
objects are grouped into the same cluster, while dissimilar objects
are assigned to different clusters [4].
Each clustering algorithm has its own strengths and weaknesses,
and no single clustering algorithm delivers sound solutions for all
data sets. To improve the robustness, consistency, novelty and
stability of the results of single clustering algorithms, cluster
ensembles (also called cluster fusion or consensus clustering) have
emerged as a tool for leveraging the consensus across multiple
clustering results and combining them into an optimal solution.
Cluster ensembles have gained increasing attention from researchers
in recent years [4–8].
∗ Corresponding author. Tel.: +86 28 66367458.
E-mail addresses: jiehu@swjtu.edu.cn (J. Hu), trli@swjtu.edu.cn, trli30@gmail.com
(T. Li), wanghongjun@swjtu.edu.cn (H. Wang), HFujita-799@acm.org (H. Fujita).
Generally, a cluster ensemble method involves two major steps:
generation and the consensus function. In the first step, a set of
diverse partitions of the objects is produced by a generative
mechanism, such as a homogeneous algorithm run with different
parameters (or initializations) [9,10] or heterogeneous algorithms
[5]. The consensus function is the main step in any cluster ensemble
algorithm: it integrates all partitions obtained in the generation
step into a new partition. The numerous consensus function
approaches can be classified into two main types: methods based on
object co-occurrence and methods based on the median partition [4].
In the median partition approach, the resulting partition is the one
that maximizes the similarity (or minimizes the dissimilarity) with
all partitions in the cluster ensemble. Although a great number of
cluster ensemble methods have been proposed over the past years,
relatively few techniques handle uncertain, vague and overlapping
information in the cluster ensemble process.
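The two families of consensus functions can be illustrated with a small Python sketch. It is not the method proposed in this paper. The functions `co_association` and `majority_consensus` follow the generic co-occurrence idea (link objects that a majority of base partitions place together), and `best_of_k` is a simple median-partition heuristic that restricts the search to the base partitions themselves and measures dissimilarity by the number of object pairs on which two partitions disagree. All function names, the 0.5 threshold, and the toy partitions are illustrative assumptions.

```python
from itertools import combinations
from typing import List, Sequence

# Hypothetical representation: a partition is a list of cluster labels, one per object.
Partition = Sequence[int]


def co_association(partitions: List[Partition]) -> List[List[float]]:
    """Fraction of base partitions that place objects i and j in the same cluster."""
    n, m = len(partitions[0]), len(partitions)
    counts = [[0] * n for _ in range(n)]
    for p in partitions:
        for i in range(n):
            for j in range(n):
                if p[i] == p[j]:
                    counts[i][j] += 1
    return [[c / m for c in row] for row in counts]


def majority_consensus(partitions: List[Partition], threshold: float = 0.5) -> List[int]:
    """Co-occurrence consensus: link objects whose co-association exceeds the
    threshold and return the connected components as consensus clusters."""
    n = len(partitions[0])
    ca = co_association(partitions)
    parent = list(range(n))  # union-find forest over the objects

    def find(x: int) -> int:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for i, j in combinations(range(n), 2):
        if ca[i][j] > threshold:
            parent[find(i)] = find(j)

    # Relabel components as 0, 1, 2, ... in order of first appearance.
    relabel: dict = {}
    return [relabel.setdefault(find(i), len(relabel)) for i in range(n)]


def pair_disagreement(p: Partition, q: Partition) -> int:
    """Number of object pairs that are together in one partition but apart in
    the other (half of the Mirkin metric)."""
    return sum((p[i] == p[j]) != (q[i] == q[j]) for i, j in combinations(range(len(p)), 2))


def best_of_k(partitions: List[Partition]) -> Partition:
    """Median-partition heuristic restricted to the base partitions: return the
    one with the smallest total distance to all base partitions."""
    return min(partitions, key=lambda c: sum(pair_disagreement(c, p) for p in partitions))


# Toy ensemble of three base partitions of six objects (illustrative data).
base_partitions = [
    [0, 0, 0, 1, 1, 1],
    [1, 1, 1, 0, 0, 0],  # same grouping as the first, with swapped labels
    [0, 0, 1, 1, 2, 2],
]

print(majority_consensus(base_partitions))  # [0, 0, 0, 1, 1, 1]
print(best_of_k(base_partitions))           # [0, 0, 0, 1, 1, 1]
```

The co-occurrence route builds a new partition from pairwise evidence, whereas the median-partition route scores candidate partitions against the whole ensemble. The sketch does not reproduce the rough distance or the objective function proposed in this paper.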
Granular computing (GrC) [11,12], which has emerged as one of the
fastest growing information-processing paradigms in the domain of
computational intelligence and human-centric systems, has been
successfully applied in many fields. As a core part of GrC, rough set
theory (RST) [13] forms granules through an equivalence relation
defined on the objects of the universe and approximately expresses
information granulation by a pair of non-numerical operators, i.e.,
the lower and upper approximation operators [14]. Nowadays, RST has
http://dx.doi.org/10.1016/j.knosys.2015.10.006
0950-7051/© 2015 Elsevier B.V. All rights reserved.