membership, dependency of attributes and reduction of attributes.
The binary relations defined on attributes partition the points of
the universe into a collection of elementary classes. Any vague (imprecise) concept which is a subset of the universe can be expressed
by two exact sets: the lower and upper approximations. The lower
approximation consists of the objects surely belonging to the vague
concept, whereas the upper approximation consists of the objects
possibly belonging to the vague concept. Based on rough approx-
imations, the universe can be divided into three pair-wise dis-
joint regions: the positive region corresponds to the lower approx-
imation, the negative region corresponds to the complement of
the upper approximation, and the boundary region corresponds to
the difference between the upper and lower approximations. Based
on these regions, three-way decisions can be made: deterministic
decisions (immediate acceptance or immediate rejection) are made
for objects in the positive and negative regions, respectively,
according to their memberships in the given decision class, while
nondeterministic (deferred) decisions are made for objects in the
boundary region. The rough membership measures the degree to
which an object belongs to a set. The dependency of attributes is
used to discover dependencies between attributes. Lastly, a reduct
of attributes is a minimal subset of attributes that preserves the
partitioning ability of the original attributes.
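To make these notions concrete, the following Python sketch computes the lower and upper approximations and the three regions for a toy universe; the function name, the equivalence classes, and the target set are our own illustrative assumptions, not taken from the paper.

```python
# A minimal sketch of rough approximations over a toy universe.
# The equivalence classes and target set X are illustrative assumptions.

def rough_regions(universe, equivalence_classes, X):
    """Compute the lower/upper approximations of X and the three regions."""
    X = set(X)
    lower = set()   # objects surely belonging to X
    upper = set()   # objects possibly belonging to X
    for block in equivalence_classes:
        block = set(block)
        if block <= X:          # block entirely inside X
            lower |= block
        if block & X:           # block overlaps X
            upper |= block
    positive = lower
    negative = set(universe) - upper
    boundary = upper - lower
    return positive, boundary, negative

# Toy example: a universe of 6 objects partitioned by some attribute.
U = {1, 2, 3, 4, 5, 6}
blocks = [{1, 2}, {3, 4}, {5, 6}]
X = {1, 2, 3}                   # a vague concept
print(rough_regions(U, blocks, X))
# -> positive {1, 2}, boundary {3, 4}, negative {5, 6}
```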
Nowadays, RST has been widely applied in clustering analysis.
Some rough clustering studies constructed each cluster as an
interval or rough set characterized by its lower and upper approximations, or by a lower approximation and a boundary region, respectively [6,28–30]. Others applied the principle of attribute reduction in
RST to select the feature with the largest differentiation to split
the universe, and a series of divisive hierarchical clustering algorithms based on RST have been proposed that yield a hard partition of the
universe [31,32]. Still others combined clustering with three-way decision theory [33–35].
2.3. Cluster ensemble
Cluster ensemble methods involve roughly two major steps: ensemble generation and consensus aggregation. In the first step, a
set of diverse clustering solutions is produced by a generative mechanism, such as data perturbation techniques,
a homogeneous algorithm with different parameters (or initializations), or heterogeneous base clustering algorithms [36]. The
second step is the key step in any cluster ensemble approach,
where a consensus aggregation of the clustering solutions in the
ensemble is acquired. A large number of cluster ensemble methods have been proposed, which can be mainly divided into two
categories: methods based on object co-occurrence and
methods based on the median partition [2]. The first type of method
finds the final result by counting how many times two objects
belong to the same cluster, or how many times an object belongs
to a given cluster, across the ensemble. Voting and co-association
matrix based methods fall into this category.
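As a minimal sketch of the co-association idea, assuming each base clustering is given as an integer label array (the names and data below are illustrative, not the paper's notation), one can count how often each pair of objects shares a cluster:

```python
# Co-association matrix from a set of base clusterings, each given as an
# integer label array; names and data are illustrative assumptions.
import numpy as np

def co_association(labelings):
    """Fraction of base clusterings in which each pair of objects co-occurs."""
    labelings = np.asarray(labelings)        # shape: (m clusterings, n objects)
    m, n = labelings.shape
    C = np.zeros((n, n))
    for labels in labelings:
        C += (labels[:, None] == labels[None, :]).astype(float)
    return C / m                             # entry (i, j) lies in [0, 1]

# Three base clusterings of five objects.
P = [[0, 0, 1, 1, 1],
     [0, 0, 0, 1, 1],
     [1, 1, 0, 0, 0]]
C = co_association(P)
# C[0, 1] == 1.0: objects 0 and 1 share a cluster in every base clustering.
# A consensus partition can then be obtained by, e.g., hierarchical
# clustering on the derived distance matrix 1 - C.
```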
2.4. Random forests classifier
Among the different supervised machine learning methods,
k-Nearest-Neighbor classifiers (k-NN) [37], Support Vector Machines (SVM) [38] and Random Forests (RF) [39] are three typical
classifiers. k-NN is based on learning by analogy, that is, comparing an unknown sample with its k closest neighbors from the
training set and then assigning the unknown sample to the most
common class among those k nearest neighbors. The number k is a
user-defined constant that strongly influences the classification
and needs to be selected by various heuristic techniques. SVM finds a nonlinear mapping (by defining a suitable kernel function) to transform
the original training data into an appropriate higher-dimensional space and
then searches for a linear optimal separating hyperplane in this
new space, using support vectors and margins to classify the
training set. The effectiveness of SVM depends on the selection of
the kernel function and its parameters, as well as the soft margin
parameter. SVM also suffers from efficiency issues; that is, the training
time of even the fastest SVMs can be extremely long.
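For illustration only, the following scikit-learn snippet shows the two hyperparameter choices highlighted above, namely k for k-NN and the kernel and soft-margin parameter C for SVM; the dataset and parameter values are arbitrary examples, not recommendations from the paper.

```python
# Illustrative scikit-learn usage of the two classifiers discussed above;
# the parameter values and dataset are arbitrary examples.
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)

# k-NN: the user-defined k controls the size of the neighborhood vote.
knn = KNeighborsClassifier(n_neighbors=5)

# SVM: the kernel and the soft-margin parameter C must both be chosen.
svm = SVC(kernel='rbf', C=1.0, gamma='scale')

for name, clf in [('k-NN', knn), ('SVM', svm)]:
    scores = cross_val_score(clf, X, y, cv=5)
    print(name, scores.mean())
```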
Random Forests (RF) [39] is a more recently developed supervised
ensemble learning method for classification, regression and other
tasks, and it is capable of overcoming the above difficulties. The method
grows an ensemble of decision trees, with the growth governed
by random input vectors generated by bagging, random split selection or random subspaces, and then outputs the class that is most
popular among the individual trees in the forest. RF alleviates decision
trees' tendency to overfit their training set. The generalization
error of a random forest depends on two things: the strength
of the individual trees in the forest (the stronger, the better)
and the dependence between them (the smaller, the better).
Random forests can process datasets with a large number of
features without feature selection and estimate the importance of
features automatically. They run efficiently on data sets with missing
values, class-imbalanced data sets, and even large data sets.
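The following short scikit-learn sketch illustrates the RF properties mentioned above (trees grown on bootstrap samples with random feature subspaces, majority voting, and automatic feature-importance estimates); the dataset and parameter values are illustrative assumptions.

```python
# A short scikit-learn sketch of the RF properties discussed above.
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

rf = RandomForestClassifier(
    n_estimators=100,       # number of trees grown on bootstrap samples
    max_features='sqrt',    # random subspace of features at each split
    random_state=0,
)
rf.fit(X_tr, y_tr)                          # prediction = majority vote of trees
print('accuracy:', rf.score(X_te, y_te))
# Feature importances are estimated automatically from the ensemble itself.
print('max importance:', rf.feature_importances_.max())
```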
3. Incremental fuzzy cluster ensemble learning based on rough
set theory
In this section, we first describe the basic idea of incremental
fuzzy cluster ensemble learning based on RST (IFCERS) and then
present the key steps of the proposed method in detail.
3.1. The basic idea of IFCERS
Fig. 1 provides an overview of IFCERS. In IFCERS, well-known
soft clustering techniques such as FCM, rough k-means,
and rough-fuzzy k-means, with the number of clusters set to the
true number of classes, are first employed to generate multiple
clustering solutions, and each clustering solution is formalized
as a fuzzy membership matrix. Then, the clustering solutions with
lower fuzzy partition entropy are selected to form a fuzzy clustering ensemble, and the positive region, boundary region and negative region of the fuzzy clustering ensemble are obtained based on
the core idea of RST (the principle of rough approximation construction). Next, a more accurate group structure of the data points
in the positive region is acquired by applying a fuzzy cluster ensemble method. Finally, a supervised learning method, namely random forests, is applied incrementally to the points in the boundary and
negative regions, together with the group structure of the positive region, to
yield better final clustering results.
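As a sketch of the entropy-based selection step, assuming each base clustering has already been formalized as an n × c membership matrix whose rows sum to one, one could compute Bezdek's partition entropy and keep the crisper (lower-entropy) members. The below-the-mean threshold used here is our own assumption for illustration, not necessarily the paper's exact selection rule.

```python
# Fuzzy-partition-entropy selection over base clusterings, assuming each
# is an n x c membership matrix; function names and the below-mean
# threshold are our own illustrative assumptions.
import numpy as np

def partition_entropy(U, eps=1e-12):
    """Bezdek's partition entropy: lower values mean a crisper partition."""
    U = np.asarray(U)
    n = U.shape[0]
    return -np.sum(U * np.log(U + eps)) / n

def select_members(memberships):
    """Keep base clusterings whose entropy is below the ensemble mean."""
    entropies = np.array([partition_entropy(U) for U in memberships])
    return [U for U, h in zip(memberships, entropies)
            if h <= entropies.mean()]
```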
3.2. The key steps of IFCERS
In the following, we describe the key steps of our method in
detail.
3.2.1. The generation and the formalization of base soft clusterings
In this step, well-known soft clustering techniques such as
FCM, rough k-means, and rough-fuzzy k-means, with the number
of clusters set to the true number of classes of the data set, are first
adopted to generate multiple clustering solutions.
Based on their underlying mathematical theories, these soft
clustering methods can be mainly divided into: soft clustering methods
based on FST, soft clustering methods based on RST or a combination of both theories, and soft clustering methods based on evidence theory. On the other hand, because these methods abstract,
describe and compute the uncertain cluster structure in different
ways, their clustering results take different forms, which
makes it difficult to combine the multiple base clustering results.
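One plausible way to put such heterogeneous base results on a common footing is to convert each of them into a fuzzy membership matrix. The conversion rules sketched below (one-hot rows for hard partitions, equal shares across boundary clusters for rough partitions) are illustrative assumptions rather than the paper's exact formalization.

```python
# Illustrative conversions of heterogeneous base clusterings into fuzzy
# membership matrices; the rules here are our own assumptions.
import numpy as np

def hard_to_membership(labels, c):
    """Hard partition -> one-hot membership matrix of shape (n, c)."""
    n = len(labels)
    U = np.zeros((n, c))
    U[np.arange(n), labels] = 1.0
    return U

def rough_to_membership(lower, boundary, c, n):
    """Rough clusters -> memberships: weight 1 inside a lower approximation,
    equal shares across the clusters whose boundary contains the object."""
    U = np.zeros((n, c))
    for k, objs in enumerate(lower):
        U[list(objs), k] = 1.0
    counts = np.zeros(n)                    # boundary multiplicity per object
    for objs in boundary:
        counts[list(objs)] += 1
    for k, objs in enumerate(boundary):
        for i in objs:
            U[i, k] = 1.0 / counts[i]
    return U
```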