A Novel Fuzzy Clustering Algorithm with
Human-computer Cooperation for Incomplete Data
Li Zhang* and Lu Wang
School of Information
Liaoning University
Shenyang, Liaoning Province, China
E-mail: zhang_li@lnu.edu.cn
Liyong Zhang*
School of Control Science and Engineering
Dalian University of Technology
Dalian, Liaoning Province, China
E-mail: zhly@dlut.edu.cn
Abstract—Datasets with missing values are frequent in clustering
analysis. The reconstruction of missing attribute values is a
key factor affecting clustering performance. This paper
therefore proposes an FCM clustering algorithm for incomplete
data sets based on human-computer cooperation. To account for
the uncertainty of missing attributes, each missing attribute is
represented by an interval determined by the nearest-neighbor
rule. An iterative solution is then developed that calculates
the missing attributes using the optimal completion strategy
(OCS) together with a compulsion strategy. Experimental results
on several data sets demonstrate the superiority of the
proposed algorithm.
Keywords-Incomplete data; nearest-neighbor interval;
human-computer cooperation; optimal completion strategy; fuzzy
clustering
I. INTRODUCTION
With the advent of the Big Data era, data clustering
algorithms have been applied in more and more fields
[1,2,3]. The fuzzy c-means (FCM) algorithm [4] is effective
at partitioning complete data sets with overlapping
clusters [5, 6]. In practice, however, incomplete data are
common, and FCM cannot be applied directly to the
clustering of incomplete data sets.
To reduce the effect of incomplete data on clustering, a
variety of approaches have been proposed. Hathaway and
Bezdek proposed four strategies for clustering incomplete
data [7]. The whole-data strategy (WDS) discards the data with
missing attribute values. The partial distance strategy (PDS)
ignores all missing attributes and calculates partial distances,
as proposed by Dixon [8], using all available attribute values.
The optimal completion strategy (OCS) regards the missing
attributes as additional variables to be optimized during each
iteration. The nearest prototype strategy (NPS) sets each
missing attribute to the corresponding attribute value of the
nearest prototype. In addition, Lin and Su [9] adopted a
meta-heuristic technique that combines the
Electromagnetism-like Mechanism with RBC. Bing, Zhang, et al.
[10] proposed a hybrid fuzzy clustering algorithm based on
particle swarm optimization (PSO) and FCM for clustering
incomplete data.
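As an illustration of the partial distance used by PDS, the following minimal Python sketch computes a squared partial distance between a datum with missing attributes and a cluster prototype. It assumes that missing values are encoded as NaN and that the sum over the observed attributes is rescaled by s / (number of observed attributes), as in the usual formulation of PDS; the function name `partial_distance_sq` is hypothetical and not taken from the paper.

```python
import numpy as np

def partial_distance_sq(x, v):
    """Squared partial distance between datum x and prototype v.

    Assumption: missing attributes of x are encoded as NaN.
    Only the observed attributes contribute, and the sum is rescaled
    by s / (number of observed attributes), where s = len(x).
    """
    x = np.asarray(x, dtype=float)
    v = np.asarray(v, dtype=float)
    present = ~np.isnan(x)              # mask of observed attributes
    n_present = int(present.sum())
    if n_present == 0:
        raise ValueError("datum has no observed attributes")
    diff = x[present] - v[present]
    return x.size / n_present * float(np.dot(diff, diff))

# Usage: the second attribute is missing, so only attributes 1 and 3 count.
d = partial_distance_sq([1.0, np.nan, 3.0], [0.0, 0.0, 0.0])
```

For a datum without missing attributes the scaling factor is 1, so the value reduces to the ordinary squared Euclidean distance.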
Interval numbers play an increasingly important role in
cluster analysis [11,12]. Given the uncertainty of missing
attributes, replacing them with intervals can improve the
robustness of missing attribute estimation. Wang et al. [13]
adopted the nearest-neighbor rule to select the training
samples for missing attributes. Moreover, to avoid interval
endpoints being determined by information from different
classes, Zhang et al. [14] developed the NIR approach for
interval estimation.
In this paper, an FCM clustering algorithm based on a
human-computer cooperation strategy (HCFCM) is proposed to
estimate the missing attributes of incomplete data sets. It
carries out the clustering analysis in two steps. First, the q
nearest neighbors of the sample with a missing attribute are
selected according to the partial Euclidean distance [8]; the
mathematical expectation of their values for that attribute is
regarded as the ME attribute value, and their maximum and
minimum values are taken as the upper and lower bounds of the
interval constraint. Second, following the optimal completion
strategy (OCS), the attribute value is calculated iteratively
along with the memberships and cluster prototypes. To ensure
that the attributes always satisfy the interval constraints
during iteration, a compulsion strategy based on
nearest-neighbor expectations is proposed, which gives the
algorithm its human-computer cooperation character.
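The first step can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: missing values are encoded as NaN, the partial distance between two samples is computed over the attributes observed in both and rescaled by s / (number of shared attributes), and only samples in which attribute j is observed are considered as neighbors. The function names are hypothetical.

```python
import numpy as np

def pairwise_partial_distance_sq(a, b):
    """Squared partial distance over attributes observed in both samples."""
    both = ~np.isnan(a) & ~np.isnan(b)
    n_both = int(both.sum())
    if n_both == 0:
        return np.inf                    # no shared attributes: treat as far away
    diff = a[both] - b[both]
    return a.size / n_both * float(np.dot(diff, diff))

def nearest_neighbor_interval(X, k, j, q):
    """Return (lower, expectation, upper) for missing attribute j of sample k.

    Assumption: neighbors are the q samples closest to sample k (partial
    distance) among those for which attribute j is observed.
    """
    X = np.asarray(X, dtype=float)
    candidates = [i for i in range(X.shape[0])
                  if i != k and not np.isnan(X[i, j])]
    if not candidates:
        raise ValueError("no sample has attribute j observed")
    dists = np.array([pairwise_partial_distance_sq(X[k], X[i])
                      for i in candidates])
    nearest = np.argsort(dists, kind="stable")[:q]
    values = X[np.array(candidates)[nearest], j]
    # minimum and maximum bound the interval; the mean is the expectation
    return float(values.min()), float(values.mean()), float(values.max())

# Usage: attribute 2 of sample 0 is missing; the far-away sample 3 is excluded.
X = np.array([[1.0, 2.0, np.nan],
              [1.1, 2.1, 3.0],
              [0.9, 1.9, 5.0],
              [9.0, 9.0, 50.0]])
lower, expectation, upper = nearest_neighbor_interval(X, k=0, j=2, q=2)
```

Restricting the interval to the q nearest neighbors keeps distant samples, such as the last row in this example, from widening the bounds.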
II. THE OCS VERSION OF FCM
The OCS-FCM algorithm, proposed by Hathaway and Bezdek [7],
is an imputation method. It regards the missing attributes
$\tilde{X}_M$ as additional variables and calculates them
during each iteration to complete the missing part of the data
set. Let the objective function be
$$J(U, V, \tilde{X}_M) = \sum_{i=1}^{c} \sum_{k=1}^{n} u_{ik}^{m} \left\| \tilde{x}_k - v_i \right\|_2^2, \qquad (1)$$
subject to the constraint

$$\sum_{i=1}^{c} u_{ik} = 1, \quad k = 1, 2, \ldots, n. \qquad (2)$$
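To make the roles of the symbols concrete, the following Python sketch evaluates objective (1) for a completed data set and checks constraint (2). It assumes that the missing entries have already been filled in, that U is stored as a c-by-n array, V as a c-by-s array and the completed data as an n-by-s array, and that the norm is the squared Euclidean norm; the function name `fcm_objective` and the default m = 2 are illustrative choices only.

```python
import numpy as np

def fcm_objective(U, V, X, m=2.0):
    """Evaluate J = sum_i sum_k u_ik^m * ||x_k - v_i||^2.

    U: (c, n) memberships, V: (c, s) prototypes, X: (n, s) completed data.
    Raises ValueError if the memberships of any datum do not sum to 1.
    """
    U = np.asarray(U, dtype=float)
    V = np.asarray(V, dtype=float)
    X = np.asarray(X, dtype=float)
    if not np.allclose(U.sum(axis=0), 1.0):
        raise ValueError("constraint (2) violated: columns of U must sum to 1")
    # d2[i, k] = squared Euclidean distance between prototype i and datum k
    d2 = ((X[None, :, :] - V[:, None, :]) ** 2).sum(axis=2)
    return float(((U ** m) * d2).sum())

# Usage: two data points, two prototypes placed exactly on the data.
X = np.array([[0.0, 0.0], [2.0, 0.0]])
V = np.array([[0.0, 0.0], [2.0, 0.0]])
J_crisp = fcm_objective([[1.0, 0.0], [0.0, 1.0]], V, X)   # each datum fully in its own cluster
J_fuzzy = fcm_objective([[0.5, 0.5], [0.5, 0.5]], V, X)   # memberships split evenly
```

In this toy example the crisp assignment gives a lower objective value than the evenly split one, since each datum coincides with its own prototype.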
The necessary conditions for minimizing the objective
function (1) yield an iterative process in which the
membership degrees $u_{ik}$, the cluster prototypes $v_i$ and
the missing values $\tilde{x}_{jk}$ are updated by
Project supported by the National Natural Science Foundation of China
(No. 61174115, No. 61401061)
____________________________________
978-1-4799--5 /15/$31.00 ©2015 IEEE