An Attribute Weighted Fuzzy c-Means Algorithm for Incomplete Datasets Based on
Statistical Imputation
Dan Li
School of Control Science and Engineering
Dalian University of Technology
Dalian, China
ldan@dlut.edu.cn
Chongquan Zhong
School of Control Science and Engineering
Dalian University of Technology
Dalian, China
zhongcq@dlut.edu.cn
Abstract—Missing data are frequently encountered in
real-world applications. In this paper, an attribute weighted
fuzzy c-means algorithm for incomplete data sets is
presented. The statistical representation proposed in our
previous work is used to impute the missing attribute values,
and attribute weighting is introduced to emphasize the
contribution of important attributes. Experimental results
indicate that the proposed approach has good clustering
performance.
Keywords-fuzzy clustering; incomplete data; attribute
weighted; statistical imputation
I. INTRODUCTION
Fuzzy clustering is an effective technique in pattern
recognition that partitions a collection of multivariate data
into meaningful groups to discover the structure of a data
set. In real-world applications, many data sets contain
missing values, and most clustering algorithms, such as the
widely used fuzzy c-means (FCM) algorithm [1], cannot
handle incomplete data sets directly.
Over the past decades, numerous approaches to the
problem of incomplete data clustering have been developed.
In 2001, Hathaway proposed four strategies for FCM
clustering of incomplete data [2]: the Whole Data Strategy
(WDS), Partial Distance Strategy (PDS), Optimal Completion
Strategy (OCS), and Nearest Prototype Strategy (NPS). By
taking into account information about why data are missing,
Timm developed a fuzzy clustering algorithm extended from
the Gath-Geva algorithm [3]. Honda partitioned incomplete
data sets into several linear fuzzy clusters [4]. In addition,
Li put forward an FCM algorithm based on nearest-neighbor
intervals to handle incomplete data [5]. Lim and Kiong
proposed an autonomous and deterministic method for
clustering data sets with missing values [6].
In this paper, we describe the development of an attribute
weighted FCM algorithm for incomplete data. The next
section introduces the FCM algorithm and FCM-based
clustering algorithms for incomplete data. Section III
presents the statistical imputation of missing attribute values
and the proposed attribute weighted FCM algorithm for
incomplete data sets. Section IV presents clustering results
and a comparative study of the proposed algorithm against
several other methods. Finally, conclusions are drawn in
Section V.
II. FUZZY C-MEANS ALGORITHMS FOR INCOMPLETE DATA CLUSTERING
A. Fuzzy c-Means Algorithm
Let $X = \{\mathbf{x}_1, \mathbf{x}_2, \ldots, \mathbf{x}_n\} \subset \mathbb{R}^s$ be a set of
$s$-dimensional complete data, and the fuzzy c-means (FCM)
algorithm partitions $X$ into $c$ clusters that are characterized
by prototypes $V = [\mathbf{v}_1, \ldots, \mathbf{v}_c]$. The FCM algorithm
performs clustering by minimizing the objective function
$$J(U, V) = \sum_{i=1}^{c} \sum_{k=1}^{n} u_{ik}^{m} \left\| \mathbf{x}_k - \mathbf{v}_i \right\|_2^2, \tag{1}$$
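where $\mathbf{x}_k = [x_{1k}, x_{2k}, \ldots, x_{sk}]^{\mathrm{T}}$ is an object datum;
$U = [u_{ik}] \in \mathbb{R}^{c \times n}$ is a partition matrix with
$\forall i, k: u_{ik} \in [0, 1]$ and $\sum_{i=1}^{c} u_{ik} = 1$; $m$ is a fuzzification
parameter, $m \in (1, \infty)$; and $\|\cdot\|_2$ denotes the Euclidean norm.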
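As a small illustration of the objective function (1), the following sketch evaluates J for a given partition matrix and set of prototypes. It is not taken from the paper: the function name `fcm_objective`, the nested-list data layout, and the toy data are assumptions made here, and plain Python lists are used only to keep the example self-contained.

```python
def fcm_objective(X, V, U, m=2.0):
    """Evaluate J(U, V) = sum_i sum_k u_ik^m * ||x_k - v_i||_2^2.

    X: list of n data points, each a list of s floats
    V: list of c prototypes, each a list of s floats
    U: c-by-n partition matrix as nested lists, U[i][k] = u_ik
    m: fuzzification parameter, m > 1
    """
    total = 0.0
    for i, v in enumerate(V):
        for k, x in enumerate(X):
            # Squared Euclidean distance between datum x_k and prototype v_i.
            sq_dist = sum((xj - vj) ** 2 for xj, vj in zip(x, v))
            total += (U[i][k] ** m) * sq_dist
    return total


# Toy data (illustrative only): two 2-D points and two prototypes.
X = [[0.0, 0.0], [2.0, 0.0]]
V = [[0.0, 0.0], [2.0, 0.0]]
U_crisp = [[1.0, 0.0], [0.0, 1.0]]  # each point fully in its own cluster
U_even = [[0.5, 0.5], [0.5, 0.5]]   # each point split evenly between clusters

print(fcm_objective(X, V, U_crisp))
print(fcm_objective(X, V, U_even))
```

With the prototypes placed on the data points, the crisp partition gives a smaller objective value than the evenly split one, which is the behavior the minimization of (1) rewards.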
FCM uses the Lagrange multiplier method, and the
necessary conditions for minimizing (1) are [1]:
$$\mathbf{v}_i = \frac{\sum_{k=1}^{n} u_{ik}^{m} \mathbf{x}_k}{\sum_{k=1}^{n} u_{ik}^{m}}, \quad \text{for } i = 1, 2, \ldots, c \tag{2}$$
and
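$$u_{ik} = \left[ \sum_{t=1}^{c} \left( \frac{\left\| \mathbf{x}_k - \mathbf{v}_i \right\|_2^2}{\left\| \mathbf{x}_k - \mathbf{v}_t \right\|_2^2} \right)^{\frac{1}{m-1}} \right]^{-1}. \tag{3}$$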
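For readers who want the intermediate steps, the sketch below outlines how conditions (2) and (3) can be obtained from (1) under the constraint that the memberships of each datum sum to one. The shorthand $d_{ik}$ and the multipliers $\lambda_k$ are notation introduced here for illustration and do not appear in the paper; the derivation assumes $d_{ik} > 0$ for all $i$ and $k$.

```latex
% Shorthand (assumed here): d_{ik} = \|\mathbf{x}_k - \mathbf{v}_i\|_2^2 > 0.
\begin{align*}
L(U, V, \lambda) &= \sum_{i=1}^{c}\sum_{k=1}^{n} u_{ik}^{m} d_{ik}
  - \sum_{k=1}^{n} \lambda_k \Bigl( \sum_{i=1}^{c} u_{ik} - 1 \Bigr), \\
\frac{\partial L}{\partial \mathbf{v}_i}
  &= -2 \sum_{k=1}^{n} u_{ik}^{m} (\mathbf{x}_k - \mathbf{v}_i) = \mathbf{0}
  \;\Longrightarrow\;
  \mathbf{v}_i = \frac{\sum_{k=1}^{n} u_{ik}^{m} \mathbf{x}_k}{\sum_{k=1}^{n} u_{ik}^{m}}, \\
\frac{\partial L}{\partial u_{ik}}
  &= m\, u_{ik}^{m-1} d_{ik} - \lambda_k = 0
  \;\Longrightarrow\;
  u_{ik} = \Bigl( \frac{\lambda_k}{m\, d_{ik}} \Bigr)^{\frac{1}{m-1}}, \\
\sum_{t=1}^{c} u_{tk} = 1
  &\;\Longrightarrow\;
  \Bigl( \frac{\lambda_k}{m} \Bigr)^{\frac{1}{m-1}}
  = \Bigl[ \sum_{t=1}^{c} d_{tk}^{-\frac{1}{m-1}} \Bigr]^{-1}, \\
u_{ik} &= \frac{d_{ik}^{-\frac{1}{m-1}}}{\sum_{t=1}^{c} d_{tk}^{-\frac{1}{m-1}}}
  = \Bigl[ \sum_{t=1}^{c} \Bigl( \frac{d_{ik}}{d_{tk}} \Bigr)^{\frac{1}{m-1}} \Bigr]^{-1}.
\end{align*}
```

The last line agrees with (3) once $d_{ik}$ is expanded back into the squared Euclidean distance.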
FCM minimizes the clustering objective function (1) by
alternating optimization (AO): the update steps (2) and (3)
are repeated until the change in memberships and/or
prototypes drops below a chosen threshold $\varepsilon$.
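The alternating optimization described above can be sketched as follows for complete data. This is a minimal illustration and not the authors' implementation: the function name `fcm`, the random initialization of the partition matrix, the stopping rule based on the largest membership change, the `max_iter` cap, and the small constant `tiny` that guards against a datum coinciding with a prototype are all assumptions made here.

```python
import random


def sq_dist(x, v):
    """Squared Euclidean distance between two points given as lists."""
    return sum((xj - vj) ** 2 for xj, vj in zip(x, v))


def fcm(X, c, m=2.0, eps=1e-6, max_iter=300, seed=0, tiny=1e-12):
    """Alternate updates (2) and (3) until memberships change by less than eps."""
    rng = random.Random(seed)
    n, s = len(X), len(X[0])
    # Assumed initialization: random memberships, each column normalized to sum to 1.
    U = [[rng.random() + 1e-3 for _ in range(n)] for _ in range(c)]
    for k in range(n):
        col = sum(U[i][k] for i in range(c))
        for i in range(c):
            U[i][k] /= col
    p = 1.0 / (m - 1.0)
    V = []
    for _ in range(max_iter):
        # Eq. (2): each prototype is the u_ik^m-weighted mean of the data.
        V = []
        for row in U:
            w = [u ** m for u in row]
            total = sum(w)
            V.append([sum(wk * x[j] for wk, x in zip(w, X)) / total
                      for j in range(s)])
        # Eq. (3): memberships from ratios of squared distances.
        U_new = [[0.0] * n for _ in range(c)]
        for k, x in enumerate(X):
            # Clamp distances away from zero (an assumption, not from the paper).
            d = [max(sq_dist(x, v), tiny) for v in V]
            for i in range(c):
                U_new[i][k] = 1.0 / sum((d[i] / d[t]) ** p for t in range(c))
        change = max(abs(U_new[i][k] - U[i][k])
                     for i in range(c) for k in range(n))
        U = U_new
        if change < eps:
            break
    return U, V


# Toy usage (illustrative data): two well-separated groups of 2-D points.
data = [[0.0, 0.0], [0.1, 0.0], [0.0, 0.1],
        [5.0, 5.0], [5.1, 5.0], [5.0, 5.1]]
U, V = fcm(data, c=2)
labels = [max(range(2), key=lambda i: U[i][k]) for k in range(len(data))]
print(labels)
print(V)
```

The sketch monitors only the change in memberships; monitoring the change in prototypes instead, as the text also allows, would need just a different `change` computation.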
2015 7th International Conference on Intelligent Human-Machine Systems and Cybernetics
978-1-4799-8646-0/15 $31.00 © 2015 IEEE
DOI 10.1109/IHMSC.2015.128