$$
J\left(Z, \{\lambda_l\}_{l=1}^{k}\right)
= \sum_{l=1}^{k} \frac{1}{n_l^2} \sum_{z(j)=z(j')=l}
\left[ \sum_{i=1}^{m} \left( \lambda_{li}\, d_{jj'i} + \eta\, \lambda_{li} \log \lambda_{li} \right) + \eta \log m \right].
\tag{5}
$$
Here, $\eta$ is a positive parameter that controls the strength of the incentive for subspace clustering on more dimensions, $n_l$ is the number of objects assigned to the $l$th cluster, $\lambda_l$ is the weight vector for the $l$th cluster, used to regulate the cluster size, $\lambda_{li}$ is the weight of the $i$th feature in the $l$th cluster, and $d_{jj'i}$ is the distance between the $j$th and the $j'$th objects along the $i$th dimension. For instance, the distance is given by
$$
d_{jj'i} = \frac{\left| x_{ji} - x_{j'i} \right|}
{\frac{1}{n^2} \sum_{j_1=1}^{n} \sum_{j_2=1}^{n} \left| x_{j_1 i} - x_{j_2 i} \right|}.
$$
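To make the normalization concrete, the following NumPy sketch (the function name and the vectorized layout are ours, not from COSA) computes the full tensor of values $d_{jj'i}$ for a data matrix:

```python
import numpy as np

def cosa_dimension_distances(X):
    """Per-dimension pairwise distances d_{jj'i} for an n x m data
    matrix X, each dimension rescaled by its mean absolute pairwise
    difference as in the formula above.

    Returns an (n, n, m) array D with D[j, jp, i] = d_{jj'i}.
    """
    X = np.asarray(X, dtype=float)
    n, _ = X.shape
    # Raw per-dimension absolute differences |x_{ji} - x_{j'i}|.
    diff = np.abs(X[:, None, :] - X[None, :, :])     # (n, n, m)
    # Scale of dimension i: (1/n^2) sum_{j1,j2} |x_{j1 i} - x_{j2 i}|.
    scale = diff.sum(axis=(0, 1)) / (n * n)          # (m,)
    # Guard constant dimensions, whose scale would be zero.
    scale = np.where(scale > 0, scale, 1.0)
    return diff / scale
```

Note that materializing all pairs costs $O(n^2 m)$ memory, which mirrors the $n^2 m$ term in the complexity discussion below.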
In order to minimize (5) and find the solution clusters efficiently, Friedman and Meulman proposed an iterative approach that builds a weighted dissimilarity matrix among objects. A hierarchical clustering method based on nearest neighbors is then used to cluster this matrix. The computational process of COSA may not be scalable to large data sets. The computational complexity of building the weighted dissimilarity matrix is $O(hnmL + n^2 m)$, where $n$ is the number of objects, $m$ is the number of dimensions, $L$ is a predefined parameter specifying the number of nearest neighbors of a given object, and $h$ is the number of iterations; the first term of the complexity accounts for calculating the weights of all dimensions for each object, and the second term for creating the matrix. In other words, COSA may not be practical for large-volume and high-dimensional data.
3 ENTROPY WEIGHTING k-MEANS
In this section, we present a new k-means type algorithm for
soft subspace clustering of high-dimensional sparse data. In
the new algorithm, we consider the weight of a dimension in a cluster to represent the probability that the dimension contributes to forming the cluster. The entropy of the dimension weights represents the certainty of the dimensions in the identification of a cluster. Therefore, we modify the objective function (1) by adding a weight entropy term to it, so that we can simultaneously minimize the within-cluster dispersion and maximize the negative weight entropy to stimulate more dimensions to contribute to the identification of clusters. In this way, we avoid the problem of identifying clusters by only a few dimensions in sparse data.
The new objective function is written as follows:
$$
F(W, Z, \Lambda) = \sum_{l=1}^{k} \left[ \sum_{j=1}^{n} \sum_{i=1}^{m} w_{lj}\, \lambda_{li}\, (z_{li} - x_{ji})^2
+ \gamma \sum_{i=1}^{m} \lambda_{li} \log \lambda_{li} \right]
\tag{6}
$$
subject to
$$
\begin{cases}
\sum_{l=1}^{k} w_{lj} = 1, & 1 \le j \le n,\ 1 \le l \le k,\ w_{lj} \in \{0, 1\}, \\
\sum_{i=1}^{m} \lambda_{li} = 1, & 1 \le l \le k,\ 1 \le i \le m,\ 0 \le \lambda_{li} \le 1.
\end{cases}
$$
The first term in (6) is the sum of the within-cluster dispersions, and the second term is the negative weight entropy. The positive parameter $\gamma$ controls the strength of the incentive for clustering on more dimensions.
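For concreteness, (6) can be evaluated directly from a candidate solution. Below is a minimal NumPy sketch with our own naming: X is the $n \times m$ data matrix, W the $k \times n$ binary assignment matrix, Z the $k \times m$ matrix of cluster centers, Lam the $k \times m$ weight matrix $\Lambda$, and gamma the parameter $\gamma$:

```python
import numpy as np

def ewkm_objective(X, W, Z, Lam, gamma):
    """Evaluate F(W, Z, Lambda) of (6)."""
    # Within-cluster dispersion:
    # sum_l sum_j sum_i w_lj * lam_li * (z_li - x_ji)^2.
    sq = (Z[:, None, :] - X[None, :, :]) ** 2    # (k, n, m)
    dispersion = np.einsum('lj,li,lji->', W, Lam, sq)
    # Weight entropy term: gamma * sum_l sum_i lam_li * log(lam_li);
    # the small epsilon clamps log(0), and lam*log(lam) -> 0 anyway.
    entropy = gamma * np.sum(Lam * np.log(Lam + 1e-12))
    return dispersion + entropy
```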
Next, we present the entropy weighting k-means algorithm (EWKM) to solve the above minimization problem.
3.1 EWKM Algorithm
Minimization of $F$ in (6) with the constraints forms a class of constrained nonlinear optimization problems whose solutions are unknown. The usual method for optimizing $F$ is partial optimization for $\Lambda$, $Z$, and $W$. In this method, we first fix $Z$ and $\Lambda$ and minimize the reduced $F$ with respect to $W$. Then, we fix $W$ and $\Lambda$ and minimize the reduced $F$ with respect to $Z$. Finally, we fix $W$ and $Z$ and minimize the reduced $F$ to solve $\Lambda$, as sketched below.
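A minimal sketch of this alternating scheme follows (all names are ours). It assumes the standard k-means forms of the $W$ and $Z$ updates, that is, assignment to the nearest center under the $\lambda$-weighted distance and recomputation of centers as cluster means, and uses formula (7) of Theorem 1 below for the $\Lambda$ update:

```python
import numpy as np

def ewkm(X, k, gamma, n_iter=50, seed=0):
    """Alternating minimization of F in (6): in each iteration, fix
    two of (W, Z, Lambda) and minimize over the third."""
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    n, m = X.shape
    Z = X[rng.choice(n, k, replace=False)].copy()  # initial centers
    Lam = np.full((k, m), 1.0 / m)                 # uniform weights
    for _ in range(n_iter):
        # Fix Z, Lam; minimize over W: assign each object to the
        # center minimizing sum_i lam_li * (z_li - x_ji)^2.
        sq = (Z[:, None, :] - X[None, :, :]) ** 2  # (k, n, m)
        labels = np.einsum('li,lji->lj', Lam, sq).argmin(axis=0)
        # Fix W, Lam; minimize over Z: each center is its cluster mean.
        for l in range(k):
            if (labels == l).any():
                Z[l] = X[labels == l].mean(axis=0)
        # Fix W, Z; minimize over Lam by (7), with
        # D_lt = sum_j w_lj * (z_lt - x_jt)^2.
        D = np.stack([((X[labels == l] - Z[l]) ** 2).sum(axis=0)
                      for l in range(k)])
        E = np.exp(-(D - D.min(axis=1, keepdims=True)) / gamma)
        Lam = E / E.sum(axis=1, keepdims=True)
    return labels, Z, Lam
```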
We can extend the standard k-means clustering process to minimize $F$ by adding an additional step in each iteration to compute the weights for each cluster. The formula for computing $\Lambda$ is given in the following theorem:
Theorem 1. Given matrices $W$ and $Z$ are fixed, $F$ is minimized if
$$
\lambda_{lt} = \frac{\exp\left( \frac{-D_{lt}}{\gamma} \right)}
{\sum_{i=1}^{m} \exp\left( \frac{-D_{li}}{\gamma} \right)},
\tag{7}
$$
where
$$
D_{lt} = \sum_{j=1}^{n} w_{lj}\, (z_{lt} - x_{jt})^2.
$$
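Numerically, (7) is a softmax of $-D_{lt}/\gamma$ over the $m$ dimensions. In the sketch below (naming is ours), shifting each row by its minimum before exponentiating leaves the ratio unchanged and avoids underflow when $D_{lt}/\gamma$ is large:

```python
import numpy as np

def update_weights(D, gamma):
    """Lambda update of (7) for a k x m dispersion matrix D, where
    D[l, t] = sum_j w_lj * (z_lt - x_jt)^2."""
    # A common per-row shift cancels in the ratio; it only keeps
    # exp() from underflowing to an all-zero row.
    E = np.exp(-(D - D.min(axis=1, keepdims=True)) / gamma)
    return E / E.sum(axis=1, keepdims=True)  # rows sum to 1
```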
Proof. We use the Lagrange multiplier technique to obtain the following unconstrained minimization problem:
$$
\min\, F_1\left( \{\lambda_{li}\}, \{\alpha_l\} \right)
= \sum_{l=1}^{k} \left[ \sum_{j=1}^{n} \sum_{i=1}^{m} w_{lj}\, \lambda_{li}\, (z_{li} - x_{ji})^2
+ \gamma \sum_{i=1}^{m} \lambda_{li} \log \lambda_{li} \right]
- \sum_{l=1}^{k} \alpha_l \left( \sum_{i=1}^{m} \lambda_{li} - 1 \right),
\tag{8}
$$
where $\alpha = [\alpha_1, \ldots, \alpha_k]$ is a vector containing the Lagrange multipliers corresponding to the constraints. The optimization problem in (8) can be decomposed into $k$ independent minimization problems:
$$
\min\, F_{1l}(\lambda_{li}, \alpha_l)
= \sum_{j=1}^{n} \sum_{i=1}^{m} w_{lj}\, \lambda_{li}\, (z_{li} - x_{ji})^2
+ \gamma \sum_{i=1}^{m} \lambda_{li} \log \lambda_{li}
- \alpha_l \left( \sum_{i=1}^{m} \lambda_{li} - 1 \right)
$$
for $l = 1, \ldots, k$. By setting the gradient of $F_{1l}$ with respect to $\lambda_{lt}$ and $\alpha_l$ to zero, we obtain
$$
\frac{\partial F_{1l}}{\partial \alpha_l} = -\left( \sum_{i=1}^{m} \lambda_{li} - 1 \right) = 0
\tag{9}
$$
and
$$
\frac{\partial F_{1l}}{\partial \lambda_{lt}}
= \sum_{j=1}^{n} w_{lj}\, (z_{lt} - x_{jt})^2
+ \gamma \left( 1 + \log \lambda_{lt} \right) - \alpha_l = 0.
\tag{10}
$$
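From (10), we obtain
$$
\log \lambda_{lt} = \frac{\alpha_l - \gamma - D_{lt}}{\gamma}
\quad \Longrightarrow \quad
\lambda_{lt} = \exp\left( \frac{\alpha_l - \gamma}{\gamma} \right) \exp\left( \frac{-D_{lt}}{\gamma} \right),
$$
where $D_{lt} = \sum_{j=1}^{n} w_{lj} (z_{lt} - x_{jt})^2$. Substituting this expression into (9) to eliminate $\alpha_l$ gives $\exp\left( \frac{\alpha_l - \gamma}{\gamma} \right) = 1 \big/ \sum_{i=1}^{m} \exp\left( \frac{-D_{li}}{\gamma} \right)$, and hence
$$
\lambda_{lt} = \frac{\exp\left( \frac{-D_{lt}}{\gamma} \right)}{\sum_{i=1}^{m} \exp\left( \frac{-D_{li}}{\gamma} \right)},
$$
which is exactly (7). This completes the proof.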