X. Huang et al. / Knowledge-Based Systems 0 0 0 (2018) 1–15 3
usually achieve good results in most cases. Using the βth power to constrain feature weights, Huang et al. [11] proposed a k-means type clustering framework that integrates both intra-cluster compactness and inter-cluster separation. However, when all objects in a cluster share the same value on a feature during the clustering process, i.e., the scatter of that feature is zero in the cluster, algorithms that constrain feature weights via the βth power assign a weight of one to the zero-scatter feature and weights of zero to all other features. In general, it is unreasonable to distinguish a cluster on a data set by only one feature.
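This degenerate behavior can be sketched with the closed-form weight update that the βth-power constraint yields (a minimal illustration, assuming the standard update in which the weight of feature j is proportional to D_j^{-1/(β-1)}; function and variable names are ours, not the cited papers'):

```python
import numpy as np

def beta_power_weights(D, beta=2.0):
    """Weight update under the beta-th power constraint.

    D    : per-feature within-cluster scatters (non-negative array)
    beta : weighting exponent (beta > 1)

    Minimizing sum_j w_j**beta * D_j subject to sum_j w_j = 1 gives
    w_j proportional to D_j**(-1/(beta-1)).  When some D_j == 0, the
    optimum degenerates: all weight goes to the zero-scatter feature(s).
    """
    D = np.asarray(D, dtype=float)
    zero = D == 0.0
    if zero.any():
        # Degenerate case described in the text: the zero-scatter
        # feature takes all the weight, the others get zero.
        w = zero.astype(float)
        return w / w.sum()
    inv = D ** (-1.0 / (beta - 1.0))
    return inv / inv.sum()

# A feature with zero scatter monopolizes the weight:
print(beta_power_weights([0.0, 3.0, 5.0]))   # → [1. 0. 0.]
# Without a zero scatter, the weight spreads across features:
print(beta_power_weights([2.0, 3.0, 5.0]))
```

With `beta=2.0`, the non-degenerate weights are simply proportional to the reciprocal scatters, so low-scatter features receive larger weights, as intended by the weighting framework.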
Another method, proposed in the EWkmeans algorithm [7], uses entropy to constrain feature weights; it encourages more features to participate in the clustering process by simultaneously minimizing the within-cluster scatter and maximizing the negative weight entropy. The updating rules of the EWkmeans algorithm are obtained by minimizing the following objective function:
P(U, W, Z) = \sum_{p=1}^{k} \sum_{i=1}^{n} u_{ip} \sum_{j=1}^{m} w_{pj} (x_{ij} - z_{pj})^2 + \gamma \sum_{p=1}^{k} \sum_{j=1}^{m} w_{pj} \log w_{pj}    (3)
subject to the constraint conditions in Eq. (2), where γ is a parameter that balances the scatter and the entropy of the weights. Following EWkmeans, Chen et al. [9,26] proposed two types of automated two-level variable weighting clustering algorithms for multiview data. Deng et al. [10] proposed an enhanced soft subspace clustering algorithm by integrating intra-cluster compactness and inter-cluster separation with an entropy constraint on the feature weights. In the EWkmeans algorithm, the updating rule for the weights, obtained by minimizing the objective function in Eq. (3), is as follows:
w_{pj} = \frac{\exp(-D_{pj}/\gamma)}{\sum_{l=1}^{m} \exp(-D_{pl}/\gamma)},    (4)
where $D_{pj}$ is the scatter of cluster p on feature j. From this updating rule, we can observe that when $D_{pj}$ is large, e.g., $D_{pj} = 1000$, $\exp(-D_{pj}/\gamma)$ is nearly zero (according to reference [7], the value of γ ranges from 0.1 to 7). Therefore, the clustering process of k-means algorithms with entropy regularization is often dominated by only a few features. Moreover, numeric underflow errors may occur when the scatter is large, since $\exp(-D_{pj}/\gamma)$ vanishes in floating-point arithmetic.
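This dominance and underflow behavior can be illustrated with a short sketch of the update in Eq. (4) (an illustrative Python fragment, not the authors' implementation; the max-shift stabilization is our addition):

```python
import numpy as np

def entropy_weights(D, gamma=1.0):
    """Entropy-regularized weight update of Eq. (4): a softmax of -D/gamma.

    D : per-feature scatters of one cluster; gamma : entropy parameter.
    The shift by the minimum scatter below is the standard numerically
    stable softmax trick; the naive form exp(-D/gamma) underflows to
    zero in float64 once a scatter exceeds roughly 745 * gamma.
    """
    D = np.asarray(D, dtype=float)
    e = np.exp(-(D - D.min()) / gamma)   # shift for numerical stability
    return e / e.sum()

# With gamma in [0.1, 7] as reported in [7], even a moderate scatter gap
# drives the weights to near one-hot, so the smallest-scatter features
# dominate the clustering process:
print(entropy_weights([1000.0, 2.0, 3.0], gamma=1.0))
# The naive exponential underflows outright for a large scatter:
print(np.exp(-1000.0))  # → 0.0
```

In the example, the feature with scatter 1000 receives a weight that underflows to exactly zero, while the two small-scatter features share essentially all the weight.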
2.2. k -modes type clustering on categorical data sets
2.2.1. k -modes type algorithms
Since working only on numerical data sets prevents k-means type algorithms from clustering real-life data containing categorical values, Huang proposed the k-modes algorithm [27], which employs a simple matching dissimilarity measure to handle categorical objects, replaces the means of clusters with modes, and uses a frequency-based method to update the modes during clustering. Based on this technique, most k-means type methods [1,7,8] working on numerical data can be converted into k-modes type algorithms for clustering categorical data sets simply by replacing means with modes. Building on the k-modes algorithm, Cao et al. proposed the W-k-modes algorithm [28] using complement entropy, and Bai et al. [29] proposed another weighting k-modes algorithm using the βth power to constrain feature weights. To further improve performance, Bai and Liang [30] introduced inter-cluster separation into the conventional k-modes algorithm and proved the convergence of their proposed algorithm. Qian et al. [31] proposed a novel data-representation scheme for categorical data that maps a set of categorical objects into a Euclidean space, and on this basis developed a general clustering framework for the space structure of categorical data.
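The two k-modes ingredients named above, the simple matching dissimilarity and the frequency-based mode update, can be sketched as follows (a minimal illustration; function names are ours):

```python
from collections import Counter

def matching_dissimilarity(x, z):
    """Simple matching dissimilarity of k-modes:
    the number of features on which two objects disagree."""
    return sum(a != b for a, b in zip(x, z))

def update_mode(cluster):
    """Frequency-based mode update: for each feature, pick the most
    frequent categorical value among the objects in the cluster."""
    return tuple(Counter(col).most_common(1)[0][0] for col in zip(*cluster))

cluster = [("red", "small"), ("red", "large"), ("blue", "small")]
print(update_mode(cluster))  # → ('red', 'small')
print(matching_dissimilarity(("red", "large"), ("red", "small")))  # → 1
```

The mode plays the role of the mean: it minimizes the total matching dissimilarity from the cluster's objects, feature by feature.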
However, since the centroid in each dimension is usually represented by a single feature value, the representability of the centroids in this type of algorithm is limited, especially when the distribution of feature values is uniform.
2.2.2. Generalization of centroid representability in k -modes type
algorithms
The conventional k-modes algorithm chooses the feature value with maximal frequency to represent a cluster on a feature. This method often ignores the representability of other feature values whose frequencies are close to the maximal one in the cluster. To eliminate this flaw, many improved algorithms [14,32–35] were proposed that allocate proper weights to the feature values that do not have the maximal frequency. San et al. [32] introduced a frequency-based centroid into the k-modes algorithm: the higher the frequency of a feature value in a cluster, the larger the representability of that feature value in the cluster. In references [34,36], relative feature frequencies are adopted as weights to reflect the representability of the cluster centroid; compared with the conventional k-modes algorithm, this improves the measure of intra-cluster compactness for categorical data. Lee and Pedrycz [35] generalized the k-modes algorithm with fuzzy p-modes prototypes. The algorithms mentioned above can be seen as special cases of the generalized k-modes algorithm. However, Bai et al. [14] argued that the aforementioned methods can converge to a local optimum only if they degenerate to the simple k-modes algorithm. To overcome this deficiency, Bai et al. [14] developed two modified k-modes type clustering algorithms, MKM_NOF and MKM_NDM, which employ entropy and $l_2$-norm regularization, respectively, to smooth the centroid representation on every feature.
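The frequency-based generalized centroid shared by these methods can be sketched as follows (a simplified illustration of the common idea in [32,34,36], not any one of the cited algorithms; names are ours):

```python
from collections import Counter

def frequency_centroid(cluster):
    """Generalized centroid: for each feature, the relative frequency of
    every categorical value in the cluster, instead of a single modal
    value.  Each feature is represented by a {value: frequency} map."""
    n = len(cluster)
    return [{v: c / n for v, c in Counter(col).items()} for col in zip(*cluster)]

cluster = [("red", "small"), ("red", "large"), ("blue", "small")]
print(frequency_centroid(cluster))
# feature 0: red appears 2/3, blue 1/3; feature 1: small 2/3, large 1/3
```

Representing each feature by a frequency distribution is what enlarges the centroid's dimensionality and hence the computational cost noted below.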
A generalized centroid is usually more representative than a traditional centroid, which is represented by a single feature value in each dimension. However, algorithms with generalized centroids require more computational cost due to the larger dimensionality of the centroid. Moreover, these algorithms have no capability for feature selection on noisy data sets.
2.3. k -means type clustering on mixed data sets
To handle the mixed-type objects frequently encountered in the real world, Huang [27] proposed a more practically useful algorithm, k-prototypes, which straightforwardly integrates the classic k-means and k-modes algorithms with a balancing parameter. Lee and Pedrycz [35] extended k-prototypes into a fuzzy p-modes clustering algorithm in which a more effective centroid representation is used for the categorical part in comparison with the classic k-prototypes algorithm. Ahmad and Dey [37] proposed another k-means type subspace clustering algorithm for mixed numerical and categorical data. However, this method also integrates numerical and categorical data by simple addition. There is a lack of effective methods for fusing numerical and categorical data under the k-means type clustering framework beyond the simple addition of the two parts. In the existing methods mentioned above, the numeric data and the categorical data are, in essence, still handled separately, which is not a truly unified treatment semantically.
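The simple-addition integration criticized here can be sketched as follows (an illustrative fragment in the spirit of the k-prototypes cost, with γ as the balancing parameter; names are ours):

```python
def prototype_distance(x_num, x_cat, z_num, z_cat, gamma=1.0):
    """k-prototypes style cost: squared Euclidean distance on the numeric
    part plus gamma times the simple matching dissimilarity on the
    categorical part.  The two parts are fused by simple addition,
    which is exactly the integration the text criticizes."""
    d_num = sum((a - b) ** 2 for a, b in zip(x_num, z_num))
    d_cat = sum(a != b for a, b in zip(x_cat, z_cat))
    return d_num + gamma * d_cat

# numeric part: (1-1)^2 + (2-0)^2 = 4; categorical part: 1 mismatch
print(prototype_distance([1.0, 2.0], ["red"], [1.0, 0.0], ["blue"], gamma=0.5))
# → 4.5
```

Because the two terms live on different scales, the result is sensitive to the choice of γ, which is one reason this additive fusion is not semantically unified.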
2.4. Characteristic of our proposed methods
At present, the existing k-means type algorithms can be summarized into two classes: (1) no weighting k-means algorithms; (2) weighting k-means algorithms. No weighting k-means algorithms
Please cite this article as: X. Huang et al., A new weighting k-means type clustering framework with an $l_2$-norm regularization, Knowledge-Based Systems (2018), https://doi.org/10.1016/j.knosys.2018.03.028