G.-Y. Zhang et al. / Knowledge-Based Systems 150 (2018) 127–138 129
Weighted Collaborative k-means (TW-Co-k-means), which utilizes a collaborative manner to exploit the shared information between different views while considering the diversity in each view. Additionally, the proposed approach is able to weight the views and the features in each view simultaneously according to the importance of the views and the features, which leads to satisfactory clustering results as demonstrated by the experimental evaluation.
3. The proposed approach
In this section, we introduce the proposed TW-Co-k-means approach. First, the objective function of TW-Co-k-means is described in Section 3.1. Then, we propose an alternating optimization method to solve this problem in Section 3.2. Finally, we summarize the TW-Co-k-means algorithm in Section 3.3.
3.1. The objective function
Given a dataset $X = \{X_1, \ldots, X_N\}$ with $N$ samples and $T$ views, where $X_i = \{x_i^{(1)}, \ldots, x_i^{(T)}\}$ denotes the $i$-th sample in the dataset, with $x_i^{(t)}$ representing the $t$-th view element of the $i$-th sample. In the $t$-th view, the dimensionality (the number of features) is denoted as $G^{(t)}$.
Let $M = \{M^{(1)}, \ldots, M^{(T)}\}$ denote a set of cluster centers, where $M^{(t)} = \{m_1^{(t)}, \ldots, m_K^{(t)}\}$ denotes the $t$-th element of $M$, with $m_k^{(t)}$ representing the $k$-th cluster center in the $t$-th view. Let $U = \{U^{(1)}, \ldots, U^{(T)}\}$ denote a set of cluster assignments, where $U^{(t)}$ denotes the $t$-th element of $U$ and the $(i,k)$-th entry of $U^{(t)}$, referred to as $u_{ik}^{(t)}$, indicates whether the $i$-th sample belongs to the $k$-th cluster in the $t$-th view or not. Let $d_{i,k;j}^{(t)}$ be the Euclidean distance that measures the dissimilarity on the $j$-th feature between the $i$-th sample and the $k$-th cluster center in view $t$, which can be computed as $d_{i,k;j}^{(t)} = \left( x_{ij}^{(t)} - m_{kj}^{(t)} \right)^2$. Let $W = \{W^{(1)}, \ldots, W^{(T)}\}$ be a set of feature weights, where $W^{(t)} = \{w_1^{(t)}, \ldots, w_{G^{(t)}}^{(t)}\}$, with $w_j^{(t)}$ representing the $j$-th feature weight in the $t$-th view. The value of $w_j^{(t)}$ reflects the importance of the $j$-th feature in the $t$-th view. Let $V = \{v^{(1)}, \ldots, v^{(T)}\}$ denote a set of view weights for the $T$ views, where the value of $v^{(t)}$ reflects the importance of the $t$-th view.
The goal of the proposed approach is to exploit a collaborative strategy to group the dataset $X$ into $K$ clusters, by taking into account the view weighting, the feature weighting and the mutual links between views simultaneously. The objective function of the proposed TW-Co-k-means approach is defined as follows,
$$
J(M, U, W, V) = \sum_{t=1}^{T} v^{(t)} \sum_{j \in G^{(t)}} w_j^{(t)} \sum_{k=1}^{K} \sum_{i=1}^{N} u_{ik}^{(t)} d_{i,k;j}^{(t)} + \frac{\eta}{T-1} \Delta + \alpha \sum_{t=1}^{T} \sum_{j \in G^{(t)}} w_j^{(t)} \log w_j^{(t)} + \beta \sum_{t=1}^{T} v^{(t)} \log v^{(t)} \tag{1}
$$
subject to
$$
\sum_{t=1}^{T} v^{(t)} = 1, \quad 0 \le v^{(t)} \le 1, \qquad \sum_{j \in G^{(t)}} w_j^{(t)} = 1, \quad t = 1, \ldots, T,
$$
$$
\sum_{k=1}^{K} u_{ik}^{(t)} = 1, \quad u_{ik}^{(t)} \in \{0, 1\}, \quad i = 1, \ldots, N, \; t = 1, \ldots, T,
$$
where $\Delta$ is a penalty term defined as
$$
\Delta = \sum_{t=1}^{T} \sum_{t' \neq t} \sum_{i=1}^{N} \sum_{k=1}^{K} \left| u_{ik}^{(t')} - u_{ik}^{(t)} \right| \left( v^{(t)} \sum_{j \in G^{(t)}} w_j^{(t)} d_{i,k;j}^{(t)} - v^{(t')} \sum_{j \in G^{(t')}} w_j^{(t')} d_{i,k;j}^{(t')} \right). \tag{2}
$$
Two parameters, $\alpha$ and $\beta$, are introduced to control the distributions of the weighting variables $W$ and $V$, respectively. The parameter $\eta$ is introduced to control the effect of the penalty term $\Delta$.
The objective function in Eq. (1) consists of three parts. The first part computes the sum of within-cluster distances in each view by assigning weights to the views and the features (i.e., a two-level weighting strategy, with level one corresponding to the view weighting and level two corresponding to the feature weighting). The second part is a penalty term that measures the disagreement across multiple views in a collaborative manner. The third part consists of two entropy-based terms, which adjust the influence of the weighting variables $W$ and $V$ in the objective function, respectively. In the literature, collaborative clustering was first proposed to solve clustering problems in which the data consist of several separate subsets [18–21]. In general, these subsets share common information. The collaborative manner can effectively discover their shared structure, and complementary information can be integrated from the other subsets during the iterations. Inspired by collaborative clustering, our approach also utilizes the collaborative manner for multi-view clustering.
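As a sketch of how Eqs. (1) and (2) could be evaluated in practice, the NumPy code below computes the three parts of the objective. The function names (`penalty_delta`, `objective`) and the array layout are our own illustrative choices, not the authors' implementation:

```python
import numpy as np

def penalty_delta(U, V, W, D):
    """Disagreement penalty of Eq. (2), summed over all ordered view pairs (t, t').

    U: list of (N, K) one-hot assignment matrices, one per view.
    V: (T,) view weights.  W: list of (G_t,) feature weights per view.
    D: list of (N, K, G_t) per-feature squared distances per view.
    """
    T = len(U)
    # Weighted distance of sample i to center k in each view: v^(t) * sum_j w_j d_{i,k;j}
    wd = [V[t] * np.einsum('ikj,j->ik', D[t], W[t]) for t in range(T)]
    delta = 0.0
    for t in range(T):
        for s in range(T):
            if s != t:
                delta += np.sum(np.abs(U[s] - U[t]) * (wd[t] - wd[s]))
    return delta

def objective(U, V, W, D, alpha, beta, eta):
    """Objective J of Eq. (1): weighted within-cluster distances,
    the collaboration penalty, and the two entropy regularizers."""
    T = len(U)
    within = sum(V[t] * np.einsum('ik,ikj,j->', U[t], D[t], W[t]) for t in range(T))
    entropy_w = sum(np.sum(W[t] * np.log(W[t])) for t in range(T))
    entropy_v = np.sum(V * np.log(V))
    return (within + eta / (T - 1) * penalty_delta(U, V, W, D)
            + alpha * entropy_w + beta * entropy_v)
```

Note that when the assignments agree across all views, every $|u_{ik}^{(t')} - u_{ik}^{(t)}|$ factor is zero and the penalty vanishes, consistent with the discussion above.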
Different from the previous collaborative multi-view clustering approaches [9,15], the penalty term in our objective function can exploit the different importance of the multiple views and of the features in each view, so as to produce a more satisfactory clustering result. In particular, if the cluster assignment in the $t$-th view is different from that in the $t'$-th view, i.e., $u_{ik}^{(t)} \neq u_{ik}^{(t')}$, then a disagreement penalty is imposed on the objective function. It is obvious that if the cluster assignments in different views are similar to each other, the disagreement term will tend to be smaller.
Besides, $\left( v^{(t)} \sum_{j \in G^{(t)}} w_j^{(t)} d_{i,k;j}^{(t)} - v^{(t')} \sum_{j \in G^{(t')}} w_j^{(t')} d_{i,k;j}^{(t')} \right)$ is a term corresponding to $\left| u_{ik}^{(t)} - u_{ik}^{(t')} \right|$, serving as the Euclidean distance between the local view assignments $u_{ik}^{(t)}$ and $u_{ik}^{(t')}$. The larger the parameter $\eta$ is, the greater the influence of the penalty term $\Delta$ on the objective function.
3.2. Optimization
In this section, the objective function in Eq. (1) is solved
by using the alternating optimization method, which consists of
four steps. In each step, we optimize one variable by fixing the
other three variables. The detailed description of the optimization
method is provided as follows.
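For illustration only, the four-step alternation might be organized as below. To keep the sketch short we drop the collaboration penalty (i.e., set $\eta = 0$) and use the standard entropy-regularized closed-form updates for $W$ and $V$; these are therefore simplified stand-ins, not the exact update rules derived in this section, and the function name `alternate_optimize` is our own:

```python
import numpy as np

def alternate_optimize(X, K, alpha, beta, n_iter=20, seed=0):
    """Illustrative alternating optimization with the penalty dropped (eta = 0).

    Each step fixes three of (U, M, W, V) and updates the fourth; the W and V
    updates use the softmin forms implied by the two entropy regularizers.
    """
    rng = np.random.default_rng(seed)
    T, N = len(X), X[0].shape[0]
    G = [x.shape[1] for x in X]
    M = [x[rng.choice(N, size=K, replace=False)] for x in X]  # init centers
    W = [np.full(g, 1.0 / g) for g in G]
    V = np.full(T, 1.0 / T)
    for _ in range(n_iter):
        D = [(X[t][:, None, :] - M[t][None, :, :]) ** 2 for t in range(T)]
        # Step 1: update U -- assign each sample to its nearest weighted center
        U = [np.eye(K)[np.argmin(D[t] @ W[t], axis=1)] for t in range(T)]
        # Step 2: update M -- mean of the samples assigned to each cluster
        for t in range(T):
            cnt = U[t].sum(axis=0)[:, None]                   # (K, 1)
            M[t] = np.where(cnt > 0, U[t].T @ X[t] / np.maximum(cnt, 1), M[t])
        D = [(X[t][:, None, :] - M[t][None, :, :]) ** 2 for t in range(T)]
        # Step 3: update W -- softmin of per-feature cost, temperature alpha
        for t in range(T):
            cost = np.einsum('ik,ikj->j', U[t], D[t])
            e = np.exp(-(cost - cost.min()) / alpha)
            W[t] = e / e.sum()
        # Step 4: update V -- softmin of per-view weighted cost, temperature beta
        cost_v = np.array([np.einsum('ik,ikj,j->', U[t], D[t], W[t])
                           for t in range(T)])
        e = np.exp(-(cost_v - cost_v.min()) / beta)
        V = e / e.sum()
    return U, M, W, V
```

The loop structure (one closed-form update per variable, with the others held fixed) is the part that matches the description above; the individual update formulas are simplified assumptions.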
3.2.1. Update the cluster assignments
In this step, we update the cluster assignment $u_{ik}^{(t)}$ by fixing the variables $M$, $W$, $V$ in the $t$-th view.
As in the conventional k-means-like clustering approaches [16,22], it is not feasible to directly take the derivative of Eq. (1) w.r.t. $u_{ik}^{(t)}$. Hence, we adopt a new update rule which is different from the update rule of the conventional k-means-like approaches [16,22,23]. Furthermore, for different application tasks, multi-view data may be embedded in different manifold structures. Existing works assumed that the multiple feature representations of the multi-view data have a strong and compact relationship, so the cluster assignment is always updated in a concatenation manner. However, for many real-world multi-view datasets,