and the parameter $\alpha$ controls the number of features selected during the clustering.
The objective function is optimized by updating the variables $S_k$, $\theta_{kj}$ and $w_{kj}$ iteratively until convergence as follows (Domeniconi et al., 2007):
$$S_k = \left\{ x_i \;\middle|\; k = \arg\min_{l} L_l(\theta_l, x_i) \right\}, \quad k = 1, 2, \ldots, K, \tag{2}$$
$$\theta_{kj} = \frac{\sum_{x_i \in S_k} x_{ij}}{|S_k|}, \quad k = 1, 2, \ldots, K, \; j = 1, 2, \ldots, q, \tag{3}$$
$$w_{kj} = \frac{\exp(-A_{kj}/\alpha)}{\sum_{j'=1}^{q} \exp(-A_{kj'}/\alpha)}, \quad k = 1, 2, \ldots, K, \; j = 1, 2, \ldots, q, \tag{4}$$
where $L_l(\theta_l, x_i) = \left( \sum_{j=1}^{q} w_{lj} (x_{ij} - \theta_{lj})^2 \right)^{1/2}$ for each data point $x_i$, and $A_{kj} = \sum_{x_i \in S_k} (x_{ij} - \theta_{kj})^2 / |S_k|$.
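To make the update cycle concrete, the following sketch (ours, not from Domeniconi et al., 2007) implements one pass of Eqs. (2)-(4) with NumPy; the array names X, centers, and weights are illustrative choices.

```python
import numpy as np

def lac_iteration(X, centers, weights, alpha):
    """One pass of the LAC updates in Eqs. (2)-(4).

    X       : (N, q) data matrix
    centers : (K, q) cluster centers, theta
    weights : (K, q) per-cluster feature weights, w
    alpha   : entropy parameter controlling feature selection
    """
    # Eq. (2): assign each point to the cluster minimizing the
    # weighted distance L_l(theta_l, x_i).
    diff = X[:, None, :] - centers[None, :, :]                # (N, K, q)
    dist = np.sqrt((weights[None, :, :] * diff ** 2).sum(axis=2))
    labels = dist.argmin(axis=1)                              # (N,)

    for k in range(centers.shape[0]):
        members = X[labels == k]
        if len(members) == 0:
            continue  # keep the previous center/weights for an empty cluster
        # Eq. (3): the center is the per-feature mean over S_k.
        centers[k] = members.mean(axis=0)
        # Eq. (4): A_kj is the average squared deviation along feature j;
        # exponential weighting concentrates mass on low-variance features.
        A = ((members - centers[k]) ** 2).mean(axis=0)        # (q,)
        e = np.exp(-A / alpha)
        weights[k] = e / e.sum()

    return labels, centers, weights
```

Iterating these three updates until the assignments stabilize yields the LAC solution; a smaller $\alpha$ concentrates each cluster's weight on fewer features, consistent with the role of $\alpha$ noted above.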
LAC assigns each cluster a weight vector which simultaneously shapes the distance measure and finds the appropriate subspace via local feature selection, so as to deal with the information loss that often occurs in global subspace clustering techniques. LAC shows competitive performance compared with existing subspace clustering algorithms (Chen, Ye, Xu, & Huang, 2012; Gan & Ng, 2015; Yip, Cheung, & Ng, 2004). However, the conventional LAC is designed for single-view clustering and is unable to exploit multi-view information for data from multiple sources. Moreover, LAC merely considers the classic squared Euclidean distance, which is incapable of adapting to different application tasks.
4. Multi-view collaborative locally adaptive clustering with Minkowski metric

In this section, we introduce the proposed MV-CoMLAC approach in detail. Specifically, the objective function of our approach is described in Section 4.1. The optimization is described in Section 4.2, followed by the summary of the entire method in Section 4.3.
4.1. The multi-view model
Given a multi-view dataset $X = \{X_1, \ldots, X_N\}$ consisting of $N$ data points, where $X_i = \{x_i^{(1)}, \ldots, x_i^{(T)}\}$ denotes the $i$th instance, with $x_i^{(t)}$ representing the $t$th view representation of the $i$th instance, $T$ being the number of views, and $G_t$ being the dimensionality of the $t$th view. The goal is to combine information from multiple views so as to partition instances in the $t$th view into $K$ clusters $\{S_1^{(t)}, \ldots, S_K^{(t)}\}$ simultaneously, where $S_k^{(t)}$ denotes the $k$th cluster (i.e. the data points belonging to the $k$th cluster) in the $t$th view.
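As a minimal illustration of this data layout, a multi-view dataset can be stored view-wise as a list of $T$ matrices, the $t$th of shape $N \times G_t$; the variable names and dimensions below are hypothetical.

```python
import numpy as np

# Hypothetical multi-view layout: T = 3 views stored as a list of matrices,
# where views[t] has shape (N, G_t) and views[t][i] is x_i^{(t)}.
N = 100                    # number of instances (illustrative)
view_dims = [20, 35, 12]   # G_1, G_2, G_3 (illustrative)
rng = np.random.default_rng(0)
views = [rng.standard_normal((N, g)) for g in view_dims]
```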
Following the methodology of center-based clustering, we aim to find a set of cluster assignments $U = \{U^{(1)}, \ldots, U^{(T)}\}$ and a set of cluster centers $\Theta = \{\Theta^{(1)}, \ldots, \Theta^{(T)}\}$, where $U^{(t)} = [u_{ik}^{(t)}]_{N \times K}$ with $u_{ik}^{(t)}$ being the cluster assignment of the $i$th instance to the $k$th cluster in the $t$th view, and $\Theta^{(t)} = [\theta_{kj}^{(t)}]_{K \times G_t}$ with $\theta_{kj}^{(t)}$ being the $j$th dimension of the $k$th cluster center in the $t$th view. Additionally, two types of weighting are introduced, namely view weightings $V = \{v^{(1)}, \ldots, v^{(T)}\}$ and local feature weightings $W = \{W^{(1)}, \ldots, W^{(T)}\}$, where $v^{(t)}$ denotes the view weighting of the $t$th view and $W^{(t)} = [w_{kj}^{(t)}]_{K \times G_t}$ with $w_{kj}^{(t)}$ being the weighting for the $j$th dimension of the $k$th cluster center in the $t$th view. An objective function is designed as follows,
$$
\begin{aligned}
J_{MV\text{-}CoMLAC}(U, \Theta, W, V) = {} & \sum_{t=1}^{T} v^{(t)} \sum_{k=1}^{K} \frac{1}{|S_k^{(t)}|} \sum_{i=1}^{N} u_{ik}^{(t)} \sum_{j=1}^{G_t} w_{kj}^{(t)} \left| x_{ij}^{(t)} - \theta_{kj}^{(t)} \right|^{p} \\
& + \alpha \sum_{t=1}^{T} \sum_{k=1}^{K} \sum_{j=1}^{G_t} w_{kj}^{(t)} \log w_{kj}^{(t)} + \beta \sum_{t=1}^{T} v^{(t)} \log v^{(t)} \\
\text{s.t.} \quad & \sum_{t=1}^{T} v^{(t)} = 1, \quad 0 \le v^{(t)} \le 1, \\
& \sum_{j=1}^{G_t} w_{kj}^{(t)} = 1, \quad k = 1, \ldots, K, \; t = 1, \ldots, T, \\
& \sum_{k=1}^{K} u_{ik}^{(t)} = 1, \quad u_{ik}^{(t)} \in \{0, 1\}, \quad i = 1, \ldots, N, \; t = 1, \ldots, T, \tag{5}
\end{aligned}
$$
where $\alpha$ and $\beta$ are two parameters controlling the distributions of the weights $W$ and $V$, respectively, $|S_k^{(t)}|$ denotes the number of instances assigned to the $k$th cluster $S_k^{(t)}$ in the $t$th view, and $x_{ij}^{(t)}$ denotes the $j$th dimension in the $t$th view representation of the $i$th instance.
In the above objective function, the first term is used to construct the underlying low-dimensional subspace in the $t$th view by assigning the weight $w_{kj}^{(t)}$ to the $j$th dimension of the $k$th cluster in the $t$th view. The second term is the entropy of the weight variables $W$, with parameter $\alpha$ controlling the distribution of $W$: a larger $\alpha$ leads to a more uniform distribution, i.e. more features will be selected in each view for constructing the subspaces associated with the clusters. The third term is the entropy of the weight variables $V$, with parameter $\beta$ controlling the distribution of $V$. Similarly, a larger $\beta$ leads to a more uniform distribution, i.e. the views will be considered more equally during clustering.
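The following sketch shows how the value of Eq. (5) could be evaluated for given assignments, centers, and weights; it is an illustrative NumPy rendering of the objective, with all argument names our own.

```python
import numpy as np

def mv_comlac_objective(views, U, centers, W, v, p, alpha, beta):
    """Evaluate Eq. (5); all argument names are illustrative.

    views   : list of T arrays, views[t] of shape (N, G_t)
    U       : list of T (N, K) hard assignment matrices, entries in {0, 1}
    centers : list of T (K, G_t) center matrices, Theta^(t)
    W       : list of T (K, G_t) feature-weight matrices (rows sum to 1, > 0)
    v       : (T,) view weights (sum to 1, > 0)
    """
    J = 0.0
    for t, X in enumerate(views):
        # d^(t)_{ik;j} = |x_ij - theta_kj|^p for all i, k, j
        d = np.abs(X[:, None, :] - centers[t][None, :, :]) ** p   # (N, K, G_t)
        # weighted dispersion of point i with respect to center k
        per_ik = (W[t][None, :, :] * d).sum(axis=2)               # (N, K)
        sizes = np.maximum(U[t].sum(axis=0), 1)                   # |S_k^(t)|, guard empty clusters
        J += v[t] * ((U[t] * per_ik) / sizes[None, :]).sum()
        # entropy regularizer on the feature weights (assumes W > 0)
        J += alpha * (W[t] * np.log(W[t])).sum()
    # entropy regularizer on the view weights (assumes v > 0)
    J += beta * (v * np.log(v)).sum()
    return J
```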
The Minkowski distance between $x_i$ and $\theta_k$ of dimensionality $G_t$ in the $t$th view is defined as:
$$\mathrm{dist}^{(t)}(x_i, \theta_k) = \sqrt[p]{\sum_{j=1}^{G_t} \left| x_{ij}^{(t)} - \theta_{kj}^{(t)} \right|^{p}}. \tag{6}$$
In this paper we calculate the distance metric with the $p$th root of the Minkowski distance removed, and hereafter briefly denote its $j$th per-feature term in the $t$th view, $|x_{ij}^{(t)} - \theta_{kj}^{(t)}|^p$, as $d_{ik;j}^{(t)}$. Without consideration of feature weighting, the distance metric used between $x_i$ and $\theta_k$ in the $t$th view is defined as below, which extends the squared Euclidean distance to the Minkowski distance with power $p$:
$$\mathrm{dist}_p^{(t)}(x_i, \theta_k) = \left( \sqrt[p]{\sum_{j=1}^{G_t} \left| x_{ij}^{(t)} - \theta_{kj}^{(t)} \right|^{p}} \right)^{p} = \left( \mathrm{dist}^{(t)}(x_i, \theta_k) \right)^{p} = \sum_{j=1}^{G_t} d_{ik;j}^{(t)}. \tag{7}$$
The Minkowski distance metric used is determined by the parameter $p$, which makes our method adaptive to different application tasks. Two special cases are as follows: when $p$ equals 1, it reduces to the Manhattan distance in the metric space; when $p$ equals 2, it reduces to the squared Euclidean distance.
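A short sketch of Eq. (7) and its two special cases, assuming plain NumPy arrays for a single pair of vectors:

```python
import numpy as np

def dist_p(x, theta, p):
    """Eq. (7): the Minkowski distance with its p-th root removed,
    i.e. the sum of the per-feature terms d_{ik;j}."""
    return (np.abs(x - theta) ** p).sum()

x = np.array([1.0, 2.0, 3.0])
theta = np.array([0.0, 0.0, 1.0])

# p = 1 recovers the Manhattan distance ...
assert np.isclose(dist_p(x, theta, 1), np.abs(x - theta).sum())
# ... and p = 2 the squared Euclidean distance.
assert np.isclose(dist_p(x, theta, 2), ((x - theta) ** 2).sum())
```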
4.2. Optimization
In this section, we present an alternating optimization method to solve the objective function in Eq. (5). The method consists of four steps; in each step, we optimize one variable while fixing the other three. The detailed description of the optimization method is provided as follows.