features. Suppose $X_2$, $X_3$, $X_4$, $X_5$ are four samples from the local neighborhood of $X_1$; then $X_1$ can be represented as follows:
$$ X_1 = W_2 X_2 + W_3 X_3 + W_4 X_4 + W_5 X_5 \tag{1} $$
where $W = \{W_1, W_2, \ldots, W_n\}$ is the weight matrix, whose dimension is the same as that of the object matrix $X$. The key idea of LLE is that this locally linear structure within each local neighborhood does not change after dimension reduction. Therefore, we obtain the relationship after data mapping and dimension reduction as follows:
$$ X'_1 = W_2 X'_2 + W_3 X'_3 + W_4 X'_4 + W_5 X'_5 \tag{2} $$
where $X' = \{X'_1, X'_2, \ldots, X'_n\}$ is the object matrix after mapping. The linear structure only affects the relationship near the current sample; it has no influence on objects far away from the sample. The objective function of LLE is partitioned into two parts:
$$ J(W) = \sum_{i=1}^{n} \Big\| X_i - \sum_{j=1}^{k} W_j X_j \Big\|_2^2 \tag{3} $$
$$ J(X') = \sum_{i=1}^{n} \Big\| X'_i - \sum_{j=1}^{k} W_j X'_j \Big\|_2^2 \tag{4} $$
Eq. (3) is used to obtain the weight matrix $W$ of each object, which captures the relationship between the objects and their neighborhoods. Eq. (4) then exploits the matrix obtained from Eq. (3) to compute the result of dimension reduction, a new object matrix $X'$.
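Both objectives admit closed-form solutions: the weights in Eq. (3) come from a small local Gram system per sample, and the embedding in Eq. (4) from the bottom eigenvectors of $(I - W)^{\mathsf T}(I - W)$. The following NumPy sketch illustrates this standard two-stage procedure; the function name, neighborhood size, and regularization term are illustrative choices of ours, not taken from the paper.

```python
import numpy as np
from scipy.spatial.distance import cdist

def lle(X, n_neighbors=5, n_components=2, reg=1e-3):
    """Minimal LLE sketch following Eqs. (3)-(4): local weights, then embedding."""
    n = X.shape[0]

    # Find the k nearest neighbors of each sample.
    dist = cdist(X, X)
    np.fill_diagonal(dist, np.inf)
    neighbors = np.argsort(dist, axis=1)[:, :n_neighbors]

    # Eq. (3): reconstruction weights from each local Gram system.
    W = np.zeros((n, n))
    for i in range(n):
        Z = X[neighbors[i]] - X[i]                     # centered neighborhood
        G = Z @ Z.T                                    # local Gram matrix
        G += reg * np.trace(G) * np.eye(n_neighbors)   # regularize for stability
        w = np.linalg.solve(G, np.ones(n_neighbors))
        W[i, neighbors[i]] = w / w.sum()               # weights sum to one

    # Eq. (4): embed with the bottom eigenvectors of M = (I - W)^T (I - W).
    M = (np.eye(n) - W).T @ (np.eye(n) - W)
    eigvals, eigvecs = np.linalg.eigh(M)
    # Discard the constant eigenvector, keep the next n_components.
    return eigvecs[:, 1:n_components + 1]
```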
2.2. Laplacian Eigenmaps (LE)
Laplacian Eigenmaps (LE) is similar to LLE: it also utilizes locality to construct the relationship between objects and preserve the manifold structure of the data [40]. The main idea of LE is that if object $i$ and object $j$ are similar in the original space, they will remain very similar after dimension reduction. Assume that there are $n$ objects with $m_1$ attributes, $X = \{X_1, X_2, \ldots, X_n\} \in \mathbb{R}^{n \times m_1}$ is the object matrix, and $Y = \{Y_1, Y_2, \ldots, Y_i, \ldots, Y_n\} \in \mathbb{R}^{n \times m_2}$ is the matrix with $m_2$ attributes after dimension reduction, where $i = 1, 2, \ldots, n$ and $m_1 > m_2$. $W_i = \{W_{i,1}, W_{i,2}, \ldots, W_{i,j}, \ldots, W_{i,k}\}$ represents the adjacency weights between the $i$th object and its $k$ neighboring samples, and $W_{i,j}$ is the $j$th entry of $W_i$. LE consists of three steps: graph construction, weight decision and eigenmapping.
The first step, graph construction, is achieved by simple methods such as the k-nearest-neighbors (KNN) algorithm. Secondly, LE decides the weights between objects by exploiting the relationship constructed in the first step and applying a fundamental function, such as the heat kernel function, to obtain the adjacency matrices. Finally, it computes the eigenvalues and eigenvectors of the Laplacian matrix $L$ by utilizing the adjacency matrices from the second step to achieve eigenmapping, where the Laplacian matrix is $L_i = D_i - W_i$, $D_i = \sum_j W_{i,j}$ defines the diagonal degree matrix, and $W = \{W_1, W_2, \ldots, W_n\}$ collects the adjacency weights of all objects.
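As an illustration of these three steps, the sketch below builds a symmetrized KNN graph, assigns heat-kernel weights $W_{i,j} = \exp(-\|x_i - x_j\|^2/\sigma^2)$, and solves the generalized eigenproblem $Ly = \lambda D y$ with $L = D - W$. The parameter values and the use of the unnormalized Laplacian are our own assumptions for this sketch.

```python
import numpy as np
from scipy.linalg import eigh
from scipy.spatial.distance import cdist

def laplacian_eigenmaps(X, n_neighbors=5, n_components=2, sigma=1.0):
    """Minimal LE sketch: graph construction, weight decision, eigenmapping."""
    n = X.shape[0]
    dist = cdist(X, X)
    np.fill_diagonal(dist, np.inf)

    # Step 1: graph construction with k-nearest neighbors (symmetrized).
    knn = np.argsort(dist, axis=1)[:, :n_neighbors]
    A = np.zeros((n, n), dtype=bool)
    A[np.repeat(np.arange(n), n_neighbors), knn.ravel()] = True
    A = A | A.T

    # Step 2: weight decision with the heat kernel function.
    W = np.where(A, np.exp(-dist ** 2 / sigma ** 2), 0.0)

    # Step 3: eigenmapping with L = D - W (generalized problem L y = lambda D y).
    D = np.diag(W.sum(axis=1))
    L = D - W
    eigvals, eigvecs = eigh(L, D)          # eigenvalues in ascending order
    # Skip the trivial constant eigenvector, keep the next n_components.
    return eigvecs[:, 1:n_components + 1]
```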
In summary, the objective function of LE is shown as follows:
$$ J(Y) = \min \sum_{i,j} \|Y_i - Y_j\|^2 \, W_{i,j} \tag{5} $$
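In the standard LE formulation, Eq. (5) is minimized under the constraint $Y^{\mathsf T} D Y = I$ (a detail not restated here), and it reduces to the eigenmapping step above because
$$ \sum_{i,j} \|Y_i - Y_j\|^2 \, W_{i,j} = 2\,\mathrm{tr}\big(Y^{\mathsf T} L Y\big), $$
so the optimal embedding is given by the eigenvectors of $L y = \lambda D y$ associated with the smallest non-zero eigenvalues.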
The difference between LE and LLE is that LLE uses the neighbors to reconstruct a test sample and transforms the original data from the high-dimensional space into a separable low-dimensional space, while LE aims to keep similar distances from the original space to the new feature space, making related samples as close as possible in the reduced space. Both preserve locality, achieve dimension reduction effectively, and play a vital role in clustering or classifying high-dimensional data. However, neither of them can perform dimension reduction twice in succession, because the weight matrix or adjacency matrix does not change, so a second reduction with the same matrix contributes nothing new. Therefore, we combine the two algorithms to solve multi-task multi-view clustering.
Two rounds of dimension reduction are necessary because of the complexity of the original data. For multi-task multi-view problems, especially in heterogeneous situations, various tasks and views may have different structures. The first dimension reduction transforms the original data from multiple views into a common intermediate space (the view space) and yields more separable data structures; LLE is suitable for this step because it keeps the locally linear structures while making the data separable. The second dimension reduction transforms the data from the intermediate space (view space) into the task space and extracts the shared and complementary features among the various tasks. We want samples that are related to each other to stay close, so that sufficient shared and complementary features are learned, and LE is more appropriate for this step.
3. Methodology
For multi-task multi-view clustering, we usually assume that a similar relationship exists among the several views of each task. However, it is difficult to learn knowledge directly from the original data. To facilitate information extraction, we collectively learn the feature transformation matrices for all views of each task by LLE. Besides, since some similar features exist across multiple tasks, we want to preserve the distances between tasks during the mapping and solve the multi-task problem by LE. In this section, we introduce the details of our proposed method, L3E-M2VC.
As previously mentioned, LLE and LE are used to learn the structures within the views of each task and across multiple tasks, respectively. Therefore, the process of multi-task multi-view clustering for knowledge sharing is partitioned into two steps. The first step maps the views of each task to a common space called the view space. The second step extracts the features across multiple tasks and maps them to a discriminant space called the task space, as shown in Fig. 1.
Firstly, through the first transformation step, the data samples of the $v$th view of the $t$th task in the original space are transformed into the view space, which makes the samples more separable; each transformation matrix depends on the particular task and view. The view space can be seen as a common intermediate latent space that is shared by all the views of each task [38]. Then, through the transformation in the second step, the samples are mapped from the view space to the task space. Assume the second transformation matrix is $R_t$, which consists of two parts: the shared feature $R$, which is common to multiple tasks, and the complementary feature $R^c_t$, which belongs to only one task. Of course, the second transformation step is identical for the different views of the same task, because they are mapped to the same space in the first step. In other words, the view space offers a common plane for the various views in each task, which unifies the features of different views; the task space then considers the shared and complementary features across multiple tasks in order to facilitate knowledge sharing in multi-task multi-view clustering.
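To make the two-step mapping concrete, the sketch below applies a per-view transformation into the view space and then a per-task transformation into the task space. All dimensions, the random matrices, and the way $R$ and $R^c_t$ are combined (concatenation here) are illustrative assumptions; this section only states that $R_t$ consists of the two parts.

```python
import numpy as np

# Illustrative shapes only; not specified in the paper.
n_t, d_view, d_view_space, d_shared, d_comp = 100, 50, 20, 8, 4

X_tv = np.random.randn(n_t, d_view)             # vth view of the tth task (original space)
P_tv = np.random.randn(d_view, d_view_space)    # first transform: original space -> view space
Y_tv = X_tv @ P_tv                              # samples in the common view space

R_shared = np.random.randn(d_view_space, d_shared)  # R: shared across all tasks
Rc_t = np.random.randn(d_view_space, d_comp)         # Rc_t: complementary, task-specific
R_t = np.hstack([R_shared, Rc_t])               # second transform: view space -> task space
Z_tv = Y_tv @ R_t                               # samples in the task space
```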
3.1. Problem definition
Assume that there are $T$ clustering tasks, each with $V_t$ views, and $X^v_t = \{x^v_{t,1}, x^v_{t,2}, \ldots, x^v_{t,i}, \ldots, x^v_{t,n_t}\} \in \mathbb{R}^{n_t \times d^v_t}$, where $i = 1, 2, \ldots, n_t$, $t = 1, 2, \ldots, T$, and $v = 1, 2, \ldots, V_t$; $n_t$ represents the number of objects in the $t$th task, and $d^v_t$ is the feature dimension of the $v$th view in the $t$th task. Each task is clustered into $c_t$ classes, and the label sets differ across tasks because of the heterogeneity. The structures of data in the view space $Y^v_t$ and