to discover hidden patterns from the data. Most of the existing work in multi-view clustering follows the centralized approach, with extensions to existing clustering algorithms [1, 22, 8, 35, 2, 4, 32]. Distributed algorithms first cluster each view independently of the others using an appropriate single-view algorithm, and then combine the individual clustering results to produce a final partitioning [25, 16].
Bickel and Scheffer [1] proposed the General Multi-View EM algorithm based on the co-EM algorithm, and developed a two-view multinomial EM algorithm and a two-view spherical k-means algorithm. However, their methods are not guaranteed to converge, so it is hard for a user to decide when to stop.
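The alternation underlying their two-view spherical k-means can be pictured with a minimal sketch (the function and parameter names are ours, not the authors'): the cluster assignments obtained in one view drive the centroid update in the other view, and a fixed iteration budget stands in for the missing convergence guarantee.

```python
import numpy as np

def two_view_spherical_kmeans(X1, X2, k, n_iter=20, seed=0):
    # Normalize rows so that dot products are cosine similarities.
    views = [X / np.linalg.norm(X, axis=1, keepdims=True) for X in (X1, X2)]
    rng = np.random.default_rng(seed)
    labels = rng.integers(0, k, size=views[0].shape[0])
    for t in range(n_iter):
        v = t % 2  # alternate which view performs this half-step
        # M-step in view v, driven by labels from the other view's E-step.
        C = np.zeros((k, views[v].shape[1]))
        for j in range(k):
            members = views[v][labels == j]
            if len(members):
                C[j] = members.mean(axis=0)
        C /= np.maximum(np.linalg.norm(C, axis=1, keepdims=True), 1e-12)
        # E-step in view v: reassign each point to its nearest mean direction.
        labels = np.argmax(views[v] @ C.T, axis=1)
    return labels
```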
Kailing et al. [22] proposed a multi-view version of the DBSCAN algorithm. In their method, DBSCAN is first employed on each view to produce several small clusters and a large amount of noise. Then the final clusters are determined using union and intersection of local neighborhoods.
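The intersection variant of the core-point test can be sketched as follows (a simplified illustration with hypothetical names; per-view radii eps[v] and a global density threshold min_pts are assumed): a point is dense only if enough points lie within its radius in every view simultaneously.

```python
import numpy as np

def is_core_point(p, views, eps, min_pts):
    # Intersect the eps-neighborhoods of point p across all views.
    neighborhood = None
    for X, e in zip(views, eps):
        dists = np.linalg.norm(X - X[p], axis=1)
        local = set(np.flatnonzero(dists <= e))
        neighborhood = local if neighborhood is None else neighborhood & local
    # p is a core point only if the joint neighborhood is dense enough.
    return len(neighborhood) >= min_pts
```

The union variant replaces the intersection by a union and thus declares more points dense.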
De Sa [8] proposed a two-view spectral clustering algorithm which assumes that the views are independent. The method clusters the data so as to minimize the disagreement between the clusterings obtained from the two views.
Zhou and Burges [35] developed multi-view spectral clustering by generalizing the usual single-view normalized cut to multi-view data. The multi-view normalized cut seeks a cut that is close to optimal on every graph, and it can be approximately optimized via a real-valued relaxation. The relaxation leads to a vertex-wise mixture of the Markov chains associated with the different graphs.
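In random-walk terms the construction can be sketched as follows (our simplified notation; the authors' actual mixture is vertex-wise, with weights involving the stationary distribution of each graph's walk). With P_v denoting the transition matrix of the natural random walk on graph v,

```latex
P = \sum_{v=1}^{m} \alpha_v P_v , \qquad
\alpha_v \ge 0 , \quad \sum_{v=1}^{m} \alpha_v = 1 ,
```

and spectral clustering is then applied to the mixed chain P rather than to any single graph.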
Blaschko and Lampert [2] proposed a clustering algorithm for two-view data based on kernel canonical correlation analysis, called correlational spectral clustering. It uses separate similarity measures for each data representation, and allows for projection of previously unseen data that are observed in only one representation (e.g., images but not text).
Chaudhuri et al. [4] proposed a clustering algorithm which performs clustering in a lower-dimensional subspace of the multiple views of the data, projected via Canonical Correlation Analysis (CCA). Two algorithms were developed, one for mixtures of Gaussians and one for mixtures of log-concave distributions.
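The overall project-then-cluster recipe can be illustrated with standard tools (a minimal sketch, not the authors' algorithm, which comes with distributional guarantees; the function name is ours and X1, X2 denote the two views):

```python
from sklearn.cross_decomposition import CCA
from sklearn.cluster import KMeans

def cca_then_cluster(X1, X2, n_components=2, n_clusters=3):
    # Project both views onto their maximally correlated subspace.
    cca = CCA(n_components=n_components).fit(X1, X2)
    Z1, _ = cca.transform(X1, X2)
    # Cluster in the low-dimensional projection (here, of the first view).
    return KMeans(n_clusters=n_clusters, n_init=10).fit_predict(Z1)
```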
Long et al. [25] proposed a general model for multi-view clustering under a distributed framework. The model introduces the concept of a mapping function to make the patterns from different pattern spaces comparable, so that an optimal pattern can be learned from the multiple patterns of the multiple views.
Greene and Cunningham [16] proposed a clustering algorithm for multi-view data using a late integration strategy. In their method, a matrix that contains the partitioning of every individual view is created and then decomposed into two matrices using a matrix factorization approach: one showing the contribution of those partitionings to the final multi-view clusters, called meta-clusters, and the other assigning instances to the meta-clusters.
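The factorization step can be sketched with an off-the-shelf decomposition (a minimal illustration under our own assumptions; M is a binary membership matrix with one column per cluster from every view, and the authors' exact factorization and post-processing differ):

```python
import numpy as np
from sklearn.decomposition import NMF

def late_integration(M, n_meta_clusters):
    # M[i, c] = 1 iff instance i belongs to cluster c in some view.
    nmf = NMF(n_components=n_meta_clusters, init="nndsvd", max_iter=500)
    W = nmf.fit_transform(M)        # instances -> meta-clusters
    H = nmf.components_             # per-view clusters -> meta-clusters
    return np.argmax(W, axis=1), H  # hard assignment plus contributions
```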
The current multi-view clustering methods take both multiple views and individual variables into consideration. However, most of them are extensions of EM or spectral clustering, so they do not scale well to large data sets.
2.2 Variable Weighting Clustering
Variable weighting clustering has been an important research topic in cluster analysis [15, 9, 10, 12, 26, 27, 28, 14, 19, 11, 21, 18, 3, 31, 6, 5].
Huang et al. [19] proposed the W-k-means clustering algorithm, which can automatically compute variable weights in the k-means clustering process. W-k-means extends the standard k-means algorithm with one additional step that computes the variable weights at each iteration of the clustering process. The weight of a variable is inversely proportional to the sum of the within-cluster variances of that variable. As such, noise variables can be identified and their effects on the clustering result significantly reduced.
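The weighting rule can be sketched as follows (our paraphrase of the update in [19], in our own notation; β > 1 is the weight exponent and m the number of variables). Writing the dispersion of variable j over the k clusters with centers z_l as D_j, the weight update is

```latex
D_j = \sum_{l=1}^{k} \sum_{i \in C_l} (x_{ij} - z_{lj})^2 ,
\qquad
w_j = \left[ \sum_{t=1}^{m} \left( \frac{D_j}{D_t} \right)^{\frac{1}{\beta - 1}} \right]^{-1} ,
```

so variables with large within-cluster dispersion (noise variables) receive small weights.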
The new algorithm we propose in this paper weights both views and individual variables, and is an extension of W-k-means.
Domeniconi et al. [11] proposed the Locally Adaptive Clustering (LAC) algorithm, which assigns a weight to each variable in each cluster. They use an iterative algorithm to minimize its objective function. Liping et al. [21] pointed out that the objective function of LAC is not differentiable because of a maximum function. The convergence of the algorithm is proved by replacing the