to discover hidden patterns from the data. Most of the existing work in multi-view clustering follows the centralized approach, with extensions to existing clustering algorithms [1, 22, 8, 35, 2, 4, 32]. Distributed algorithms first cluster each view independently of the others using an appropriate single-view algorithm, and then combine the individual clustering results to produce a final partitioning [25, 16].
Bickel and Scheffer [1] proposed the General Multi-View EM algorithm based on the co-EM algorithm, and developed a two-view multinomial EM algorithm and a two-view spherical k-means algorithm. However, their methods are not guaranteed to converge, so it is hard for a user to decide when to stop.
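The alternation underlying their two-view spherical k-means can be pictured with a minimal sketch (the function and parameter names are ours, not the authors'): the cluster assignments obtained in one view drive the centroid update in the other view, and a fixed iteration budget stands in for the missing convergence guarantee.

```python
import numpy as np

def two_view_spherical_kmeans(X1, X2, k, n_iter=20, seed=0):
    # Normalize rows so that dot products are cosine similarities.
    views = [X / np.linalg.norm(X, axis=1, keepdims=True) for X in (X1, X2)]
    rng = np.random.default_rng(seed)
    labels = rng.integers(0, k, size=views[0].shape[0])
    for t in range(n_iter):
        v = t % 2  # alternate which view performs this half-step
        # M-step in view v, driven by labels from the other view's E-step.
        C = np.zeros((k, views[v].shape[1]))
        for j in range(k):
            members = views[v][labels == j]
            if len(members):
                C[j] = members.mean(axis=0)
        C /= np.maximum(np.linalg.norm(C, axis=1, keepdims=True), 1e-12)
        # E-step in view v: reassign each point to its nearest mean direction.
        labels = np.argmax(views[v] @ C.T, axis=1)
    return labels
```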
Kailing et al. [22] proposed a multi-view version of the DBSCAN algorithm. In their method, DBSCAN is first employed on each view to produce several small clusters and a large amount of noise. Then the final clusters are determined using union and intersection of local neighborhoods.
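The intersection variant of the core-point test can be sketched as follows (a simplified illustration with hypothetical names; per-view radii eps[v] and a global density threshold min_pts are assumed): a point is dense only if enough points lie within its radius in every view simultaneously.

```python
import numpy as np

def is_core_point(p, views, eps, min_pts):
    # Intersect the eps-neighborhoods of point p across all views.
    neighborhood = None
    for X, e in zip(views, eps):
        dists = np.linalg.norm(X - X[p], axis=1)
        local = set(np.flatnonzero(dists <= e))
        neighborhood = local if neighborhood is None else neighborhood & local
    # p is a core point only if the joint neighborhood is dense enough.
    return len(neighborhood) >= min_pts
```

The union variant replaces the intersection by a union and thus declares more points dense.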
De Sa [8] proposed a two-view spectral clustering algorithm which assumes that the views are independent. The method clusters the data so as to minimize the disagreement between the clusterings obtained from the two views.
Zhou and Burges [35] developed multi-view spectral clustering by generalizing the usual single-view normalized cut to multi-view data. The multi-view normalized cut seeks a cut that is close to optimal on every graph, and it can be approximately optimized via a real-valued relaxation. The relaxation leads to a vertex-wise mixture of the Markov chains associated with the different graphs.
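In random-walk terms the construction can be sketched as follows (our simplified notation; the authors' actual mixture is vertex-wise, with weights involving the stationary distribution of each graph's walk). With P_v denoting the transition matrix of the natural random walk on graph v,

```latex
P = \sum_{v=1}^{m} \alpha_v P_v , \qquad
\alpha_v \ge 0 , \quad \sum_{v=1}^{m} \alpha_v = 1 ,
```

and spectral clustering is then applied to the mixed chain P rather than to any single graph.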
Blaschko and Lampert [2] proposed a clustering algorithm for two-view data based on kernel canonical correlation analysis, called correlational spectral clustering. It uses separate similarity measures for each data representation, and allows for projection of previously unseen data that are observed in only one representation (e.g., images but not text).
Chaudhuri et al. [4] proposed a clustering algorithm which performs clustering in a lower-dimensional subspace of the multiple views of the data, projected via Canonical Correlation Analysis (CCA). Two algorithms were developed, one for mixtures of Gaussians and one for mixtures of log-concave distributions.
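The overall project-then-cluster recipe can be illustrated with standard tools (a minimal sketch, not the authors' algorithm, which comes with distributional guarantees; the function name is ours and X1, X2 denote the two views):

```python
from sklearn.cross_decomposition import CCA
from sklearn.cluster import KMeans

def cca_then_cluster(X1, X2, n_components=2, n_clusters=3):
    # Project both views onto their maximally correlated subspace.
    cca = CCA(n_components=n_components).fit(X1, X2)
    Z1, _ = cca.transform(X1, X2)
    # Cluster in the low-dimensional projection (here, of the first view).
    return KMeans(n_clusters=n_clusters, n_init=10).fit_predict(Z1)
```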
Long et al. [25] proposed a general model for multi-view clustering under a distributed framework. The model introduces the concept of a mapping function to make the patterns from different pattern spaces comparable, so that an optimal pattern can be learned from the multiple patterns of the multiple views.
Greene and Cunningham [16] proposed a clustering algorithm for multi-view data using a late integration strategy. In their method, a matrix that contains the partitioning of every individual view is created and then decomposed into two matrices using a matrix factorization approach: one showing the contribution of those partitionings to the final multi-view clusters, called meta-clusters, and the other assigning instances to the meta-clusters.
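The factorization step can be sketched with an off-the-shelf decomposition (a minimal illustration under our own assumptions; M is a binary membership matrix with one column per cluster from every view, and the authors' exact factorization and post-processing differ):

```python
import numpy as np
from sklearn.decomposition import NMF

def late_integration(M, n_meta_clusters):
    # M[i, c] = 1 iff instance i belongs to cluster c in some view.
    nmf = NMF(n_components=n_meta_clusters, init="nndsvd", max_iter=500)
    W = nmf.fit_transform(M)        # instances -> meta-clusters
    H = nmf.components_             # per-view clusters -> meta-clusters
    return np.argmax(W, axis=1), H  # hard assignment plus contributions
```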
The current multi-view clustering methods take both multiple views and individual variables into consideration. However, most of them are extensions of EM or spectral clustering, so they do not scale well to large data sets.
2.2 Variable Weighting Clustering
Variable weighting clustering has been an important research topic in cluster analysis [15, 9, 10, 12, 26, 27, 28, 14, 19, 11, 21, 18, 3, 31, 6, 5].
Huang et al. [19] proposed the W-k-means clustering algorithm, which can automatically compute variable weights in the k-means clustering process. W-k-means extends the standard k-means algorithm with one additional step that computes the variable weights at each iteration of the clustering process. The weight of a variable is inversely proportional to the sum of the within-cluster variances of that variable. As such, noise variables can be identified and their effects on the clustering result significantly reduced.
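The weighting rule can be sketched as follows (our paraphrase of the update in [19], in our own notation; β > 1 is the weight exponent and m the number of variables). Writing the dispersion of variable j over the k clusters with centers z_l as D_j, the weight update is

```latex
D_j = \sum_{l=1}^{k} \sum_{i \in C_l} (x_{ij} - z_{lj})^2 ,
\qquad
w_j = \left[ \sum_{t=1}^{m} \left( \frac{D_j}{D_t} \right)^{\frac{1}{\beta - 1}} \right]^{-1} ,
```

so variables with large within-cluster dispersion (noise variables) receive small weights.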
The new algorithm we propose in this paper weights both views and individual variables, and is an extension of W-k-means.
Domeniconi et al. [11] proposed the Locally Adaptive Clustering (LAC) algorithm, which assigns a weight to each variable in each cluster. They use an iterative algorithm to minimize its objective function. Liping et al. [21] pointed out that the objective function of LAC is not differentiable because of a maximum function. The convergence of the algorithm is proved by replacing the