View-Weighted K-means Clustering: Improving Multi-view Data Processing

Updated 2024-08-14 · 184 KB PDF

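"View-weighted multi-view K-means clustering is a clustering method designed for data with multiple views. It aims to exploit the complementary information in different views to cluster more effectively. Proposed by Hong Yu et al., the method accounts for differences in the importance of the views, so that blindly combining information from different views does not degrade the clustering result. To reduce the influence of outliers, it uses the l2,1 norm to measure the distance between data points and cluster centroids. An alternating iterative update strategy is used to search for the optimal solution. Experiments show that the method outperforms other methods on real-world datasets."

View-weighted multi-view K-means clustering is a machine learning technique for multi-source data, and it is particularly suited to datasets that can be described from several angles or by several feature sets. Many real-world instances, such as social media users, images, or texts, can be represented in more than one way (that is, by multiple views). The views give a multi-faceted description of the same instance, but they can differ in reliability, importance, and information content. Simply concatenating all features and running traditional K-means ignores these differences and can lower clustering quality.

To address this, the view-weighted multi-view K-means method introduces a weighting mechanism that adjusts the influence of each view on the clustering according to its contribution. Such weights are typically determined from statistical properties of the data, from the relevance of each view, or from predefined domain knowledge.

Datasets often contain outliers, which can distort the clustering result. To handle them, the paper adopts the l2,1 norm. For a matrix of residuals with one row per data point, the l2,1 norm is the sum of the Euclidean (l2) norms of the rows. Compared with the squared Euclidean distance used in standard K-means, each point's residual enters the objective unsquared, so points far from their centroid have less influence and the clustering is more robust.

The core of the algorithm is an alternating iterative update strategy: an optimization procedure that repeatedly updates the assignment of data points and the positions of the cluster centroids until a convergence condition is met. In each iteration, the l2,1 distance from every data point to every cluster centroid is computed from the current centroids, the data points are reassigned according to these distances and the view weights, and the centroids are then updated. The process repeats until the assignments no longer change significantly or a preset maximum number of iterations is reached.

In the experiments, the authors compare the proposed view-weighted multi-view K-means method with other multi-view clustering algorithms on several real-world datasets. The results indicate that the proposed algorithm has advantages in accuracy, robustness, and stability, which supports its effectiveness and applicability.

Keywords: multi-view clustering, l2,1 norm, weighting, K-means. These four keywords summarize the main content of the study: handling outliers with the l2,1 norm, integrating multi-view information through a weighting mechanism, and clustering within the K-means framework.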
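The alternating procedure described in the summary can be illustrated with a minimal NumPy sketch. It is not the authors' formulation, since the preview does not show their objective function or update rules. The sketch assumes three things: distances are unsquared Euclidean norms (so the total loss is an l2,1-style sum), the view weights are supplied as fixed inputs rather than learned, and each centroid is updated by a single reweighting step that starts from the cluster mean. The names `robust_center`, `view_distances`, `weighted_multiview_kmeans`, and `init_idx` are hypothetical.

```python
import numpy as np


def robust_center(X, eps=1e-8):
    """One reweighting step starting from the mean (assumed update rule).

    Each point gets weight 1 / (distance to the mean), so points far from
    the bulk of the cluster pull the centre less than in a plain average.
    """
    mean = X.mean(axis=0)
    dist = np.linalg.norm(X - mean, axis=1)
    return np.average(X, axis=0, weights=1.0 / np.maximum(dist, eps))


def view_distances(views, centroids):
    """Unsquared Euclidean distances, shape (n_views, n_points, k)."""
    return np.stack([
        np.linalg.norm(X[:, None, :] - C[None, :, :], axis=2)
        for X, C in zip(views, centroids)
    ])


def weighted_multiview_kmeans(views, k, weights, init_idx=None,
                              n_iter=50, seed=0):
    """Alternate between assigning points and updating per-view centroids.

    views    : list of arrays, one per view, each of shape (n_points, d_v)
    weights  : one non-negative weight per view (fixed here, not learned)
    init_idx : optional indices of the points used as initial centroids
    """
    views = [np.asarray(X, dtype=float) for X in views]
    w = np.asarray(weights, dtype=float)
    n = views[0].shape[0]
    if init_idx is None:
        init_idx = np.random.default_rng(seed).choice(n, size=k, replace=False)
    centroids = [X[np.asarray(init_idx)].copy() for X in views]
    labels = np.full(n, -1)

    for _ in range(n_iter):
        # Assignment step: weighted sum of per-view distances.
        combined = np.tensordot(w, view_distances(views, centroids), axes=1)
        new_labels = combined.argmin(axis=1)
        if np.array_equal(new_labels, labels):
            break  # assignments stopped changing
        labels = new_labels
        # Update step: recompute each non-empty cluster's centre in each view.
        for X, C in zip(views, centroids):
            for j in range(k):
                members = X[labels == j]
                if len(members) > 0:
                    C[j] = robust_center(members)
    return labels, centroids
```

In this sketch, passing `weights=[1, 0]` for two views makes the assignments depend on the first view alone, which is a quick way to see what the weighting contributes.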

View-Weighted Multi-view K-means Clustering

Hong Yu, Yahong Lian, Shu Li, and JiaXin Chen

School of Software, Dalian University of Technology, Dalian, China

hongyu@dlut.edu.cn, lianyahong1@163.com, ann_ssdut@163.com, jiaxin_chen@163.com

Abstract. In many clustering problems, the data are represented by multiple views. Different views describe different aspects of the same set of instances and provide complementary information. Because blindly combining the information from different views can degrade the multi-view clustering result, this paper proposes a novel view-weighted multi-view k-means method. To reduce the adverse effect of outliers, the l2,1 norm is employed to calculate the distance between data points and cluster centroids. An alternating iterative update scheme is developed to find the optimal value. Comparative experiments on real-world datasets show that the proposed method achieves better performance than the compared methods.

Keywords: Multi-view clustering · l2,1 norm · Weighting · k-means

1 Introduction

In daily life, more and more instances are represented by multiple views [3,13]. A typical example is a web page, which can be represented by two main attribute sets: the page content and the anchor text of inbound hyperlinks. The emergence of such data has given rise to a family of clustering techniques called multi-view clustering [1].

Traditional clustering methods, referred to as single-view clustering, use only one of the feature sets for learning. The goal of multi-view clustering is to exploit the information from all views in order to obtain a more stable and accurate clustering result than single-view clustering. In recent years, research on multi-view clustering has attracted a lot of attention [10–12].

Kumar and Daumé [9] presented a co-training based multi-view spectral clustering method, which uses the spectral embedding from one view to constrain the similarity graph used for the other view. Xia et al. [15] proposed a robust Markov chain based multi-view spectral clustering method with low-rank and sparse constraints. The authors of [4] presented a multi-view normalized cut approach that combines spectral clustering with a local search procedure. The common problem of the above-mentioned methods is that they do not discriminate between views. As a result, views that contain noise may degrade the clustering result.

© Springer International Publishing AG 2017

A. Lintas et al. (Eds.): ICANN 2017, Part II, LNCS 10614, pp. 305–312, 2017.

https://doi.org/10.1007/978-3-319-68612-7_35
