Combining Three-Way Decision with Spectral Clustering: Improving Accuracy and Efficiency


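Three-way spectral clustering is a recent clustering method that combines three-way decision with spectral clustering. Three-way clustering in general has shown promising performance in areas such as multimodal data mining, image analysis, and social networks. Conventional clustering assigns each object to exactly one of two or more groups, whereas three-way clustering describes each cluster by a core region and a fringe region, which is particularly valuable when the membership of some objects is uncertain.

In conventional spectral clustering, spectral graph theory is used to build a graph model of the data, and the graph Laplacian is computed to capture the local structure of the data. The result, however, is a hard partition that cannot express uncertain membership. Three-way decision adds a finer-grained view of how objects relate to clusters. The proposed algorithm revises the core steps of spectral clustering and, by analyzing the similarity between data points and cluster centers, obtains an upper bound of each cluster. The upper bound can be read as the set of objects that may belong to the cluster. Perturbation analysis then separates the core region (the objects tightly connected to their cluster) from the upper bound, and the difference between the upper bound and the core region is regarded as the fringe region of that cluster.

This strategy lowers the Davies-Bouldin index (DBI), a cluster-quality measure that relates within-cluster scatter to between-cluster separation (lower is better), and raises accuracy (ACC) and the average silhouette index (AS), both of which are standard measures of clustering quality. Experiments on data sets from the UCI (University of California, Irvine) repository support the effectiveness of the algorithm on these three measures.

Three-way spectral clustering thus combines the strengths of three-way decision and spectral clustering. By attending to fringe regions when partitioning the data, it improves clustering quality and widens the range of data to which spectral clustering applies. Future work could explore the method in further application domains.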
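The relationship between the three regions can be illustrated with plain sets. The sketch below is not the paper's algorithm: it assumes the upper bound and the core region of each cluster are already available, and the function name `fringe_regions` and the toy object ids are hypothetical.

```python
def fringe_regions(upper, core):
    """Return the fringe region of each cluster as upper bound minus core region.

    `upper` and `core` map a cluster name to a set of object ids.
    (Hypothetical helper for illustration; not taken from the paper.)
    """
    fringe = {}
    for name, upper_set in upper.items():
        core_set = core[name]
        # The core region must lie inside the upper bound.
        if not core_set <= upper_set:
            raise ValueError(f"core of {name!r} is not contained in its upper bound")
        fringe[name] = upper_set - core_set
    return fringe


# Toy data: object 4 may belong to either cluster, so it is in both upper bounds.
upper = {"A": {1, 2, 3, 4}, "B": {4, 5, 6}}
core = {"A": {1, 2, 3}, "B": {5, 6}}
fringe = fringe_regions(upper, core)
print(fringe)  # {'A': {4}, 'B': {4}}
```

Object 4 ends up in the fringe of both clusters, which is exactly the kind of uncertain membership a hard partition cannot represent.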

Three-Way Spectral Clustering

Hong Shi, Qiang Liu, and Pingxin Wang (1, 2)

1 School of Computer Science, Jiangsu University of Science and Technology, Zhenjiang 212003, China
wangpingxin@just.edu.cn

2 College of Mathematics and Information Science, Hebei Normal University, Shijiazhuang 050024, China

Abstract. In recent years, three-way clustering has shown promising performance in many different fields. In this paper, we present a new three-way spectral clustering algorithm that combines three-way decision and spectral clustering. In the proposed algorithm, we revise the process of spectral clustering and obtain an upper bound of each cluster. Perturbation analysis is applied to separate the core region from the upper bound, and the difference between the upper bound and the core region is regarded as the fringe region of the specific cluster. The results on UCI data sets show that this strategy is effective in reducing the value of DBI and increasing the values of ACC and AS.
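To make the DBI figure concrete, here is a minimal sketch of the Davies-Bouldin index in its common textbook form, DBI = (1/k) Σ_i max_{j≠i} (S_i + S_j) / d(c_i, c_j), where S_i is the mean distance of the points in cluster i to its centroid c_i. The choice of Euclidean distance, the helper names, and the toy points are assumptions for illustration; the paper may define or compute the index differently, in particular for three-way clusters.

```python
import math


def centroid(points):
    """Component-wise mean of a non-empty list of equal-length tuples."""
    n = len(points)
    return tuple(sum(p[d] for p in points) / n for d in range(len(points[0])))


def davies_bouldin(clusters):
    """Davies-Bouldin index of a hard partition given as a list of point lists.

    Assumes at least two clusters with distinct centroids.
    """
    centers = [centroid(c) for c in clusters]
    # Within-cluster scatter: mean Euclidean distance to the centroid.
    scatter = [
        sum(math.dist(p, ctr) for p in c) / len(c)
        for c, ctr in zip(clusters, centers)
    ]
    k = len(clusters)
    total = 0.0
    for i in range(k):
        # Worst-case ratio of combined scatter to centroid separation.
        total += max(
            (scatter[i] + scatter[j]) / math.dist(centers[i], centers[j])
            for j in range(k)
            if j != i
        )
    return total / k


# Four points forming two well-separated vertical pairs.
compact = [[(0, 0), (0, 2)], [(10, 0), (10, 2)]]
# The same points grouped across the gap instead.
stretched = [[(0, 0), (10, 0)], [(0, 2), (10, 2)]]

dbi_compact = davies_bouldin(compact)      # 0.2
dbi_stretched = davies_bouldin(stretched)  # 5.0
print(dbi_compact, dbi_stretched)
```

The compact grouping scores far lower than the stretched one, which matches the reading of DBI as a measure where smaller values indicate tighter, better separated clusters.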

Keywords: Spectral clustering · Three-way decision · Three-way clustering · Three-way spectral clustering

1 Introduction

Clustering plays a key role in identifying the internal structure of data. The purpose of clustering is to divide a group of unlabeled samples into different clusters according to similarity, such that the clusters have high intra-class similarity and low inter-class similarity. Cluster analysis is an unsupervised learning method, which has been successfully applied in image processing [1], web search [2], security assurance [3] and bioinformatics [4].

Roughly speaking, hierarchical clustering and partition clustering are the two most commonly used clustering approaches [5]. We focus on the latter in this paper. The k-means algorithm [6] is the most frequently used partition clustering method. It is considered to have converged when the centroids no longer change. However, the original k-means algorithm tends to converge to a local optimum and is sensitive to initialization. To cluster sample spaces of arbitrary shape and to converge to a globally optimal solution, the spectral clustering method [7, 8] was proposed. Its idea is to view objects as vertices and similarities between objects as weighted edges. This turns a clustering problem into a graph partitioning problem in which the weights of the edges connecting different clusters should be as low as possible and the weights of the edges within a cluster as high as possible.
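The graph view can be made concrete with a small example. The sketch below only evaluates the objective for two candidate partitions of a toy similarity graph; it does not perform spectral clustering, which instead relaxes a size-normalized version of this objective and solves it through eigenvectors of the graph Laplacian. The function name `cut_and_within`, the vertex names, and the edge weights are hypothetical.

```python
def cut_and_within(weights, labels):
    """Split the total edge weight of an undirected graph by a labelling.

    `weights` maps a pair (u, v) to the similarity of that edge (each
    undirected edge listed once); `labels` maps a vertex to its cluster.
    Returns (weight of edges between clusters, weight of edges inside clusters).
    """
    cut = within = 0.0
    for (u, v), w in weights.items():
        if labels[u] == labels[v]:
            within += w
        else:
            cut += w
    return cut, within


# Toy similarity graph: two tight triangles joined by one weak edge (c, d).
weights = {
    ("a", "b"): 0.9, ("a", "c"): 0.8, ("b", "c"): 0.9,
    ("d", "e"): 0.9, ("d", "f"): 0.8, ("e", "f"): 0.9,
    ("c", "d"): 0.1,
}
# Partition that follows the triangles, and one that cuts through them.
good = {"a": 0, "b": 0, "c": 0, "d": 1, "e": 1, "f": 1}
bad = {"a": 0, "b": 0, "c": 1, "d": 1, "e": 1, "f": 0}

good_cut, good_within = cut_and_within(weights, good)
bad_cut, bad_within = cut_and_within(weights, bad)
print(good_cut, good_within)
print(bad_cut, bad_within)
```

The partition that follows the two triangles cuts only the weak edge, so its between-cluster weight is much lower and its within-cluster weight much higher than for the partition that splits the triangles.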

© Springer Nature Switzerland AG 2018

M. Ceci et al. (Eds.): ISMIS 2018, LNAI 11177, pp. 389–398, 2018.

https://doi.org/10.1007/978-3-030-01851-1_37
