KNN-Based Sparse Representation Improves Spectral Clustering of High-Dimensional Data

Updated 2024-08-26 · PDF, 965 KB

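This paper examines how sparse representation coefficients filtered by k-nearest neighbors (k-NN) can improve spectral clustering of high-dimensional data. With the development of subspace clustering, spectral clustering has become an effective means of dimensionality reduction and clustering, and the quality of the affinity graph it builds is critical to the quality of the resulting clusters. Conventional sparse representation expresses each high-dimensional object as a sparse linear combination of the other objects, which is useful for data sets with many features. Its weakness is that all sparse representation coefficients are used when the affinity matrix is built, so coefficients affected by noise can reduce the stability and accuracy of the clustering.

The authors propose using k-NN to decide which coefficients are used. First, the sparse representation coefficient vector of each data object is computed according to sparse representation theory. Then k-NN identifies the k objects closest to the target object. Only the coefficients of these k neighbors are kept, unchanged, and the coefficients of all non-neighboring objects are set to zero. The aim is to reduce the influence of noise on the clustering process and make the affinity matrix more robust. An affinity matrix built from these k-NN based coefficients reflects the local similarity between objects more faithfully, which in turn improves spectral clustering.

Experiments on six gene expression profile (GEP) data sets show that the method outperforms the conventional strategies, supporting its effectiveness for clustering high-dimensional data. The authors emphasize that introducing the local information provided by k-NN not only raises clustering accuracy but also simplifies the model and reduces the need for parameter tuning, which makes spectral clustering more attractive in practice. In short, the paper combines k-NN with sparse representation to make spectral clustering of high-dimensional data more robust and efficient in the presence of noise and complexity, which is of practical value for complex high-dimensional data in biology, image processing, social network analysis, and similar fields.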
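
The coefficient-filtering step described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes the sparse coefficient matrix `C` has already been computed by some sparse coding solver, that neighbors are ranked by Euclidean distance, and that the affinity matrix is formed as W = (|C| + |C|^T) / 2. The function names and the toy data are hypothetical, and plain Python lists are used only to keep the sketch dependency-free.

```python
import math


def knn_filter_coefficients(X, C, k):
    """Keep, for each object, only the coefficients of its k nearest neighbours.

    X: list of n feature vectors (lists of floats).
    C: n x n list of lists; C[i][j] is the coefficient of object j in the
       sparse representation of object i (assumed precomputed, C[i][i] == 0).
    k: number of neighbours whose coefficients are kept.
    """
    n = len(X)
    filtered = []
    for i in range(n):
        # Rank all other objects by Euclidean distance to object i (assumption).
        others = [j for j in range(n) if j != i]
        others.sort(key=lambda j: math.dist(X[i], X[j]))
        neighbours = set(others[:k])
        # Neighbour coefficients stay unchanged; all other coefficients become zero.
        filtered.append([C[i][j] if j in neighbours else 0.0 for j in range(n)])
    return filtered


def symmetric_affinity(C):
    """Assumed affinity construction: W = (|C| + |C|^T) / 2."""
    n = len(C)
    return [[(abs(C[i][j]) + abs(C[j][i])) / 2 for j in range(n)] for i in range(n)]


# Toy data: two well-separated pairs of points in the plane.
X = [[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 5.0]]
# Hypothetical coefficients with small "noisy" cross-group entries.
C = [
    [0.0, 0.9, 0.05, 0.05],
    [0.8, 0.0, 0.10, 0.10],
    [0.05, 0.05, 0.0, 0.9],
    [0.10, 0.10, 0.7, 0.0],
]
C_knn = knn_filter_coefficients(X, C, k=1)
W = symmetric_affinity(C_knn)
```

With `k=1`, the small cross-group coefficients are zeroed, so `W` only connects each point to its nearby partner. In practice `k` would be chosen per data set.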

[...] set, the edge set, and the affinity matrix, respectively. Each vertex $v_i \in V$ represents an object $x_i$, and each edge $(i, j) \in E$ is assigned an affinity weight $w_{ij}$ which represents the similarity between $x_i$ and $x_j$. The degree $d_i$ of node $i$ is $d_i = \sum_{j=1}^{n} w_{ij}$, and the degree matrix $D$ is the diagonal matrix $D = \mathrm{diag}(d_1, d_2, \ldots, d_n)$.

Spectral clustering algorithms are based on graph Laplacian matrices. The un-normalized graph Laplacian matrix is defined as $L = D - W$. Notably, $L$ is symmetric and positive semi-definite, and for any vector $x \in \mathbb{R}^n$ with components $x_i$,

$$x^{\top} L x = \frac{1}{2} \sum_{i,j=1}^{n} w_{ij} (x_i - x_j)^2 \tag{1}$$
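
The quadratic-form identity for the un-normalized Laplacian can be checked numerically on a small graph. The sketch below assumes a symmetric affinity matrix with zero diagonal; the helper names and the example values of `W` and `x` are made up for illustration.

```python
def unnormalized_laplacian(W):
    """Return L = D - W, where D is the diagonal matrix of row sums of W."""
    n = len(W)
    d = [sum(row) for row in W]  # degree d_i = sum_j w_ij
    return [[(d[i] if i == j else 0.0) - W[i][j] for j in range(n)] for i in range(n)]


def quadratic_form(M, x):
    """Compute x^T M x."""
    n = len(x)
    return sum(x[i] * M[i][j] * x[j] for i in range(n) for j in range(n))


def pairwise_form(W, x):
    """Compute (1/2) * sum_{i,j} w_ij * (x_i - x_j)^2."""
    n = len(x)
    return 0.5 * sum(W[i][j] * (x[i] - x[j]) ** 2 for i in range(n) for j in range(n))


# Hypothetical symmetric affinity matrix on three vertices.
W = [
    [0.0, 1.0, 0.5],
    [1.0, 0.0, 2.0],
    [0.5, 2.0, 0.0],
]
x = [1.0, -2.0, 3.0]

L = unnormalized_laplacian(W)
lhs = quadratic_form(L, x)   # x^T L x
rhs = pairwise_form(W, x)    # weighted sum of squared differences
```

Both sides evaluate to 61 for this example. Because the right-hand side is a sum of non-negative terms whenever the weights are non-negative, the identity also shows why $L$ is positive semi-definite.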

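The symmetric normalized graph Laplacian matrix $L_{sym}$ is defined as

$$L_{sym} = D^{-1/2} L D^{-1/2} = I - D^{-1/2} W D^{-1/2} \tag{2}$$

where $I$ is the identity matrix. Similarly,

$$x^{\top} L_{sym} x = \frac{1}{2} \sum_{i,j=1}^{n} w_{ij} \left( \frac{x_i}{\sqrt{d_i}} - \frac{x_j}{\sqrt{d_j}} \right)^2 \tag{3}$$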
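
The same kind of numerical check works for the symmetric normalized Laplacian. The sketch below assumes every vertex has a strictly positive degree, so that the inverse square root of each degree exists; the function name and example values are hypothetical.

```python
import math


def normalized_laplacian(W):
    """Return L_sym = I - D^{-1/2} W D^{-1/2}.

    Assumes every degree is strictly positive (no isolated vertices).
    """
    n = len(W)
    d = [sum(row) for row in W]
    inv_sqrt = [1.0 / math.sqrt(di) for di in d]
    return [
        [(1.0 if i == j else 0.0) - inv_sqrt[i] * W[i][j] * inv_sqrt[j] for j in range(n)]
        for i in range(n)
    ]


# Hypothetical symmetric affinity matrix on three vertices.
W = [
    [0.0, 1.0, 0.5],
    [1.0, 0.0, 2.0],
    [0.5, 2.0, 0.0],
]
x = [1.0, -2.0, 3.0]
n = len(x)
d = [sum(row) for row in W]

L_sym = normalized_laplacian(W)

# Left-hand side: x^T L_sym x.
lhs = sum(x[i] * L_sym[i][j] * x[j] for i in range(n) for j in range(n))

# Right-hand side: (1/2) * sum_{i,j} w_ij * (x_i / sqrt(d_i) - x_j / sqrt(d_j))^2.
rhs = 0.5 * sum(
    W[i][j] * (x[i] / math.sqrt(d[i]) - x[j] / math.sqrt(d[j])) ** 2
    for i in range(n)
    for j in range(n)
)
```

The two values agree up to floating-point rounding, so comparisons should use a tolerance rather than exact equality.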

The most popular spectral clustering algorithm is the NSC algorithm [1], which is described in Algorithm 1.

Spectral clustering can also be understood from the perspective of graph cuts. The task can be recast as finding the best cuts of the graph, such that edges between different groups have low weights and edges within each group have high weights. The simplest and most direct way to construct a partition of the graph is to solve the mincut problem. Denote $F(A, B) = \sum_{i \in A, j \in B} w_{ij}$ and $\bar{A}$ for the complement of $A$.
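
The cut weight between a vertex subset and its complement can be computed directly from the affinity matrix. This is a small sketch of the notation only, not a mincut solver; the function names and the example graph are hypothetical.

```python
def cut_weight(W, A, B):
    """Return F(A, B): the sum of w_ij over all i in A and j in B."""
    return sum(W[i][j] for i in A for j in B)


def complement(A, n):
    """Return the vertices in {0, ..., n-1} that are not in A."""
    members = set(A)
    return [i for i in range(n) if i not in members]


# Hypothetical graph: vertices {0, 1} and {2, 3} are tightly linked internally
# and only weakly linked to each other.
W = [
    [0.0, 0.9, 0.1, 0.0],
    [0.9, 0.0, 0.0, 0.05],
    [0.1, 0.0, 0.0, 0.8],
    [0.0, 0.05, 0.8, 0.0],
]
n = len(W)

# Cutting between the two tight groups crosses only weak edges.
natural_cut = cut_weight(W, [0, 1], complement([0, 1], n))
# Isolating vertex 0 instead cuts through its strong edge to vertex 1.
poor_cut = cut_weight(W, [0], complement([0], n))
```

Here `natural_cut` is about 0.15 while `poor_cut` is 1.0, which matches the intuition that a good partition separates groups joined only by low-weight edges.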

Spectral Clustering of High-Dimensional Data via KNN 365
