Defining the Affinity Graph for Spectral Clustering Through Ranking on Manifolds


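"This article discusses how to define the affinity graph in spectral clustering through ranking on manifolds, in order to overcome limitations of existing algorithms. The authors propose two notions, smooth consistency and constraint consistency, that the affinity graph should satisfy, and construct a new affinity graph definition within a regularization framework of ranking on manifolds. The method applies not only to unsupervised clustering but also to semi-supervised settings. Experiments on synthetic and real-world data show good results."

In conventional spectral clustering, the dataset is converted into an affinity graph, and clustering is then performed by finding an optimal partition of that graph. The Gaussian function commonly used to compute affinities has two limitations: it has difficulty reflecting the intrinsic structure of the data, and it requires a manually chosen scaling parameter, whose selection remains an open problem. To address this, the article proposes defining the affinity graph through ranking on manifolds.

First, the article introduces smooth consistency, which is intended to ensure that neighboring data points have high affinity and so to capture the local structure of the data. Second, constraint consistency is intended to keep the affinity graph consistent, so that the clustering result agrees with the inherent properties of the data. Together, these two consistency principles provide the theoretical basis for the new affinity graph.

The authors then define the new affinity graph within a regularization framework of ranking on manifolds. This framework ranks the data points to reveal their relative relationships while taking the intrinsic geometric structure of the data into account. As a result, the new affinity graph reflects the inherent structure of the data better, without depending on a particular scaling parameter. The method applies to unsupervised clustering and extends to semi-supervised settings, so clustering remains effective when only partial label information is available.

In the experiments, the authors validate the method on synthetic and real-world datasets; the results indicate improved clustering performance over conventional methods. The article offers a new perspective on spectral clustering: by defining and optimizing the affinity graph on manifolds, it addresses the limitations of the Gaussian function and improves clustering accuracy and robustness. The work has theoretical and practical significance for understanding and improving clustering algorithms, spectral clustering in particular.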

On defining affinity graph for spectral clustering through ranking on manifolds

Tian Xia, Juan Cao, Yong-dong Zhang, Jin-tao Li

Center for Advanced Computing Technology Research, Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190, China

Graduate University of the Chinese Academy of Sciences, Beijing 100039, China

article info

Article history:

Received 3 November 2008

Received in revised form 20 January 2009

Accepted 1 March 2009

Communicated by D. Tao

Available online 9 April 2009

Keywords:

Affinity graph

Spectral clustering

Ranking on manifolds

abstract

Spectral clustering consists of two distinct stages: (a) constructing an affinity graph from the dataset and (b) clustering the data points by finding an optimal partition of the affinity graph. This paper focuses on the first stage. Existing spectral clustering algorithms adopt the Gaussian function to define the affinity graph because it is easy to implement. However, the Gaussian function has difficulty capturing the intrinsic structure of the data, and it requires a scaling parameter whose selection is still an open issue in spectral clustering. We therefore propose a new definition of the affinity graph for spectral clustering from the graph partition perspective. In particular, we propose two consistencies that the affinity graph should satisfy, smooth consistency and constraint consistency, and then define an affinity graph respecting these consistencies in a regularization framework of ranking on manifolds. The proposed definition is applicable to both unsupervised and semi-supervised spectral clustering. Encouraging experimental results on synthetic and real-world data demonstrate the effectiveness of the proposed approach.

1. Introduction

In recent years, spectral clustering has become one of the most popular modern clustering algorithms. It clusters data points by performing spectral analysis on a matrix derived from the data. Specifically, spectral clustering consists of two distinct stages: (a) constructing an affinity graph from the dataset and (b) clustering the data points by finding an optimal partition of the affinity graph. Although a great deal of effort has been devoted to the latter, little progress has been made on defining the affinity graph, even though it encodes the intrinsic structure of the data and plays an important role in spectral clustering.
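The two stages described above can be sketched in a few lines. This is a minimal illustration, not the paper's method: it assumes a dense Gaussian affinity for stage (a) and a relaxed two-way normalized cut (sign of the second eigenvector of the normalized Laplacian) for stage (b). The function names, the toy data, and `sigma=1.0` are hypothetical choices made for the example.

```python
import numpy as np

def gaussian_affinity(X, sigma):
    # Stage (a): dense affinity matrix from pairwise squared Euclidean distances.
    sq_dist = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    A = np.exp(-sq_dist / (2.0 * sigma ** 2))
    np.fill_diagonal(A, 0.0)  # no self-loops
    return A

def spectral_bipartition(A):
    # Stage (b): relaxed two-way normalized cut.
    # Build L_sym = I - D^{-1/2} A D^{-1/2} and take its second-smallest eigenvector.
    d = A.sum(axis=1)
    d_inv_sqrt = 1.0 / np.sqrt(d)
    L_sym = np.eye(len(A)) - d_inv_sqrt[:, None] * A * d_inv_sqrt[None, :]
    _, eigvecs = np.linalg.eigh(L_sym)  # eigenvalues in ascending order
    # Map back to the generalized eigenvector and split by sign.
    fiedler = d_inv_sqrt * eigvecs[:, 1]
    return (fiedler > 0).astype(int)

# Toy data (assumed): two compact blobs, 20 points each, centers 4 units apart.
rng = np.random.default_rng(0)
X = np.vstack([
    rng.normal(0.0, 0.3, size=(20, 2)),
    rng.normal(0.0, 0.3, size=(20, 2)) + [4.0, 0.0],
])
labels = spectral_bipartition(gaussian_affinity(X, sigma=1.0))
print(labels)
```

Everything downstream of `gaussian_affinity` sees the data only through `A`, which is why the choice of affinity graph matters so much for the final partition.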

Existing spectral clustering algorithms [2–5] adopt the Gaussian function, i.e., $A(x_i, x_j) = \exp(-\|x_i - x_j\|^2 / (2\sigma^2))$, to define the affinity graph, since it is simple to implement. However, the Gaussian function has a scaling parameter $\sigma$ that must be specified manually, and its selection is still an open issue in spectral clustering. In practice, $\sigma$ is often set to an empirical value; for example, in the normalized cut algorithm (NC), a representative spectral clustering algorithm [2,6], $\sigma$ is set to 0.05 of the maximal pairwise Euclidean distance in the dataset. This setting makes spectral clustering very sensitive to outliers. Manor et al. propose using a local scale rather than a global one for the Gaussian function [7]; however, this algorithm has limited success on real-world data, although it works well on synthetic data.
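To make the sensitivity to outliers concrete, the sketch below computes the global scale as 0.05 of the maximal pairwise distance and shows how a single far-away point inflates it. It also includes one common form of local scaling, in which each point's scale is the distance to its k-th nearest neighbor; the text above does not give the formula used in [7], so that form, the default `k=7`, and all function names here are assumptions for illustration only.

```python
import numpy as np

def pairwise_distances(X):
    # Dense matrix of Euclidean distances between all rows of X.
    return np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)

def global_sigma(X, fraction=0.05):
    # Global heuristic from the text: a fixed fraction of the largest pairwise distance.
    return fraction * pairwise_distances(X).max()

def local_scales(X, k=7):
    # Assumed local-scale rule: sigma_i is the distance from x_i to its k-th nearest neighbor.
    d_sorted = np.sort(pairwise_distances(X), axis=1)
    return d_sorted[:, k]  # column 0 is the zero distance of each point to itself

def locally_scaled_affinity(X, k=7):
    # Assumed form: A_ij = exp(-d_ij^2 / (sigma_i * sigma_j)), with a zero diagonal.
    d = pairwise_distances(X)
    s = local_scales(X, k)
    A = np.exp(-d ** 2 / (s[:, None] * s[None, :]))
    np.fill_diagonal(A, 0.0)
    return A

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 2))                # 50 inlier points
X_out = np.vstack([X, [[100.0, 100.0]]])    # same data plus one distant outlier

print("global sigma without outlier:", global_sigma(X))
print("global sigma with outlier:   ", global_sigma(X_out))
```

Because the global scale depends on the maximum distance, one outlier changes the affinity between every pair of inliers, whereas a per-point scale changes only the entries that involve points near the outlier.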

More importantly, the Gaussian function has difficulty capturing the intrinsic structure of the data from the graph partition perspective, which makes spectral clustering perform badly on data distributed in complex shapes. To illustrate this, consider a toy example on the two-moon data shown in Fig. 1(a). The affinity graph constructed in NC is shown in Fig. 1(b) in the form of a K-nearest-neighbor (KNN) graph. Some data pairs lying on separate moons are linked in the affinity graph; this implies wrong local neighborhood relationships, and thus the clustering result of NC is biased, as shown in Fig. 1(c). We also illustrate the example in the semi-supervised case: four constraints, two must-link and two cannot-link, are added as shown in Fig. 1(d), and the affinity graph shown in Fig. 1(e) is constructed by a representative distance metric learning method proposed in [8]. Some very near neighbors are not linked in this affinity graph, and thus it produces a bad clustering result, as shown in Fig. 1(f). From this toy example, we observe that (a) the success of spectral clustering depends greatly on the constructed affinity graph, which encodes the intrinsic structure of the data; and (b) the Gaussian-function-based affinity graph still has difficulty capturing the intrinsic structure of data distributed in complex shapes.
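The unsupervised half of this toy example can be approximated with a short script that counts how many KNN edges join points on different moons. It is only a rough stand-in for Fig. 1(b): the two-moon generator, the noise level, the choice of `k`, and the use of plain Euclidean neighbors are all assumptions, and the counts it prints are not results from the paper.

```python
import numpy as np

def make_two_moons(n_per_moon, noise, rng):
    # Two interleaved half-circles with Gaussian jitter (hypothetical generator).
    t = np.linspace(0.0, np.pi, n_per_moon)
    upper = np.column_stack([np.cos(t), np.sin(t)])
    lower = np.column_stack([1.0 - np.cos(t), 0.5 - np.sin(t)])
    X = np.vstack([upper, lower]) + rng.normal(0.0, noise, size=(2 * n_per_moon, 2))
    labels = np.repeat([0, 1], n_per_moon)
    return X, labels

def knn_edges(X, k):
    # Undirected KNN graph: link i and j if j is among the k nearest neighbors of i.
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)  # a point is not its own neighbor
    neighbors = np.argsort(d, axis=1)[:, :k]
    return {(min(i, int(j)), max(i, int(j)))
            for i in range(len(X)) for j in neighbors[i]}

def count_cross_edges(edges, labels):
    # Number of edges whose endpoints carry different ground-truth labels.
    return sum(1 for i, j in edges if labels[i] != labels[j])

rng = np.random.default_rng(0)
X, y = make_two_moons(n_per_moon=100, noise=0.12, rng=rng)
for k in (5, 10, 20):
    edges = knn_edges(X, k)
    print(k, len(edges), count_cross_edges(edges, y))
```

Each cross-moon edge is a "wrong local neighborhood relationship" in the sense used above: it adds weight across the cut that a correct partition has to sever.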

In this paper, we focus on the definition of affinity graph for spectral clustering in both unsupervised and semi-supervised


journal homepage: www.elsevier.com/locate/neucom

Neurocomputing

doi:10.1016/j.neucom.2009.03.012



Corresponding author at: Center for Advanced Computing Technology Research, Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190, China.

E-mail address: txia@ict.ac.cn (T. Xia).

Neurocomputing 72 (2009) 3203–3211
