and the parameter $\alpha$ controls the number of features selected during the clustering.
The objective function is optimized by updating the variables $S_k$, $\theta_{kj}$ and $w_{kj}$ iteratively until convergence as follows (Domeniconi et al., 2007):
$$S_k = \left\{ x_i \;\middle|\; k = \arg\min_{l} L_l(\theta_l, x_i) \right\}, \quad k = 1, 2, \ldots, K, \tag{2}$$
$$\theta_{kj} = \frac{\sum_{x_i \in S_k} x_{ij}}{|S_k|}, \quad k = 1, 2, \ldots, K, \; j = 1, 2, \ldots, q, \tag{3}$$
$$w_{kj} = \frac{\exp(-A_{kj}/\alpha)}{\sum_{j'=1}^{q} \exp(-A_{kj'}/\alpha)}, \quad k = 1, 2, \ldots, K, \; j = 1, 2, \ldots, q, \tag{4}$$
where $L_l(\theta_l, x_i) = \left( \sum_{j=1}^{q} w_{lj} (x_{ij} - \theta_{lj})^2 \right)^{1/2}$ for each data point $x_i$, and $A_{kj} = \sum_{x_i \in S_k} (x_{ij} - \theta_{kj})^2 / |S_k|$.
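To make the update cycle concrete, the following sketch (ours, not from Domeniconi et al., 2007) implements one pass of Eqs. (2)-(4) with NumPy; the array names X, centers, and weights are illustrative choices.

```python
import numpy as np

def lac_iteration(X, centers, weights, alpha):
    """One pass of the LAC updates in Eqs. (2)-(4).

    X       : (N, q) data matrix
    centers : (K, q) cluster centers, theta
    weights : (K, q) per-cluster feature weights, w
    alpha   : entropy parameter controlling feature selection
    """
    # Eq. (2): assign each point to the cluster minimizing the
    # weighted distance L_l(theta_l, x_i).
    diff = X[:, None, :] - centers[None, :, :]                # (N, K, q)
    dist = np.sqrt((weights[None, :, :] * diff ** 2).sum(axis=2))
    labels = dist.argmin(axis=1)                              # (N,)

    for k in range(centers.shape[0]):
        members = X[labels == k]
        if len(members) == 0:
            continue  # keep the previous center/weights for an empty cluster
        # Eq. (3): the center is the per-feature mean over S_k.
        centers[k] = members.mean(axis=0)
        # Eq. (4): A_kj is the average squared deviation along feature j;
        # exponential weighting concentrates mass on low-variance features.
        A = ((members - centers[k]) ** 2).mean(axis=0)        # (q,)
        e = np.exp(-A / alpha)
        weights[k] = e / e.sum()

    return labels, centers, weights
```

Iterating these three updates until the assignments stabilize yields the LAC solution; a smaller $\alpha$ concentrates each cluster's weight on fewer features, consistent with the role of $\alpha$ noted above.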
LAC assigns each cluster a weight vector which simultaneously shapes the distance measure and finds the appropriate subspace via local feature selection, so as to deal with the information loss that often occurs in global subspace clustering techniques. LAC shows competitive performance compared with existing subspace clustering algorithms (Chen, Ye, Xu, & Huang, 2012; Gan & Ng, 2015; Yip, Cheung, & Ng, 2004). However, the conventional LAC is designed for single-view clustering and is unable to exploit multi-view information for data from multiple sources. Moreover, LAC merely considers the classic squared Euclidean distance, which is incapable of adapting to different application tasks.
4. Multi-view collaborative locally adaptive clustering with Minkowski metric

In this section, we introduce the proposed MV-CoMLAC approach in detail. Specifically, the objective function of our approach is described in Section 4.1. The optimization is described in Section 4.2, followed by the summary of the entire method in Section 4.3.
4.1. The multi-view model
Given a multi-view dataset $X = \{X_1, \ldots, X_N\}$ consisting of $N$ data points, where $X_i = \{x_i^{(1)}, \ldots, x_i^{(T)}\}$ denotes the $i$th instance, with $x_i^{(t)}$ representing the $t$th view representation of the $i$th instance, $T$ being the number of views, and $G_t$ being the dimensionality of the $t$th view. The goal is to combine information from multiple views so as to partition instances in the $t$th view into $K$ clusters $\{S_1^{(t)}, \ldots, S_K^{(t)}\}$ simultaneously, where $S_k^{(t)}$ denotes the $k$th cluster (i.e. the data points belonging to the $k$th cluster) in the $t$th view.
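As a minimal illustration of this data layout, a multi-view dataset can be stored view-wise as a list of $T$ matrices, the $t$th of shape $N \times G_t$; the variable names and dimensions below are hypothetical.

```python
import numpy as np

# Hypothetical multi-view layout: T = 3 views stored as a list of matrices,
# where views[t] has shape (N, G_t) and views[t][i] is x_i^{(t)}.
N = 100                    # number of instances (illustrative)
view_dims = [20, 35, 12]   # G_1, G_2, G_3 (illustrative)
rng = np.random.default_rng(0)
views = [rng.standard_normal((N, g)) for g in view_dims]
```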
Following the methodology of center-based clustering, we aim to find a set of cluster assignments $U = \{U^{(1)}, \ldots, U^{(T)}\}$ and a set of cluster centers $\Theta = \{\Theta^{(1)}, \ldots, \Theta^{(T)}\}$, where $U^{(t)} = [u_{ik}^{(t)}]_{N \times K}$ with $u_{ik}^{(t)}$ being the cluster assignment of the $i$th instance to the $k$th cluster in the $t$th view, and $\Theta^{(t)} = [\theta_{kj}^{(t)}]_{K \times G_t}$ with $\theta_{kj}^{(t)}$ being the $j$th dimension of the $k$th cluster center in the $t$th view. Additionally, two types of weighting are introduced, namely view weightings $V = \{v^{(1)}, \ldots, v^{(T)}\}$ and local feature weightings $W = \{W^{(1)}, \ldots, W^{(T)}\}$, where $v^{(t)}$ denotes the view weighting of the $t$th view and $W^{(t)} = [w_{kj}^{(t)}]_{K \times G_t}$ with $w_{kj}^{(t)}$ being the weighting for the $j$th dimension of the $k$th cluster center in the $t$th view. An objective function is designed as follows,
$$
\begin{aligned}
J_{MV\text{-}CoMLAC}(U, \Theta, W, V) = {} & \sum_{t=1}^{T} v^{(t)} \sum_{k=1}^{K} \frac{1}{|S_k^{(t)}|} \sum_{i=1}^{N} u_{ik}^{(t)} \sum_{j=1}^{G_t} w_{kj}^{(t)} \left| x_{ij}^{(t)} - \theta_{kj}^{(t)} \right|^{p} \\
& + \alpha \sum_{t=1}^{T} \sum_{k=1}^{K} \sum_{j=1}^{G_t} w_{kj}^{(t)} \log w_{kj}^{(t)} + \beta \sum_{t=1}^{T} v^{(t)} \log v^{(t)} \\
\text{s.t.} \quad & \sum_{t=1}^{T} v^{(t)} = 1, \quad 0 \le v^{(t)} \le 1, \\
& \sum_{j=1}^{G_t} w_{kj}^{(t)} = 1, \quad k = 1, \ldots, K, \; t = 1, \ldots, T, \\
& \sum_{k=1}^{K} u_{ik}^{(t)} = 1, \quad u_{ik}^{(t)} \in \{0, 1\}, \quad i = 1, \ldots, N, \; t = 1, \ldots, T, \tag{5}
\end{aligned}
$$
where $\alpha$ and $\beta$ are two parameters controlling the distributions of the weights $W$ and $V$, respectively, $|S_k^{(t)}|$ denotes the number of instances assigned to the $k$th cluster $S_k^{(t)}$ in the $t$th view, and $x_{ij}^{(t)}$ denotes the $j$th dimension in the $t$th view representation of the $i$th instance.
In the above objective function, the first term is used to construct the underlying low-dimensional subspace in the $t$th view by assigning the weight $w_{kj}^{(t)}$ to the $j$th dimension of the $k$th cluster in the $t$th view. The second term is the entropy of the weight variables $W$, with parameter $\alpha$ controlling the distribution of $W$: a larger $\alpha$ leads to a more uniform distribution, i.e. more features will be selected in each view for constructing the subspaces associated with the clusters. The third term is the entropy of the weight variables $V$, with parameter $\beta$ controlling the distribution of $V$. Similarly, a larger $\beta$ leads to a more uniform distribution, i.e. the views will be considered more equally during clustering.
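The following sketch shows how the value of Eq. (5) could be evaluated for given assignments, centers, and weights; it is an illustrative NumPy rendering of the objective, with all argument names our own.

```python
import numpy as np

def mv_comlac_objective(views, U, centers, W, v, p, alpha, beta):
    """Evaluate Eq. (5); all argument names are illustrative.

    views   : list of T arrays, views[t] of shape (N, G_t)
    U       : list of T (N, K) hard assignment matrices, entries in {0, 1}
    centers : list of T (K, G_t) center matrices, Theta^(t)
    W       : list of T (K, G_t) feature-weight matrices (rows sum to 1, > 0)
    v       : (T,) view weights (sum to 1, > 0)
    """
    J = 0.0
    for t, X in enumerate(views):
        # d^(t)_{ik;j} = |x_ij - theta_kj|^p for all i, k, j
        d = np.abs(X[:, None, :] - centers[t][None, :, :]) ** p   # (N, K, G_t)
        # weighted dispersion of point i with respect to center k
        per_ik = (W[t][None, :, :] * d).sum(axis=2)               # (N, K)
        sizes = np.maximum(U[t].sum(axis=0), 1)                   # |S_k^(t)|, guard empty clusters
        J += v[t] * ((U[t] * per_ik) / sizes[None, :]).sum()
        # entropy regularizer on the feature weights (assumes W > 0)
        J += alpha * (W[t] * np.log(W[t])).sum()
    # entropy regularizer on the view weights (assumes v > 0)
    J += beta * (v * np.log(v)).sum()
    return J
```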
The Minkowski distance between $x_i$ and $\theta_k$ of dimensionality $G_t$ in the $t$th view is defined as:
$$\mathrm{dist}^{(t)}(x_i, \theta_k) = \sqrt[p]{\sum_{j=1}^{G_t} \left| x_{ij}^{(t)} - \theta_{kj}^{(t)} \right|^{p}}. \tag{6}$$
In this paper we calculate the distance metric with the $p$th root of the Minkowski distance removed, and hereafter briefly denote its $j$th per-feature term in the $t$th view, $|x_{ij}^{(t)} - \theta_{kj}^{(t)}|^p$, as $d_{ik;j}^{(t)}$. Without consideration of feature weighting, the distance metric used between $x_i$ and $\theta_k$ in the $t$th view is defined as below, which extends the squared Euclidean distance to the Minkowski distance with power $p$:
$$\mathrm{dist}_p^{(t)}(x_i, \theta_k) = \left( \sqrt[p]{\sum_{j=1}^{G_t} \left| x_{ij}^{(t)} - \theta_{kj}^{(t)} \right|^{p}} \right)^{p} = \left( \mathrm{dist}^{(t)}(x_i, \theta_k) \right)^{p} = \sum_{j=1}^{G_t} d_{ik;j}^{(t)}. \tag{7}$$
The Minkowski distance metric used is determined by the parameter $p$, which makes our method adaptive to different application tasks. Two special cases are as follows: when $p$ equals 1, it reduces to the Manhattan distance in the metric space; when $p$ equals 2, it reduces to the squared Euclidean distance.
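A short sketch of Eq. (7) and its two special cases, assuming plain NumPy arrays for a single pair of vectors:

```python
import numpy as np

def dist_p(x, theta, p):
    """Eq. (7): the Minkowski distance with its p-th root removed,
    i.e. the sum of the per-feature terms d_{ik;j}."""
    return (np.abs(x - theta) ** p).sum()

x = np.array([1.0, 2.0, 3.0])
theta = np.array([0.0, 0.0, 1.0])

# p = 1 recovers the Manhattan distance ...
assert np.isclose(dist_p(x, theta, 1), np.abs(x - theta).sum())
# ... and p = 2 the squared Euclidean distance.
assert np.isclose(dist_p(x, theta, 2), ((x - theta) ** 2).sum())
```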
4.2. Optimization
In this section, we present an alternating optimization method to solve the objective function in Eq. (5). The method consists of four steps; in each step, we optimize one variable while fixing the other three. The detailed description of the optimization method is provided as follows.