G.-Y. Zhang et al. / Knowledge-Based Systems 150 (2018) 127–138 129
Weighted Collaborative k-means (TW-Co-k-means), which utilizes a collaborative manner to exploit the shared information between different views while considering the diversity in each view. Additionally, the proposed approach is able to weight the views and the features in each view simultaneously according to the importance of the views and the features, which leads to satisfactory clustering results as demonstrated by the experimental evaluation.
3. The proposed approach
In this section, we introduce the proposed TW-Co-k-means approach. First, the objective function of TW-Co-k-means is described in Section 3.1. Then, we propose an alternating optimization method to solve this problem in Section 3.2. Finally, we summarize the TW-Co-k-means algorithm in Section 3.3.
3.1. The objective function
Given a dataset $X = \{X_1, \ldots, X_N\}$ with $N$ samples and $T$ views, where $X_i = \{x_i^{(1)}, \ldots, x_i^{(T)}\}$ denotes the $i$-th sample in the dataset, with $x_i^{(t)}$ representing the $t$-th view element of the $i$-th sample. In the $t$-th view, the dimensionality (the number of features) is denoted as $G^{(t)}$.
Let $M = \{M^{(1)}, \ldots, M^{(T)}\}$ denote a set of cluster centers, where $M^{(t)} = \{m_1^{(t)}, \ldots, m_K^{(t)}\}$ denotes the $t$-th element of $M$, with $m_k^{(t)}$ representing the $k$-th cluster center in the $t$-th view. Let $U = \{U^{(1)}, \ldots, U^{(T)}\}$ denote a set of cluster assignments, where $U^{(t)}$ denotes the $t$-th element of $U$ and the $(i,k)$-th entry of $U^{(t)}$, referred to as $u_{ik}^{(t)}$, indicates whether the $i$-th sample belongs to the $k$-th cluster in the $t$-th view or not. Let $d_{i,k;j}^{(t)}$ be the Euclidean distance that measures the dissimilarity on the $j$-th feature between the $i$-th sample and the $k$-th cluster center in view $t$, which can be computed as $d_{i,k;j}^{(t)} = \left( x_{ij}^{(t)} - m_{kj}^{(t)} \right)^2$. Let $W = \{W^{(1)}, \ldots, W^{(T)}\}$ be a set of feature weights, where $W^{(t)} = \{w_1^{(t)}, \ldots, w_{G^{(t)}}^{(t)}\}$, with $w_j^{(t)}$ representing the $j$-th feature weight in the $t$-th view. The value of $w_j^{(t)}$ reflects the importance of the $j$-th feature in the $t$-th view. Let $V = \{v^{(1)}, \ldots, v^{(T)}\}$ denote a set of view weights for the $T$ views, where the value of $v^{(t)}$ reflects the importance of the $t$-th view.
The goal of the proposed approach is to exploit a collaborative strategy to group the dataset $X$ into $K$ clusters, by taking into account the view weighting, the feature weighting and the mutual links between views simultaneously. The objective function of the proposed TW-Co-k-means approach is defined as follows,
$$
J(M, U, W, V) = \sum_{t=1}^{T} v^{(t)} \sum_{j \in G^{(t)}} w_j^{(t)} \sum_{k=1}^{K} \sum_{i=1}^{N} u_{ik}^{(t)} d_{i,k;j}^{(t)} + \frac{\eta}{T-1} \Delta + \alpha \sum_{t=1}^{T} \sum_{j \in G^{(t)}} w_j^{(t)} \log w_j^{(t)} + \beta \sum_{t=1}^{T} v^{(t)} \log v^{(t)} \tag{1}
$$
subject to
$$
\sum_{t=1}^{T} v^{(t)} = 1, \quad 0 \le v^{(t)} \le 1, \qquad \sum_{j \in G^{(t)}} w_j^{(t)} = 1, \quad t = 1, \ldots, T,
$$
$$
\sum_{k=1}^{K} u_{ik}^{(t)} = 1, \quad u_{ik}^{(t)} \in \{0, 1\}, \quad i = 1, \ldots, N, \; t = 1, \ldots, T,
$$
where $\Delta$ is a penalty term defined as
$$
\Delta = \sum_{t=1}^{T} \sum_{t' \neq t} \sum_{i=1}^{N} \sum_{k=1}^{K} \left| u_{ik}^{(t')} - u_{ik}^{(t)} \right| \left( v^{(t)} \sum_{j \in G^{(t)}} w_j^{(t)} d_{i,k;j}^{(t)} - v^{(t')} \sum_{j \in G^{(t')}} w_j^{(t')} d_{i,k;j}^{(t')} \right). \tag{2}
$$
Two parameters, $\alpha$ and $\beta$, are introduced to control the distributions of the weighting variables $W$ and $V$, respectively. The parameter $\eta$ is introduced to control the effect of the penalty term $\Delta$.
The objective function in Eq. (1) consists of three parts. The first part computes the sum of within-cluster distances in each view by assigning weights to the views and the features (i.e., a two-level weighting strategy, with level one corresponding to the view weighting and level two corresponding to the feature weighting). The second part is a penalty term that measures the disagreement across multiple views in a collaborative manner. The third part consists of two entropy-based terms, which adjust the influence of the weighting variables $W$ and $V$ in the objective function, respectively. In the literature, collaborative clustering was first proposed to solve clustering problems in which the data consist of several separate subsets [18–21]. In general, these subsets share common information. The collaborative manner can effectively discover their shared structure, and complementary information can be integrated from the other subsets during the iterations. Inspired by collaborative clustering, our approach also utilizes the collaborative manner for multi-view clustering.
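As a sketch of how Eqs. (1) and (2) could be evaluated in practice, the NumPy code below computes the three parts of the objective. The function names (`penalty_delta`, `objective`) and the array layout are our own illustrative choices, not the authors' implementation:

```python
import numpy as np

def penalty_delta(U, V, W, D):
    """Disagreement penalty of Eq. (2), summed over all ordered view pairs (t, t').

    U: list of (N, K) one-hot assignment matrices, one per view.
    V: (T,) view weights.  W: list of (G_t,) feature weights per view.
    D: list of (N, K, G_t) per-feature squared distances per view.
    """
    T = len(U)
    # Weighted distance of sample i to center k in each view: v^(t) * sum_j w_j d_{i,k;j}
    wd = [V[t] * np.einsum('ikj,j->ik', D[t], W[t]) for t in range(T)]
    delta = 0.0
    for t in range(T):
        for s in range(T):
            if s != t:
                delta += np.sum(np.abs(U[s] - U[t]) * (wd[t] - wd[s]))
    return delta

def objective(U, V, W, D, alpha, beta, eta):
    """Objective J of Eq. (1): weighted within-cluster distances,
    the collaboration penalty, and the two entropy regularizers."""
    T = len(U)
    within = sum(V[t] * np.einsum('ik,ikj,j->', U[t], D[t], W[t]) for t in range(T))
    entropy_w = sum(np.sum(W[t] * np.log(W[t])) for t in range(T))
    entropy_v = np.sum(V * np.log(V))
    return (within + eta / (T - 1) * penalty_delta(U, V, W, D)
            + alpha * entropy_w + beta * entropy_v)
```

Note that when the assignments agree across all views, every $|u_{ik}^{(t')} - u_{ik}^{(t)}|$ factor is zero and the penalty vanishes, consistent with the discussion above.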
Different from the previous collaborative multi-view clustering approaches [9,15], the penalty term in our objective function can exploit the different importance of the multiple views and of the features in each view, so as to produce a more satisfactory clustering result. In particular, if the cluster assignment in the $t$-th view is different from that in the $t'$-th view, i.e., $u_{ik}^{(t)} \neq u_{ik}^{(t')}$, then a disagreement penalty is imposed on the objective function. It is obvious that if the cluster assignments in different views are similar to each other, the disagreement term will tend to be smaller.
Besides, $\left( v^{(t)} \sum_{j \in G^{(t)}} w_j^{(t)} d_{i,k;j}^{(t)} - v^{(t')} \sum_{j \in G^{(t')}} w_j^{(t')} d_{i,k;j}^{(t')} \right)$ is a term corresponding to $\left| u_{ik}^{(t)} - u_{ik}^{(t')} \right|$, serving as the Euclidean distance between the local view assignments $u_{ik}^{(t)}$ and $u_{ik}^{(t')}$. The larger the parameter $\eta$ is, the greater the influence of the penalty term $\Delta$ on the objective function.
3.2. Optimization
In this section, the objective function in Eq. (1) is solved
by using the alternating optimization method, which consists of
four steps. In each step, we optimize one variable by fixing the
other three variables. The detailed description of the optimization
method is provided as follows.
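For illustration only, the four-step alternation might be organized as below. To keep the sketch short we drop the collaboration penalty (i.e., set $\eta = 0$) and use the standard entropy-regularized closed-form updates for $W$ and $V$; these are therefore simplified stand-ins, not the exact update rules derived in this section, and the function name `alternate_optimize` is our own:

```python
import numpy as np

def alternate_optimize(X, K, alpha, beta, n_iter=20, seed=0):
    """Illustrative alternating optimization with the penalty dropped (eta = 0).

    Each step fixes three of (U, M, W, V) and updates the fourth; the W and V
    updates use the softmin forms implied by the two entropy regularizers.
    """
    rng = np.random.default_rng(seed)
    T, N = len(X), X[0].shape[0]
    G = [x.shape[1] for x in X]
    M = [x[rng.choice(N, size=K, replace=False)] for x in X]  # init centers
    W = [np.full(g, 1.0 / g) for g in G]
    V = np.full(T, 1.0 / T)
    for _ in range(n_iter):
        D = [(X[t][:, None, :] - M[t][None, :, :]) ** 2 for t in range(T)]
        # Step 1: update U -- assign each sample to its nearest weighted center
        U = [np.eye(K)[np.argmin(D[t] @ W[t], axis=1)] for t in range(T)]
        # Step 2: update M -- mean of the samples assigned to each cluster
        for t in range(T):
            cnt = U[t].sum(axis=0)[:, None]                   # (K, 1)
            M[t] = np.where(cnt > 0, U[t].T @ X[t] / np.maximum(cnt, 1), M[t])
        D = [(X[t][:, None, :] - M[t][None, :, :]) ** 2 for t in range(T)]
        # Step 3: update W -- softmin of per-feature cost, temperature alpha
        for t in range(T):
            cost = np.einsum('ik,ikj->j', U[t], D[t])
            e = np.exp(-(cost - cost.min()) / alpha)
            W[t] = e / e.sum()
        # Step 4: update V -- softmin of per-view weighted cost, temperature beta
        cost_v = np.array([np.einsum('ik,ikj,j->', U[t], D[t], W[t])
                           for t in range(T)])
        e = np.exp(-(cost_v - cost_v.min()) / beta)
        V = e / e.sum()
    return U, M, W, V
```

The loop structure (one closed-form update per variable, with the others held fixed) is the part that matches the description above; the individual update formulas are simplified assumptions.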
3.2.1. Update the cluster assignments
In this step, we update the cluster assignment $u_{ik}^{(t)}$ by fixing the variables $M$, $W$, $V$ in the $t$-th view.
As in the conventional k-means-like clustering approaches [16,22], it is not feasible to directly take the derivative of Eq. (1) w.r.t. $u_{ik}^{(t)}$. Hence, we adopt a new update rule which is different from the update rule of the conventional k-means-like approaches [16,22,23]. Furthermore, for different application tasks, multi-view data may be embedded in different manifold structures. Existing works assumed that the multiple feature representations of the multi-view data have a strong and compact relationship, so the cluster assignment is always updated in a concatenation manner. However, for many real-world multi-view datasets,