Mixture Self-paced Learning for Multi-view
K-means Clustering
Hong Yu
School of Software
Dalian University of Technology Dalian, China
hongyu@dlut.edu.cn
Yahong Lian
School of Software
Dalian University of Technology Dalian, China
lianyahong1@163.com
Xiujuan Xu
School of Software
Dalian University of Technology Dalian, China
xjxu@dlut.edu.cn
Xiaowei Zhao
School of Software
Dalian University of Technology Dalian, China
xiaowei.zhao@dlut.edu.cn
Abstract—In our daily life, more and more data are characterized by multiple features. In the multi-view setting, clusters estimated from a single view have limitations, and the quality of single-view clustering can be improved by means of multi-view clustering. Self-paced learning simulates the human learning process, gradually incorporating the information of each view into the clustering task from easy to complex. In this paper, we first propose a new mixture self-paced learning regularizer. To demonstrate the effectiveness of the regularizer, we combine it with robust multi-view k-means clustering and propose a new self-paced learning based multi-view k-means (SPLMKM) clustering method. As a non-trivial contribution, we present a solution based on an alternating minimization strategy. Comparative experiments reveal the benefit of our proposed method.
Index Terms—multi-view clustering; self-paced learning; k-
means
I. INTRODUCTION
Due to the rapid development of data acquisition technology, massive amounts of data are generated in many domains, and most of these data are characterized by multiple features. For example, a web page can be described by its text or by the pictures included in the page; multi-lingual corpora represent the same article in different languages; and in personal identification scenarios, a person can be recognized by a facial picture, a fingerprint or a signature. In the multi-view setting, clusters estimated from a single view have limitations, and the quality of single-view clustering can be improved by means of multi-view clustering. Multi-view clustering [1], [2] combines information from multiple views to boost clustering performance, and in recent years it has attracted a great deal of research [3], [4].
K-means is a common and widely-used method with many advantages. It has been prevalent in the unsupervised learning domain because of its mathematical simplicity and ease of implementation. Extending it to the multi-view scenario, Bickel et al. [1] propose a multi-view spherical k-means clustering. In [5], the authors show that non-negative matrix factorization is equivalent to relaxed k-means, which uses the Frobenius norm to measure the reconstruction error; as a consequence, k-means is sensitive to noises and outliers. The l2,1 norm, a sparsity-inducing norm that combines the l1 norm and the l2 norm, yields more robust results when used to measure the k-means reconstruction error [6], [7]. It can also be used to impose structured sparsity on the learned weight matrix and boost multi-view clustering performance [8], [9].
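To make the robustness argument concrete, the two objectives can be contrasted as follows, under the NMF-style relaxation of [5]. The notation here (X for the data matrix, F for the centroids, G for the cluster indicators) is assumed for illustration and need not match the notation used later in this paper:

```latex
% Standard relaxed k-means: squared Frobenius norm, so a single
% outlier column contributes its *squared* distance to the loss.
\min_{F,G}\; \|X - FG^{\top}\|_F^2
  = \sum_{i=1}^{n} \|x_i - F g_i\|_2^2

% Robust variant: the l_{2,1} norm sums the *unsquared* column-wise
% l_2 distances, so outliers are penalized linearly rather than
% quadratically and distort the centroids far less.
\min_{F,G}\; \|X - FG^{\top}\|_{2,1}
  = \sum_{i=1}^{n} \|x_i - F g_i\|_2
```

Because each data point enters the l2,1 objective through its unsquared residual norm, points far from every centroid exert bounded leverage on the solution, which is the intuition behind the robust k-means variants of [6], [7].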
Most off-the-shelf multi-view k-means clustering methods have to solve non-convex objective functions. This deficiency often makes them get stuck in local minima, especially under the interference of noises and outliers. Self-paced learning [10] simulates the process of human learning: 'easy' samples are chosen first to train a model, and 'complex' samples are then gradually faded into the learning process (Figure 1 gives a brief example of self-paced learning). Several multi-view clustering methods that incorporate the self-paced learning scheme have been shown to help avoid bad local minima and to improve the final results [11]–[14].
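The easy-to-complex idea can be sketched with the classic hard-weighting rule of self-paced learning [10]: a sample (or view) participates in the current round of training only if its loss falls below a pace threshold, which is relaxed as training proceeds. This is a minimal illustration of the standard SPL scheme, not the mixture regularizer proposed in this paper; the function name and toy losses are our own:

```python
import numpy as np

def spl_weights(losses, lam):
    """Hard self-paced weights (Kumar et al. [10]).

    A sample is selected (weight 1) if its loss is below the pace
    threshold 1/lam; otherwise it is excluded (weight 0). Decreasing
    lam over iterations raises the threshold, admitting harder samples.
    """
    return (losses < 1.0 / lam).astype(float)

# Toy per-sample losses: small loss = 'easy' sample.
losses = np.array([0.1, 0.3, 0.9, 1.5, 2.0])

# Early pace: threshold 1/2.0 = 0.5, only the two easiest samples train.
print(spl_weights(losses, lam=2.0))  # [1. 1. 0. 0. 0.]

# Later pace: threshold 1/0.6 ≈ 1.67, harder samples are faded in.
print(spl_weights(losses, lam=0.6))  # [1. 1. 1. 1. 0.]
```

In a full SPL algorithm this weight update alternates with a model update (here, the k-means centroid step) restricted to the currently selected samples, which is exactly the alternating structure exploited in Section on optimization.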
Fig. 1. An example of the self-paced learning process. For the clustering task, it is instructive to note that the greater the variance of the clusters, the more complex the clustering (thus, in our example, view 3 is the most complex view and view 1 is the easiest). Initially, the three views have equal weights. Then the weights of view 1 and view 2 are increased and the weight of view 3 is decreased (learn from the easy views first). In the third group of pictures, the weight of view 2, which is relatively 'easier' than view 3, is increased. Through this process, self-paced learning helps the model learn from the 'easy' views first and then gradually include the other views in the learning task.
2017 IEEE 15th Intl Conf on Dependable, Autonomic and Secure Computing, 15th Intl Conf on Pervasive Intelligence
and Computing, 3rd Intl Conf on Big Data Intelligence and Computing and Cyber Science and Technology Congress
978-1-5386-1956-8/17 $31.00 © 2017 IEEE
DOI 10.1109/DASC-PICom-DataCom-CyberSciTec.2017.193