Evolutionary Multi-objective Optimization for
Multi-view Clustering
Bo Jiang∗, Feiyue Qiu∗, Shipin Yang† and Liping Wang‡
∗College of Education Science and Technology, Zhejiang University of Technology, Hangzhou, China 310023
Email: bjiang, qfy@zjut.edu.cn
†College of Electrical Engineering and Control Science, Nanjing Tech University, Nanjing, China 211800
Email: spyang@njtech.edu.cn
‡College of Administration and Management, Zhejiang University of Technology, Hangzhou, China 310023
Email: wlp@zjut.edu.cn
Abstract—In some real-world applications, multiple measuring methods are employed to extract multiple feature groups from the data, yielding multi-view data. The main challenge of multi-view clustering is to find a suitable way of simultaneously exploiting the complementary information of all views, while accounting for the view conflicts arising from the different measures. From the perspective of optimization, previous multi-view clustering studies applied the weighted sum method to represent the degree of conflict and treated clustering as a weighted-sum single-objective optimization problem. In this work, we formulate multi-view clustering as a multi-objective optimization problem, in which each view is regarded as a completely independent feature subset and the clustering objective function of each view is one of the multiple objectives. Five popular multi-objective evolutionary algorithms (MOEAs), i.e., NSGA-II, SPEA2, MOEA/D, SMS-EMOA and NSGA-III, were used to solve the induced multi-objective problem. Six real-world multi-view datasets were used to evaluate the proposed method, and the experimental results showed that SPEA2 significantly outperformed the other MOEAs according to three performance evaluation indices.
I. INTRODUCTION
Multi-view data is common in many real-world and scientific fields in the big data era. For example, in massive open online courses (MOOCs), both students' course registration data (view 1) and online behavioral data (view 2) are used to predict student dropout [1]; in image analysis, each image can be represented by several different visual descriptors, such as RGB color histograms, HSV color histograms and Haralick texture features [2]. Each view captures a distinct perspective of the data. For example, in MOOCs data, students' course registration data describe their demographic information and past academic grades, while online behavioral data record their interaction behaviors during learning, such as posting in forums, viewing lectures, browsing forums and submitting quizzes, which reflect the level of learning engagement. Therefore, it is crucial to integrate these heterogeneous views to generate more accurate and robust clustering results, rather than relying on a single view.
The goal of multi-view clustering is to find clusters that
are consistent across different views. According to how the
multiple views are utilized, existing work in multi-view clus-
tering can be broadly classified into centralized methods and
distributed methods. Algorithms in the first category utilize
all views simultaneously to discover hidden patterns [3], [4],
[5], [6], [7], [8], [9], [10], [11]. In contrast, approaches from
the second category first cluster each view independently and
then combine the individual clustering results to produce a
final partition [12], [13], [14]. Although centralized multi-view clustering methods have gained increasing attention in the past decade due to their good performance, most of them are based on spectral clustering, which requires the heavy computation of kernel construction and eigenvector decomposition, so these methods cannot handle large-scale datasets. Therefore, several multi-view clustering algorithms based on K-means equivalences were proposed to solve large-scale multi-view clustering problems [15], [16], [17]. In these studies, handling disagreement among views is a key issue. Since the multiple views are derived from different types of measurements, they have very different statistical properties and produce different partitions; it is therefore very hard to find a pattern that is completely consistent across all views. In contrast, a more realistic option is to first discriminate among the views and then conduct clustering on the discriminative feature space. A simple yet efficient method is view weighting. Some typical examples of this kind of method include view-weighted nonnegative matrix factorization [7], view-weighted spectral clustering [18], [9], view-weighted K-means [15], [11] and the two-level weighted K-means algorithm [16], [19].
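As an illustration of this family of methods (a generic sketch of the common form, not the exact objective of any one cited algorithm), a view-weighted K-means formulation typically takes the shape

\[
\min_{U,\,Z,\,w}\; \sum_{v=1}^{m} w_v^{\beta} \sum_{i=1}^{n} \sum_{k=1}^{K} u_{ik}\, \bigl\| x_i^{(v)} - z_k^{(v)} \bigr\|^2
\quad \text{s.t.}\;\; \sum_{v=1}^{m} w_v = 1,\;\; w_v \ge 0,
\]

where \(m\) is the number of views, \(u_{ik}\in\{0,1\}\) assigns sample \(i\) to cluster \(k\), \(z_k^{(v)}\) is the centroid of cluster \(k\) in view \(v\), and the exponent \(\beta\) controls how sharply the weights \(w_v\) concentrate on the most consistent views.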
Previous discriminative multi-view clustering methods formulated the problem as a weighted-sum single-objective optimization problem that must solve for the best view weights and the partition simultaneously. The key advantage of this kind of method is that it does not make the restrictive assumption that all views are compatible with each other, so it is usually more robust and flexible. However, from an optimization point of view, the objective function of weighted multi-view clustering is in general very hard to solve efficiently. On one hand, the high feature dimensionality of multi-view datasets makes the optimization problem large-scale in its variables. For example, the famous handwritten digit dataset has 649 dimensions¹ and the 3-Sources news dataset has more than 10⁴ dimensions². On the other hand, and more importantly, the commonly used weighting methods, including fuzzy weighting [9], [18], [15], negative entropy weighting [16], [17] and sparse regularization [11], [20], [21], make the objective function non-convex and non-smooth.
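In contrast, the multi-objective reformulation studied in this work keeps the per-view objectives separate instead of weighting them:

\[
\min_{U}\; F(U) = \bigl( f_1(U; X^{(1)}),\, \dots,\, f_m(U; X^{(m)}) \bigr),
\]

where \(f_v\) is the clustering objective computed on view \(v\) alone and an MOEA searches for Pareto-optimal partitions. The evaluation step an MOEA needs can be sketched as follows (a minimal illustration assuming hard cluster labels and the within-cluster sum-of-squares objective; the function name `per_view_sse` is ours, not from the paper):

```python
import numpy as np

def per_view_sse(labels, views, n_clusters):
    """Return the objective vector of a candidate partition: the
    within-cluster sum of squares computed independently on each view."""
    objectives = []
    for X in views:                      # one feature matrix per view
        sse = 0.0
        for k in range(n_clusters):
            members = X[labels == k]     # samples assigned to cluster k
            if len(members) > 0:
                centroid = members.mean(axis=0)
                sse += ((members - centroid) ** 2).sum()
        objectives.append(float(sse))
    return objectives

# Toy example: 4 samples, 2 views, 2 clusters. The partition is perfect
# in view 1 but leaves residual scatter in view 2, so the two objectives
# disagree -- exactly the conflict an MOEA trades off.
view1 = np.array([[0.0], [0.0], [10.0], [10.0]])
view2 = np.array([[1.0], [3.0], [1.0], [3.0]])
labels = np.array([0, 0, 1, 1])
print(per_view_sse(labels, [view1, view2], 2))  # [0.0, 4.0]
```

An MOEA such as SPEA2 can then use this objective vector directly for Pareto ranking, with no view weights to tune.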
¹http://archive.ics.uci.edu/ml/datasets/
²http://mlg.ucd.ie/datasets/3sources.html
978-1-5090-0623-6/16/$31.00 © 2016 IEEE