Multi-View Clustering via Joint Nonnegative Matrix Factorization
Jialu Liu¹, Chi Wang¹, Jing Gao², and Jiawei Han¹
¹University of Illinois at Urbana-Champaign
²University at Buffalo
Abstract
Many real-world datasets consist of different rep-
resentations or views which often provide information
complementary to each other. To integrate information
from multiple views in the unsupervised setting, multi-
view clustering algorithms have been developed to clus-
ter multiple views simultaneously to derive a solution
which uncovers the common latent structure shared by
multiple views. In this paper, we propose a novel NMF-
based multi-view clustering algorithm by searching for a
factorization that gives compatible clustering solutions
across multiple views. The key idea is to formulate a
joint matrix factorization process with the constraint
that pushes the clustering solution of each view towards
a common consensus instead of fixing it directly. The
main challenge is how to keep clustering solutions across
different views meaningful and comparable. To tackle
this challenge, we design a novel and effective normaliza-
tion strategy inspired by the connection between NMF
and PLSA. Experimental results on synthetic and sev-
eral real datasets demonstrate the effectiveness of our
approach.
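As a rough sketch of the idea described above (the notation here is introduced only for illustration; the precise objective, constraints, and normalization strategy are developed in the body of the paper), the joint factorization can be viewed as minimizing, over nonnegative factors, an objective of the form

\[
\min_{U^{(v)},\, V^{(v)},\, V^{*} \ge 0} \;\; \sum_{v=1}^{n_v} \left\| X^{(v)} - U^{(v)} \big(V^{(v)}\big)^{\mathsf{T}} \right\|_F^2 \;+\; \sum_{v=1}^{n_v} \lambda_v \left\| V^{(v)} - V^{*} \right\|_F^2,
\]

where X^{(v)} denotes the data matrix of the v-th view, U^{(v)} and V^{(v)} its basis and coefficient (clustering) matrices, V^{*} the consensus clustering matrix shared by all views, and \lambda_v a parameter trading off reconstruction quality against agreement with the consensus.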
1 Introduction
Many real-world datasets naturally consist of
different representations or views [5]. For example, the
same story can be told in articles from different news
sources, one document may be translated into multiple
different languages, research communities are formed
based on research topics as well as co-authorship links,
web pages can be classified based on both content and
the anchor text of hyperlinks pointing to them, and so on. In these
applications, each dataset is represented by attributes
that can naturally be split into different subsets, any
of which suffices for mining knowledge. Observing that
these multiple representations often provide compatible
and complementary information, it becomes natural
for one to integrate them to obtain better
performance rather than relying on a single view. The
key to learning from multiple views (multi-view learning) is to
leverage the knowledge contained in each individual view
so as to outperform simply concatenating the views.
As unlabeled data are plentiful in real life and in-
creasing quantities of them come in multiple views from
diverse sources, the problem of unsupervised learning
from multiple views, referred to as multi-view clustering,
has attracted attention [3, 17].
The goal of multi-view clustering is to partition objects
into clusters based on multiple representations of the
objects. Existing multi-view clustering algorithms can
be roughly classified into three categories. Algorithms
in the first category [3, 17] incorporate multi-view inte-
gration into the clustering process directly through op-
timizing certain loss functions. In contrast, algorithms
in the second category such as the ones based on Canon-
ical Correlation Analysis [8, 4] first project multi-view
data into a common lower-dimensional subspace and
then apply any clustering algorithm such as k-means to
learn the partition. The third category is called late in-
tegration or late fusion, in which a clustering solution
is derived from each individual view and then all the
solutions are fused based on consensus [7, 13].
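To make the second category concrete, below is a minimal sketch (not the method proposed in this paper) of projecting two views into a shared subspace with CCA and then clustering with k-means; the function name, the random stand-in data, and the choice to concatenate the per-view projections before clustering are illustrative assumptions.

# Category 2 sketch: CCA projection to a shared subspace, then k-means.
# This illustrates the general recipe only, not this paper's algorithm.
import numpy as np
from sklearn.cross_decomposition import CCA
from sklearn.cluster import KMeans

def cca_then_kmeans(X1, X2, n_components=5, n_clusters=3):
    # Project both views into a common low-dimensional subspace.
    cca = CCA(n_components=n_components)
    Z1, Z2 = cca.fit_transform(X1, X2)
    # One simple way to combine the views: concatenate their projections.
    Z = np.hstack([Z1, Z2])
    # Any clustering algorithm can then be applied in the shared subspace.
    return KMeans(n_clusters=n_clusters, n_init=10, random_state=0).fit_predict(Z)

# Toy usage with random matrices standing in for two views of 200 objects.
rng = np.random.default_rng(0)
labels = cca_then_kmeans(rng.random((200, 40)), rng.random((200, 60)))
print(labels[:10])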
In this paper, we propose a new multi-view cluster-
ing approach based on a highly effective technique in
single-view clustering, i.e., non-negative matrix factor-
ization (NMF) [18]. NMF, which was originally intro-
duced as a dimensionality reduction technique [18], has
been shown to be useful in many research areas such
as information retrieval [20] and pattern recognition
[18]. NMF has received much attention because of its
straightforward interpretability for applications, i.e., we
can explain each observation as an additive linear com-
bination of nonnegative basis vectors. Recently, NMF
has become a popular technique for data clustering, and
it is reported to achieve performance competitive
with state-of-the-art unsupervised al-
gorithms. For example, Xu et al. [20] applied NMF to
text clustering and gained superior performance, and
Brunet et al. [6] achieved similar success on biological
data clustering. Recent studies [9, 11] show that NMF
is closely related to Probabilistic Latent Semantic Anal-