meaningful for DR. Consequently, in this paper, we design a general framework called semi-paired and semi-supervised dimensionality reduction (S²DR), especially for multi-view data, by combining semi-paired correlation analysis and semi-supervised DR into a unified framework that takes into account not only the discriminant information but also the within-view structural (local and global) information.
Based on our S²DR framework, we put forward a novel multi-view DR algorithm, referred to as semi-paired and semi-supervised generalized correlation analysis (S²GCA). S²GCA maximizes the between-view correlation by performing CCA on the given paired data, while preserving the geometric structure of the unlabeled data as sufficiently as possible and separating the labeled data from different classes as far as possible. Consequently, S²GCA can seek desirable directions which not only maximize the correlation of the paired data but also reflect the separability of the labeled data. Experimental results on a toy dataset and four publicly available datasets, namely the semi-supervised learning (SSL) data [34,35], the Multiple Feature Database (MFD) [36], the WebKB dataset [37] and the advertisement dataset (Ads) [38], show its effectiveness compared to related DR methods.
Finally, it is worthwhile to highlight several advantages of our S²GCA as follows:
(1) To the best of our knowledge, S²GCA is the first DR method to deal with semi-paired and semi-supervised multi-view data. A general framework is further constructed for this scenario, including SemiCCA and SemiLRCCA as its special cases.
(2) Different from the unsupervised SemiCCA and SemiLRCCA, which utilize only the global or the local (manifold) structure of each view, S²GCA fuses not only the global and local structural information but also the discriminative information into a single objective function, making it more effective and flexible in modeling the given data since it does not restrict whether the paired and/or unpaired data carry labels.
(3) Compared with traditional semi-supervised DR methods, which are applicable only to single-view data, S²GCA can perform semi-supervised learning on two or more views simultaneously and thus capture the latent knowledge in the data more fully. Compared to existing multi-view semi-supervised methods such as SCCA and MVSSDR, which work on semi-supervised but fully paired multi-view data, S²GCA is to a great extent free of the limitation of correspondence between different views.
(4) Compared with works on supervised multi-view data, such as DCCA, DCCAM and LDCCA, S²GCA copes with semi-supervised multi-view data, which is more general and more widely applicable.
(5) S²GCA characterizes its optimization objective as a generalized eigenvalue problem, which can be solved as simply and efficiently as CCA, SCCA, DCCA, DCCAM, LDCCA, PPLCA, SemiCCA and SemiLRCCA.
The rest of the paper is organized as follows. Section 2 gives a brief review of related works. In Section 3, we put forward a general DR framework for multi-view data, namely semi-paired and semi-supervised dimensionality reduction (S²DR). We then utilize the S²DR framework as a general platform to design the S²GCA algorithm, covering its motivation, formulation and solution, in Section 4. We present experimental results and analysis on both the toy data and the real-world datasets, including the SSL, MFD, WebKB and Ads databases, in Section 5. Conclusions and future work are given in Section 6.
2. Related works
2.1. CCA: canonical correlation analysis
Given $n$ pairs of samples $\{(x_1, y_1), \ldots, (x_n, y_n)\}$, centralized by subtracting the total sample means from each sample, let $X = [x_1, \ldots, x_n] \in \mathbb{R}^{p \times n}$ and $Y = [y_1, \ldots, y_n] \in \mathbb{R}^{q \times n}$. CCA [16–18] attempts to find a pair of projections (or directions) $w_x$ and $w_y$, one for each view, such that the correlation between $w_x^{T} x$ and $w_y^{T} y$ is maximized. The corresponding objective can be described as follows:
$$\max_{w_x,\, w_y}\ \frac{w_x^{T} X Y^{T} w_y}{\sqrt{w_x^{T} X X^{T} w_x}\,\sqrt{w_y^{T} Y Y^{T} w_y}} \qquad (1)$$
Evidently, it can be expressed as the following equality-constrained optimization problem [18]:
$$\max_{w_x,\, w_y}\ w_x^{T} X Y^{T} w_y \quad \text{s.t.} \quad w_x^{T} X X^{T} w_x = 1, \quad w_y^{T} Y Y^{T} w_y = 1 \qquad (2)$$
By the Lagrange technique [18], the optimization of (2) boils down to solving a generalized eigenvalue problem:
$$\begin{bmatrix} 0 & X Y^{T} \\ Y X^{T} & 0 \end{bmatrix} \begin{bmatrix} w_x \\ w_y \end{bmatrix} = \lambda \begin{bmatrix} X X^{T} & 0 \\ 0 & Y Y^{T} \end{bmatrix} \begin{bmatrix} w_x \\ w_y \end{bmatrix} \qquad (3)$$
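For completeness, a brief sketch of the Lagrange step (a standard derivation, paraphrased rather than quoted from [18]): introduce multipliers $\lambda_x$ and $\lambda_y$ for the two constraints in (2) and set the gradient of the Lagrangian to zero,
$$L(w_x, w_y, \lambda_x, \lambda_y) = w_x^{T} X Y^{T} w_y - \frac{\lambda_x}{2}\left(w_x^{T} X X^{T} w_x - 1\right) - \frac{\lambda_y}{2}\left(w_y^{T} Y Y^{T} w_y - 1\right),$$
$$\frac{\partial L}{\partial w_x} = X Y^{T} w_y - \lambda_x X X^{T} w_x = 0, \qquad \frac{\partial L}{\partial w_y} = Y X^{T} w_x - \lambda_y Y Y^{T} w_y = 0.$$
Left-multiplying these two conditions by $w_x^{T}$ and $w_y^{T}$ respectively and applying the constraints gives $\lambda_x = \lambda_y = w_x^{T} X Y^{T} w_y =: \lambda$, and stacking the two stationarity conditions yields exactly (3).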
Further, we can jointly obtain two projection matrices $W_x$ and $W_y$ consisting of the top $r$ ($r \le \min(p,q)$) generalized eigenvectors of (3). In this way, a common dimensionality-reduced subspace maximizing the between-view correlation is established.
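As a concrete illustration (ours, not from the paper), the following is a minimal NumPy/SciPy sketch of this procedure; the function name `cca` and the small ridge term `reg`, added to keep $XX^{T}$ and $YY^{T}$ invertible, are our own assumptions rather than details of the original method:

```python
import numpy as np
from scipy.linalg import eigh

def cca(X, Y, r, reg=1e-6):
    """Linear CCA via the generalized eigenvalue problem (3).

    X: (p, n) and Y: (q, n) data matrices, one sample per column,
    assumed already centered. Returns Wx (p, r) and Wy (q, r).
    """
    p, _ = X.shape
    q, _ = Y.shape
    Cxy = X @ Y.T
    Cxx = X @ X.T + reg * np.eye(p)  # ridge keeps B positive definite
    Cyy = Y @ Y.T + reg * np.eye(q)

    # Assemble A w = lambda B w exactly as in Eq. (3).
    A = np.block([[np.zeros((p, p)), Cxy],
                  [Cxy.T, np.zeros((q, q))]])
    B = np.block([[Cxx, np.zeros((p, q))],
                  [np.zeros((q, p)), Cyy]])

    # A is symmetric and B symmetric positive definite, so eigh applies;
    # keep the r largest eigenvalues (the strongest canonical correlations).
    vals, vecs = eigh(A, B)
    idx = np.argsort(vals)[::-1][:r]
    W = vecs[:, idx]
    return W[:p, :], W[p:, :]

# Toy usage: two 3-dimensional views driven by a shared 2-dimensional latent signal.
rng = np.random.default_rng(0)
Z = rng.standard_normal((2, 100))
X = rng.standard_normal((3, 2)) @ Z + 0.1 * rng.standard_normal((3, 100))
Y = rng.standard_normal((3, 2)) @ Z + 0.1 * rng.standard_normal((3, 100))
X -= X.mean(axis=1, keepdims=True)  # centralize, as the text assumes
Y -= Y.mean(axis=1, keepdims=True)
Wx, Wy = cca(X, Y, r=2)
```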
In fact, CCA is difficult to apply effectively to nonlinearly correlated data due to its intrinsically linear nature. Consequently, kernel
Table 1
Comparison of CCA, SemiCCA, SemiLRCCA, DCCA, LDCCA, DCCAM, MVSSDR, SCCA and PPLCA. The first two columns concern paired information, the next three discriminative information, and the last two structural information.

| Method | Paired | Semi-paired | Unsupervised | Semi-supervised | Supervised | Local^a | Global |
|---|---|---|---|---|---|---|---|
| CCA [16–18] | ✓ | | ✓ | | | | |
| SemiCCA [15] | | ✓ | ✓ | | | | ✓ |
| SemiLRCCA [20] | | ✓ | ✓ | | | ✓ | |
| DCCA [8] | ✓ | | | | ✓ | | |
| LDCCA [21] | ✓ | | | | ✓ | ✓ | |
| DCCAM [7] | ✓ | | | | ✓ | | |
| MVSSDR [14] | ✓ | | | ✓ | | | |
| SCCA [32] | ✓ | | | ✓ | | | |
| PPLCA [19] | | ✓ | ✓ | | | ✓ | |

^a "Local" means using the data neighborhood information (e.g., manifold information) to construct the scatter matrix.