1438 IEEE TRANSACTIONS ON SYSTEMS, MAN, AND CYBERNETICS—PART B: CYBERNETICS, VOL. 40, NO. 6, DECEMBER 2010
Multiview Spectral Embedding
Tian Xia, Dacheng Tao, Member, IEEE, Tao Mei, Member, IEEE, and Yongdong Zhang, Member, IEEE
Abstract—In computer vision and multimedia search, it is common to use multiple features from different views to represent an object. For example, to characterize a natural scene image well, it is essential to find a set of visual features that represent its color, texture, and shape information and to encode each feature into a vector. We therefore have a set of vectors in different spaces to represent the image. Conventional spectral-embedding algorithms cannot deal with such data directly, so we have to concatenate these vectors into a single new vector. This concatenation is not physically meaningful because each feature has a specific statistical property. Therefore, we develop a new spectral-embedding algorithm, namely, multiview spectral embedding (MSE), which can encode different features in different ways, to achieve a physically meaningful embedding. In particular, MSE finds a low-dimensional embedding wherein the distribution of each view is sufficiently smooth, and MSE explores the complementary property of different views. Because there is no closed-form solution for MSE, we derive an alternating-optimization-based iterative algorithm to obtain the low-dimensional embedding. Empirical evaluations on image retrieval, video annotation, and document clustering demonstrate the effectiveness of the proposed approach.
Index Terms—Dimensionality reduction, multiple views, spectral embedding.
I. INTRODUCTION

IN COMPUTER vision and multimedia search [5], [6], objects are usually represented in several different ways. This kind of data is termed multiview data. A typical example is a color image, which has different views from different modalities, e.g., color, texture, and shape. Different views form different feature spaces, each with its own particular statistical properties.
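As a concrete (hypothetical) illustration of such multiview data, the sketch below encodes one toy image as three feature vectors that live in spaces of different dimensionalities; the specific descriptors (a color histogram, a gradient-magnitude "texture" histogram, and intensity-profile "shape" features) are placeholders chosen for brevity, not the features used in the paper's experiments.

```python
import numpy as np

# Hypothetical illustration: one image, three "views" (color, texture, shape),
# each encoded as a vector in its own feature space with its own dimensionality.
rng = np.random.default_rng(0)
image = rng.random((32, 32, 3))  # toy RGB image with values in [0, 1]

# View 1: color histogram, 8 bins per channel -> a 24-D vector.
color = np.concatenate(
    [np.histogram(image[..., c], bins=8, range=(0, 1))[0] for c in range(3)]
).astype(float)

# View 2: a crude "texture" descriptor, a 16-bin gradient-magnitude histogram.
gray = image.mean(axis=2)
gx, gy = np.gradient(gray)
texture = np.histogram(np.hypot(gx, gy), bins=16)[0].astype(float)

# View 3: a crude "shape" descriptor: row and column intensity profiles (64-D).
shape_feat = np.concatenate([gray.mean(axis=0), gray.mean(axis=1)])

# The three views describe the same image but live in R^24, R^16, and R^64.
print(color.shape, texture.shape, shape_feat.shape)
```

The point of the sketch is only that the per-view vectors have different dimensionalities, scales, and statistics, which is why treating them as one homogeneous vector is problematic.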
Manuscript received May 14, 2009; revised August 31, 2009 and November 18, 2009; accepted December 6, 2009. Date of publication February 17, 2010; date of current version November 17, 2010. This work was supported in part by the National Basic Research Program of China (973 Program) under Grant 2007CB311100; by the National High-Technology Research and Development Program of China (863 Program) under Grant 2007AA01Z416; by the National Natural Science Foundation of China under Grants 60873165, 60802028, and 60902090; by the Beijing New Star Project on Science and Technology under Grant 2007B071; by the Co-building Program of Beijing Municipal Education Commission; by the Nanyang Technological University Nanyang SUG Grant under Project M58020010; by the Microsoft Operations PTE LTD-NTU Joint R&D under Grant M48020065; and by the K. C. Wong Education Foundation Award. This paper was recommended by Associate Editor S. Sarkar.

T. Xia and Y. Zhang are with the Center for Advanced Computing Technology Research, Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190, China (e-mail: txia@ict.ac.cn; zhyd@ict.ac.cn).

D. Tao is with the School of Computer Engineering, Nanyang Technological University, Singapore 639798 (e-mail: dctao@ntu.edu.sg).

T. Mei is with Microsoft Research Asia, Beijing 100190, China (e-mail: tmei@microsoft.com).

Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org.

Digital Object Identifier 10.1109/TSMCB.2009.2039566
Because of the prevalence of multiview data in practical applications, particularly in the multimedia domain, learning from multiview data, also known as multiple-view learning, has attracted increasing attention. Although a great deal of effort has been devoted to multiview data learning [1], including classification [21], clustering [4], [19], and feature selection [20], little progress has been made in dimensionality reduction, even though dimensionality reduction has many applications in multimedia [28], e.g., image retrieval and video annotation. Multimedia data generally have multiple modalities, and each modality is usually represented in a high-dimensional feature space, which frequently leads to the "curse of dimensionality" problem. In this case, multiview dimensionality reduction provides an effective way to solve, or at least alleviate, this problem.
In this paper, we consider the problem of spectral embedding for multiple-view data based on our previous patch alignment framework [29]. The major challenge is learning a low-dimensional embedding that effectively explores the complementary nature of the multiple views of a data set. The learned low-dimensional embedding should be better than any low-dimensional embedding learned from a single view of the data set.
Existing spectral-embedding algorithms assume that samples are drawn from a single vector space and thus cannot deal with multiview data directly. A possible solution is to concatenate the vectors from different views into a new vector and then apply a spectral-embedding algorithm directly to the concatenated vector. However, this concatenation is not physically meaningful because each view has a specific statistical property. It ignores the diversity of the views and thus cannot efficiently exploit their complementary nature. Another solution is the distributed spectral embedding (DSE) proposed in [3]. DSE performs a spectral-embedding algorithm on each view independently and then, based on the learned low-dimensional representations, learns a common low-dimensional embedding that is as "close" to each representation as possible. Although DSE allows selecting different spectral-embedding algorithms for different views, the original multiple-view data are invisible to the final learning process, and thus, DSE cannot fully explore the complementary nature of the views. Moreover, its computational cost is high because it runs a spectral-embedding algorithm on each view independently.
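The two baselines discussed above can be sketched as follows. This is an illustrative toy, not the paper's MSE: the spectral-embedding step is plain Laplacian eigenmaps with an assumed heat-kernel affinity, and the per-view embeddings are combined by simple averaging, whereas DSE instead solves for a common embedding that is jointly closest to all per-view representations.

```python
import numpy as np

def laplacian_eigenmaps(X, dim=2, sigma=1.0):
    """Illustrative Laplacian-eigenmaps embedding of the rows of X."""
    # Pairwise squared distances and heat-kernel affinities (an assumption).
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    W = np.exp(-d2 / (2 * sigma ** 2))
    np.fill_diagonal(W, 0.0)
    D = np.diag(W.sum(axis=1))
    L = D - W                                # unnormalized graph Laplacian
    _, vecs = np.linalg.eigh(L)
    return vecs[:, 1:dim + 1]                # skip the trivial constant eigenvector

rng = np.random.default_rng(0)
n = 50
# Two toy views of the same n samples, deliberately on very different scales.
views = [rng.random((n, 24)), rng.random((n, 16)) * 100.0]

# Baseline 1: concatenate the views and embed once; the large-scale view
# dominates the distances, which is the "not physically meaningful" issue.
concat_emb = laplacian_eigenmaps(np.hstack(views))

# Baseline 2 (DSE-flavored): embed each view independently, then combine the
# low-dimensional representations (averaging here is a simplification).
per_view = [laplacian_eigenmaps(V) for V in views]
combined = np.mean(per_view, axis=0)

print(concat_emb.shape, combined.shape)      # both are n x 2 embeddings
```

In the second baseline, note that the original high-dimensional views never reach the combination step, which is exactly the limitation of DSE that the text points out.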
To effectively and efficiently learn the complementary nature of different views, we propose a new algorithm, i.e., multiview spectral embedding (MSE), which learns a low-dimensional and sufficiently smooth embedding over all views simultaneously. Empirical evaluations based on image retrieval, video