SEMANTICS AND LOCALITY PRESERVING CORRELATION PROJECTIONS

Yan Hua*, Jianhe Du*, Yujia Zhu†, Ping Shi*

* Information Engineering School, Communication University of China, huayan@cuc.edu.cn
† National Engineering Lab for Information Security Technologies, Institute of Information Engineering, CAS
ABSTRACT
Multi-view correlation learning has attracted great attention with the proliferation of heterogeneous data. Typical methods, such as Canonical Correlation Analysis (CCA) and its variants, usually maximize the one-to-one corresponding correlation of inter-view data, while most of them neglect discriminative multi-label information and the local structure of each view. In this paper, we propose the multi-label Semantics and Locality Preserving Correlation Projections (SLPCP) method, which seeks a semantic common subspace by jointly learning view-specific linear projections from intra-view and inter-view perspectives. SLPCP can be easily optimized with a generalized eigenvalue decomposition by concatenating the projections of the multiple views. Applied to retrieval tasks on image and text data in experiments, SLPCP outperforms state-of-the-art methods on the widely used NUS-WIDE dataset. Extensive experiments also validate that it effectively preserves the multi-label semantics and locality of multi-view data.
Index Terms— Correlation learning, Multi-label, Inter-
view, Intra-view, Image and text retrieval
1. INTRODUCTION
Different types of content from heterogeneous views or modalities can describe related topics, for example, images and their surrounding texts on web pages, faces captured at different poses, and sentences in multiple languages. Acquiring useful information from multi-view data can promote applications in the computer vision and multimedia fields [1] [2] [3]. In the past decade, considerable work [4] [5] has been conducted, driven by rapidly increasing multi-view data and application demands.
Most previous work focuses on learning a common subspace shared by multi-view data, in which different types of data can be directly compared. The most typical method is Canonical Correlation Analysis (CCA) [6] [1]. The subspace
methods learn comparable representations by maximizing the correlations [6] [7] or minimizing the distances [8] of coupled multi-view samples.

(This work is supported by the National Natural Science Foundation of China (Grant No. 61601414) and by the SAPPRFT Project (No. 2015-17).)

[Fig. 1. Two pairwise multi-view samples with multi-labels from the NUS-WIDE dataset. The two views are image and text, respectively. Left pair labels: glacier, mountain, nighttime, person, sign, sky, sun, sunset. Right pair labels: mountain, sky, sunset.]

Though these subspace methods narrow
the gap between heterogeneous data by modeling one-to-one inter-view correlation, they are not designed to exploit class information to learn a discriminative subspace. Single-label class information of multi-view samples is introduced into discriminative common subspace learning methods [3] [9] [10] [11]. Recently, three-view CCA [2] and multi-label CCA [12] were proposed to build the inter-view correlation with multi-labels. However, the multi-label semantics in multi-view data involve not only inter-view but also intra-view relations. For example, for the two multi-view pairs from the NUS-WIDE dataset shown in Fig. 1, the left pair is annotated with the labels "glacier, mountain, nighttime, person, sign, sky, sun, sunset" and the right pair with "mountain, sky, sunset". The multi-label information is shared not only between inter-view samples (Image 1 vs. Text 1, Image 1 vs. Text 2, Image 2 vs. Text 1, and Image 2 vs. Text 2), but also between intra-view samples (Image 1 vs. Image 2, and Text 1 vs. Text 2). In this paper, we introduce a semantics preserving correlation method that models multi-label information from both inter-view and intra-view perspectives.
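To make the baseline concrete, the one-to-one inter-view correlation objective of classical two-view CCA discussed above can be solved as a generalized eigenvalue problem over the concatenated projections of the two views. The following is a minimal illustrative sketch, not the paper's SLPCP method; the variable names (X, Y, reg, n_components) and the toy data are our own assumptions.

```python
import numpy as np
from scipy.linalg import eigh

def cca_projections(X, Y, n_components=2, reg=1e-4):
    """Return projections Wx, Wy that maximize inter-view correlation."""
    n = X.shape[0]
    X = X - X.mean(axis=0)  # center each view
    Y = Y - Y.mean(axis=0)
    dx, dy = X.shape[1], Y.shape[1]

    # Regularized covariance and cross-covariance matrices.
    Cxx = X.T @ X / n + reg * np.eye(dx)
    Cyy = Y.T @ Y / n + reg * np.eye(dy)
    Cxy = X.T @ Y / n

    # Concatenate the two views: w = [wx; wy] solves A w = lambda B w,
    # with symmetric A and block-diagonal positive-definite B.
    A = np.block([[np.zeros((dx, dx)), Cxy],
                  [Cxy.T, np.zeros((dy, dy))]])
    B = np.block([[Cxx, np.zeros((dx, dy))],
                  [np.zeros((dy, dx)), Cyy]])

    vals, vecs = eigh(A, B)                  # eigenvalues in ascending order
    top = vecs[:, ::-1][:, :n_components]    # keep the largest correlations
    return top[:dx], top[dx:]

# Toy usage: two 3-dimensional views sharing one latent signal z.
rng = np.random.default_rng(0)
z = rng.normal(size=(200, 1))
X = np.hstack([z, rng.normal(size=(200, 2))])
Y = np.hstack([-z, rng.normal(size=(200, 2))])
Wx, Wy = cca_projections(X, Y, n_components=1)
corr = np.corrcoef((X - X.mean(0)) @ Wx[:, 0],
                   (Y - Y.mean(0)) @ Wy[:, 0])[0, 1]
```

On this toy data the learned projections recover the shared latent direction, so the correlation of the projected views is close to 1 in magnitude. Discriminative variants such as the multi-label methods cited above modify the A and B blocks with label-driven terms while keeping the same eigendecomposition machinery.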
Most existing methods [8] [12] [3] are linear, learning globally linear transformations for heterogeneous multi-view data. To handle non-linear correlations, almost all subspace learning methods can be extended to kernel versions, in which the non-linear problem is transformed into a more likely linear one in a high-dimensional (even infinite-dimensional) feature space via a kernel function. However, they still
978-1-5090-6067-2/17/$31.00 ©2017 IEEE
Proceedings of the IEEE International Conference on Multimedia and Expo (ICME) 2017 10-14 July 2017