meaningful for DR. Consequently, in this paper, we design a general framework called semi-paired and semi-supervised dimensionality reduction (S²DR), especially for multi-view data, by combining semi-paired correlation analysis and semi-supervised DR into a unified framework that takes into account not only the discriminant information but also the within-view structural (local and global) information.
Based on our S²DR framework, we put forward a novel multi-view DR algorithm, referred to as semi-paired and semi-supervised generalized correlation analysis (S²GCA). S²GCA maximizes the between-view correlation by performing CCA on the given paired data, while preserving the geometric structure of the unlabeled data as sufficiently as possible and separating the labeled data from different classes as far as possible. Consequently, S²GCA can seek desirable directions which not only maximize the correlation of the paired data but also reflect the separability of the labeled data. Experimental results on a toy dataset and four publicly available datasets, namely the semi-supervised learning (SSL) data [34,35], the Multiple Feature Database (MFD) [36], the WebKB dataset [37] and the advertisement dataset (Ads) [38], show its effectiveness compared to related DR methods.
Finally, it is worthwhile to highlight several advantages of our S²GCA as follows:
(1) To the best of our knowledge, S²GCA is the first DR method to deal with semi-paired and semi-supervised multi-view data. A general framework is further constructed for this scenario, including SemiCCA and SemiLRCCA as its special cases.
(2) Different from the unsupervised SemiCCA and SemiLRCCA, which utilize only the global or the local (manifold) structure of each view, S²GCA fuses not only the global and local structural information but also the discriminative information into a single objective function, making it more effective and flexible in modeling the given data since it does not restrict whether the paired and/or unpaired data carry labels.
(3) Compared with traditional semi-supervised DR methods, which are applicable only to single-view data, S²GCA can perform semi-supervised learning on two or more views simultaneously and thus capture the latent knowledge in the data more fully. Compared to existing multi-view semi-supervised methods such as SCCA and MVSSDR, which work on semi-supervised but fully paired multi-view data, S²GCA is to a great extent free of the limitation of correspondence between different views.
(4) Compared with works on supervised multi-view data, such as DCCA, DCCAM and LDCCA, S²GCA copes with semi-supervised multi-view data, which is more general and more widely applicable.
(5) S²GCA characterizes its optimization objective as a generalized eigenvalue problem, which can be solved as simply and efficiently as CCA, SCCA, DCCA, DCCAM, LDCCA, PPLCA, SemiCCA and SemiLRCCA.
The rest of the paper is organized as follows. Section 2 gives a brief review of related works. In Section 3, we put forward a general DR framework for multi-view data, namely semi-paired and semi-supervised dimensionality reduction (S²DR). We then utilize the S²DR framework as a general platform to design the S²GCA algorithm, covering its motivation, formulation and solution, in Section 4. We present experimental results and analysis on both the toy data and the real-world datasets, including the SSL, MFD, WebKB and Ads databases, in Section 5. Conclusions and future work are given in Section 6.
2. Related works
2.1. CCA: canonical correlation analysis
Given $n$ pairs of samples $\{(x_1, y_1), \ldots, (x_n, y_n)\}$, centralized by subtracting the total sample means from each sample, let $X = [x_1, \ldots, x_n] \in \mathbb{R}^{p \times n}$ and $Y = [y_1, \ldots, y_n] \in \mathbb{R}^{q \times n}$. CCA [16–18] attempts to find a pair of projections (or directions) $w_x$ and $w_y$, one for each view, such that the correlation between $w_x^{T} x$ and $w_y^{T} y$ is maximized. The corresponding objective can be described as follows:
$$\max_{w_x,\, w_y}\ \frac{w_x^{T} X Y^{T} w_y}{\sqrt{w_x^{T} X X^{T} w_x}\,\sqrt{w_y^{T} Y Y^{T} w_y}} \qquad (1)$$
Evidently, it can be expressed as the following equality-constrained optimization problem [18]:
$$\max_{w_x,\, w_y}\ w_x^{T} X Y^{T} w_y \quad \text{s.t.} \quad w_x^{T} X X^{T} w_x = 1, \quad w_y^{T} Y Y^{T} w_y = 1 \qquad (2)$$
By the Lagrange technique [18], the optimization of (2) boils down to solving a generalized eigenvalue problem:
$$\begin{bmatrix} 0 & X Y^{T} \\ Y X^{T} & 0 \end{bmatrix} \begin{bmatrix} w_x \\ w_y \end{bmatrix} = \lambda \begin{bmatrix} X X^{T} & 0 \\ 0 & Y Y^{T} \end{bmatrix} \begin{bmatrix} w_x \\ w_y \end{bmatrix} \qquad (3)$$
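For completeness, a brief sketch of the Lagrange step (a standard derivation, paraphrased rather than quoted from [18]): introduce multipliers $\lambda_x$ and $\lambda_y$ for the two constraints in (2) and set the gradient of the Lagrangian to zero,
$$L(w_x, w_y, \lambda_x, \lambda_y) = w_x^{T} X Y^{T} w_y - \frac{\lambda_x}{2}\left(w_x^{T} X X^{T} w_x - 1\right) - \frac{\lambda_y}{2}\left(w_y^{T} Y Y^{T} w_y - 1\right),$$
$$\frac{\partial L}{\partial w_x} = X Y^{T} w_y - \lambda_x X X^{T} w_x = 0, \qquad \frac{\partial L}{\partial w_y} = Y X^{T} w_x - \lambda_y Y Y^{T} w_y = 0.$$
Left-multiplying these two conditions by $w_x^{T}$ and $w_y^{T}$ respectively and applying the constraints gives $\lambda_x = \lambda_y = w_x^{T} X Y^{T} w_y =: \lambda$, and stacking the two stationarity conditions yields exactly (3).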
Further, we can jointly obtain two projection matrices $W_x$ and $W_y$ consisting of the top $r$ ($r \le \min(p,q)$) generalized eigenvectors of (3). In this way, a common dimensionality-reduced subspace maximizing the between-view correlation is established.
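As a concrete illustration (ours, not from the paper), the following is a minimal NumPy/SciPy sketch of this procedure; the function name `cca` and the small ridge term `reg`, added to keep $XX^{T}$ and $YY^{T}$ invertible, are our own assumptions rather than details of the original method:

```python
import numpy as np
from scipy.linalg import eigh

def cca(X, Y, r, reg=1e-6):
    """Linear CCA via the generalized eigenvalue problem (3).

    X: (p, n) and Y: (q, n) data matrices, one sample per column,
    assumed already centered. Returns Wx (p, r) and Wy (q, r).
    """
    p, _ = X.shape
    q, _ = Y.shape
    Cxy = X @ Y.T
    Cxx = X @ X.T + reg * np.eye(p)  # ridge keeps B positive definite
    Cyy = Y @ Y.T + reg * np.eye(q)

    # Assemble A w = lambda B w exactly as in Eq. (3).
    A = np.block([[np.zeros((p, p)), Cxy],
                  [Cxy.T, np.zeros((q, q))]])
    B = np.block([[Cxx, np.zeros((p, q))],
                  [np.zeros((q, p)), Cyy]])

    # A is symmetric and B symmetric positive definite, so eigh applies;
    # keep the r largest eigenvalues (the strongest canonical correlations).
    vals, vecs = eigh(A, B)
    idx = np.argsort(vals)[::-1][:r]
    W = vecs[:, idx]
    return W[:p, :], W[p:, :]

# Toy usage: two 3-dimensional views driven by a shared 2-dimensional latent signal.
rng = np.random.default_rng(0)
Z = rng.standard_normal((2, 100))
X = rng.standard_normal((3, 2)) @ Z + 0.1 * rng.standard_normal((3, 100))
Y = rng.standard_normal((3, 2)) @ Z + 0.1 * rng.standard_normal((3, 100))
X -= X.mean(axis=1, keepdims=True)  # centralize, as the text assumes
Y -= Y.mean(axis=1, keepdims=True)
Wx, Wy = cca(X, Y, r=2)
```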
In fact, CCA is difficult to apply effectively to nonlinearly correlated data due to its intrinsically linear nature. Consequently, kernel
Table 1
Comparison of CCA, SemiCCA, SemiLRCCA, DCCA, LDCCA, DCCAM, MVSSDR, SCCA and PPLCA. The first two columns concern paired information, the next three discriminative information, and the last two structural information.

| Method | Paired | Semi-paired | Unsupervised | Semi-supervised | Supervised | Local^a | Global |
|---|---|---|---|---|---|---|---|
| CCA [16–18] | ✓ | | ✓ | | | | |
| SemiCCA [15] | | ✓ | ✓ | | | | ✓ |
| SemiLRCCA [20] | | ✓ | ✓ | | | ✓ | |
| DCCA [8] | ✓ | | | | ✓ | | |
| LDCCA [21] | ✓ | | | | ✓ | ✓ | |
| DCCAM [7] | ✓ | | | | ✓ | | |
| MVSSDR [14] | ✓ | | | ✓ | | | |
| SCCA [32] | ✓ | | | ✓ | | | |
| PPLCA [19] | | ✓ | ✓ | | | ✓ | |

^a "Local" means using the data neighborhood information (e.g., manifold information) to construct the scatter matrix.