Adaptive Semi-Supervised Dimensionality Reduction
Jia Wei*, Jiabing Wang, Qianli Ma
School of Computer Science and Engineering,
South China University of Technology
Email: csjwei@scut.edu.cn
Xuan Wang
Computer Application Research Center,
Harbin Institute of Technology Shenzhen Graduate School
Email: wangxuan@insun.hit.edu.cn
Abstract—With the rapid accumulation of high dimensional data, dimensionality reduction plays an increasingly important role in practical data processing and analysis tasks. This paper studies semi-supervised dimensionality reduction using pairwise constraints. In this setting, domain knowledge is given in the form of pairwise constraints, which specify whether a pair of instances belongs to the same class (must-link constraint) or to different classes (cannot-link constraint). In this paper, a novel semi-supervised dimensionality reduction method called Adaptive Semi-Supervised Dimensionality Reduction (ASSDR) is proposed, which obtains an optimized low dimensional representation of the original data by adaptively adjusting the weights of the pairwise constraints and simultaneously optimizing the graph construction. Experiments on UCI classification and face recognition tasks show that ASSDR is superior to many existing dimensionality reduction methods.
I. INTRODUCTION
In many real world applications, such as face recognition, information retrieval and bioinformatics, one is often confronted with high dimensional data. However, high dimensionality is a major cause of the practical limitations of many pattern recognition technologies. Specifically, it has been observed that a large number of features may actually degrade the performance of classifiers if the number of training samples is small relative to the number of features. This is called the "Curse of Dimensionality" [10]. Fortunately, there is reason to suspect that naturally generated high dimensional data often reside on or near a lower dimensional manifold. This leads one to consider dimensionality reduction methods that represent the data in a lower dimensional subspace.
The goal of dimensionality reduction is to reduce the complexity of the input data while preserving some desired intrinsic information of the data. Two of the most popular methods for dimensionality reduction are Principal Component Analysis (PCA) and Linear Discriminant Analysis (LDA), which are unsupervised and supervised respectively. PCA tries to preserve the global covariance structure of the data in a low dimensional projection subspace without knowing the class labels of the data, while LDA aims to maximize the between-class scatter and simultaneously minimize the within-class scatter in a low dimensional projection subspace when the class labels of the data are available.
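For concreteness, the contrast between these two baselines can be reproduced with scikit-learn; the snippet below is only a minimal illustration on the Iris data, not part of the proposed method.

from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

X, y = load_iris(return_X_y=True)

# PCA: unsupervised, preserves global covariance structure; labels unused
X_pca = PCA(n_components=2).fit_transform(X)

# LDA: supervised, maximizes between-class scatter relative to
# within-class scatter; requires the class labels y
X_lda = LinearDiscriminantAnalysis(n_components=2).fit_transform(X, y)

print(X_pca.shape, X_lda.shape)  # both (150, 2)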
In recent years, dimensionality reduction in the semi-supervised setting has attracted more and more attention [16], [7]. In many real world applications, such as image classification, web page classification and protein function prediction, the labeling process is costly and time-consuming; in contrast, unlabeled examples can be obtained easily. In such situations, it can therefore be beneficial to incorporate the information contained in unlabeled examples into the learning problem, i.e., semi-supervised learning (SSL) should be applied instead of supervised learning.
However, in many cases one cannot tell which category an instance belongs to; that is, the exact label of an instance is unknown, and what is known is only whether a pair of instances belongs to the same class (must-link constraint) or to different classes (cannot-link constraint) [18]. This pairwise constraint information is called "side information". Side information is more general than label information, because side information can be derived from label information but not vice versa [13]. Learning with side information is therefore becoming an important area in the machine learning community.
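This asymmetry between labels and side information is easy to make concrete: given labeled instances, constraints follow mechanically, whereas constraints alone do not determine labels. The sketch below derives constraint sets from labels; the sampling scheme is an assumption for illustration only.

import itertools
import random

def labels_to_constraints(labels, n_pairs, seed=0):
    """Sample must-link / cannot-link pairs from labeled instances."""
    rng = random.Random(seed)
    all_pairs = list(itertools.combinations(range(len(labels)), 2))
    must_link, cannot_link = [], []
    for i, j in rng.sample(all_pairs, n_pairs):
        if labels[i] == labels[j]:
            must_link.append((i, j))     # same class -> must-link
        else:
            cannot_link.append((i, j))   # different classes -> cannot-link
    return must_link, cannot_link

# e.g. five labeled instances drawn from three classes
ml, cl = labels_to_constraints([0, 0, 1, 1, 2], n_pairs=5)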
Recently, several related methods have been proposed that make use of pairwise constraints to extract low dimensional structure from high dimensional data. Bar-Hillel et al. proposed Relevant Component Analysis (RCA), which uses must-link constraints for semi-supervised dimensionality reduction [3]. Xing et al. [20], Tang et al. [17], Yeung et al. [22] and An et al. [1] proposed different constraint-based semi-supervised dimensionality reduction methods, which can make use of both must-link and cannot-link constraints. Zhang et al. proposed Semi-Supervised Dimensionality Reduction (SSDR) [23], and Chen et al. recently applied SSDR to hyperspectral image classification [8]. SSDR uses the pairwise constraints while preserving the global covariance structure of the unlabeled data in the projected low dimensional subspace.
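A minimal sketch of the SSDR objective may help fix ideas: all pairs receive a small uniform weight (yielding the covariance-preserving term), cannot-link pairs are additionally pushed apart in the projection, and must-link pairs are pulled together. The sketch below follows the common presentation of [23]; the parameter names alpha and beta and the unit-weight defaults are assumptions of this sketch, not a definitive implementation.

import numpy as np

def ssdr_projection(X, must_link, cannot_link, d, alpha=1.0, beta=1.0):
    """X: (n, features) data matrix; returns a (features, d) projection."""
    n = X.shape[0]
    S = np.full((n, n), 1.0 / n**2)        # uniform term: global covariance
    for i, j in cannot_link:               # reward separating cannot-links
        S[i, j] = S[j, i] = 1.0 / n**2 + alpha / max(len(cannot_link), 1)
    for i, j in must_link:                 # penalize separating must-links
        S[i, j] = S[j, i] = 1.0 / n**2 - beta / max(len(must_link), 1)
    L = np.diag(S.sum(axis=1)) - S         # Laplacian of the weight graph
    M = X.T @ L @ X                        # quadratic form to be maximized
    vals, vecs = np.linalg.eigh((M + M.T) / 2)   # symmetric eigensolve
    return vecs[:, np.argsort(vals)[::-1][:d]]   # top-d eigenvectors

Note that in SSDR these pair weights are fixed once and for all, whereas the ASSDR method proposed in this paper adjusts the constraint weights adaptively while optimizing the graph construction.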
Cevikalp et al. proposed Constrained Locality Preserving Projections (CLPP) [5], a semi-supervised version of LPP [12]; the method makes use of the information provided by the pairwise constraints and also exploits the unlabeled data by preserving the local structure used in LPP. Wei et al. proposed Neighborhood Preserving based Semi-Supervised Dimensionality Reduction (NPSSDR) [19], which uses the pairwise constraints while preserving the neighborhood structure used in LLE [14]. Baghshah et al. adopted the idea of NPSSDR for metric learning and employed a heuristic search algorithm to solve the resulting constrained trace ratio problem [2]. Davidson proposed a graph driven constrained dimensionality reduction approach, GCDR-LP, for clustering [9]; in this approach, a constraint graph is first created by propagating the constraints through transitivity and entailment, and dimensionality reduction is then conducted on the constraint graph. Yan et al. proposed a method named Dual Subspace Projections (DSP) [21]. The method first integrates the must-link constraints in the kernel space to obtain the kernel null space and then integrates the cannot-