Adaptive semi-supervised dimensionality reduction with sparse representation using pairwise constraints

Jia Wei^a, Meng Meng^a, Jiabing Wang^a, Qianli Ma^a, Xuan Wang^b

^a School of Computer Science and Engineering, South China University of Technology, Guangzhou, China
^b Computer Application Research Center, Harbin Institute of Technology Shenzhen Graduate School, Shenzhen, China
Article info

Article history:
Received 20 August 2014
Received in revised form 29 September 2015
Accepted 19 November 2015
Available online 2 December 2015
Communicated by Feiping Nie
Keywords:
Semi-supervised learning
Dimensionality reduction
Pairwise constraints
Sparse representation
abstract
With the rapid accumulation of high dimensional data, dimensionality reduction plays an increasingly important role in practical data processing and learning tasks. This paper studies semi-supervised dimensionality reduction using pairwise constraints. In this setting, domain knowledge is given in the form of pairwise constraints, each of which specifies whether a pair of instances belongs to the same class (must-link constraint) or to different classes (cannot-link constraint). In this paper, a novel semi-supervised dimensionality reduction method called Adaptive Semi-Supervised Dimensionality Reduction with Sparse Representation (ASSDR-SR) is proposed, which obtains an optimized low dimensional representation of the original data by adaptively adjusting the weights of the pairwise constraints and simultaneously optimizing the graph construction using the ℓ1 graph of sparse representation. Experiments on clustering and classification tasks show that ASSDR-SR is superior to several existing dimensionality reduction methods.
© 2015 Elsevier B.V. All rights reserved.
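The abstract's ℓ1 graph can be illustrated with a minimal sketch: each sample is reconstructed as a sparse linear combination of the other samples, and the absolute coefficients serve as graph edge weights. This is one common sparse-representation graph construction; the solver choice, penalty value, use of absolute values, and symmetrization below are our assumptions, not the authors' exact formulation.

```python
# Hedged sketch of an l1 (sparse-representation) graph construction.
# Each row of W holds the sparse Lasso coefficients that reconstruct
# sample x_i from the remaining samples (the "dictionary").
import numpy as np
from sklearn.linear_model import Lasso

def l1_graph(X, alpha=0.05):
    """X: (n_samples, n_features) data matrix.
    Returns a symmetric (n, n) nonnegative weight matrix with zero diagonal."""
    n = X.shape[0]
    W = np.zeros((n, n))
    for i in range(n):
        others = np.delete(X, i, axis=0)       # dictionary: every sample but x_i
        lasso = Lasso(alpha=alpha, max_iter=5000)
        # Solve min_w ||x_i - D w||^2 + alpha * ||w||_1 with D = others.T
        lasso.fit(others.T, X[i])
        W[i, np.arange(n) != i] = np.abs(lasso.coef_)  # |coefficients| as edge weights
    return (W + W.T) / 2                       # symmetrize for an undirected graph

X = np.random.RandomState(0).randn(20, 5)
W = l1_graph(X)
```

Because the ℓ1 penalty drives most coefficients to zero, each node connects only to the few samples that best reconstruct it, which is what makes the resulting graph sparse and adaptive to the data rather than fixed by a k-nearest-neighbor rule.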
1. Introduction
The goal of dimensionality reduction is to reduce the complexity of the input data while preserving some desired intrinsic information. Two of the most popular methods for
dimensionality reduction are Principal Component Analysis (PCA)
[1] and Linear Discriminant Analysis (LDA) [2], which are unsu-
pervised and supervised respectively.
In many real world applications such as image segmentation,
web page classification and gene-expression clustering, a labeling
process is costly and time-consuming; in contrast, unlabeled
examples can be easily obtained. Therefore, in such situations,
it may be beneficial to incorporate the information which is
contained in unlabeled examples into a learning problem, i.e.,
Semi-Supervised Learning (SSL) [3] should be applied instead
of supervised learning. Meanwhile, dimensionality reduction in the semi-supervised setting has also attracted increasing attention [4,5].
However, in many cases, people cannot tell which category an instance belongs to; that is, we do not know the exact label of an instance, only the constraint information of whether a pair of instances belongs to the same class (must-link constraint) or to different classes (cannot-link constraint) [6]. Such pairwise constraint information is called "side information" [7]. Constraint information is more general than label information, because constraints can be derived from labels but not the other way around [8].
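The labels-to-constraints direction noted above is mechanical, which a short sketch makes concrete (the function and variable names are ours, not from the paper): every pair of labeled instances yields exactly one must-link or cannot-link constraint.

```python
# Hedged sketch: deriving pairwise constraints from a partially labeled set.
# Labels -> constraints is straightforward; the reverse mapping is not generally
# possible, which is why constraints are considered more general than labels.
from itertools import combinations

def constraints_from_labels(labels):
    """labels: dict {instance_index: class_label} for the labeled subset.
    Returns (must_link, cannot_link) as lists of index pairs."""
    must_link, cannot_link = [], []
    for (i, yi), (j, yj) in combinations(sorted(labels.items()), 2):
        (must_link if yi == yj else cannot_link).append((i, j))
    return must_link, cannot_link

ml, cl = constraints_from_labels({0: 'A', 1: 'A', 2: 'B'})
# ml == [(0, 1)], cl == [(0, 2), (1, 2)]
```

Note the asymmetry: from the constraints alone one can recover at best a partition of the labeled instances, not the class names, so the original labeling cannot be reconstructed.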
Some related works have been proposed to make use of the
pairwise constraints to extract low dimensional structure in high
dimensional data. Bar-Hillel et al. proposed Relevant Component
Analysis (RCA) which can make use of the must-link constraints
for semi-supervised dimensionality reduction [9]. Xing et al. [7],
Tang et al. [10], Yeung et al. [11] and An et al. [12] proposed some
constraints based semi-supervised dimensionality reduction
methods, which can make use of both the must-link constraints
and cannot-link constraints. Zhang et al. proposed Semi-
Supervised Dimensionality Reduction (SSDR) [13] and Chen et al.
used SSDR in hyperspectral image classification [14]. SSDR can use
the pairwise constraints as well as preserve the global covariance
structure of the unlabeled data in the projected low dimensional
subspace. Cevikalp et al. proposed Constrained Locality Preserving
Projections (CLPP) [15] which is the semi-supervised version of
LPP [16]. This method can make use of the information provided by the pairwise constraints and can also exploit the unlabeled data by preserving the local structure used in LPP. Wei et al. proposed
Neighborhood Preserving based Semi-Supervised Dimensionality
Reduction (NPSSDR) [17] by using the pairwise constraints and
preserving the neighborhood structure used in LLE [18]. Baghshah
et al. used the idea of NPSSDR in metric learning and used a
heuristic search algorithm to solve the proposed constrained trace
ratio problem [19]. Davidson proposed a graph driven constrained
http://dx.doi.org/10.1016/j.neucom.2015.11.048
E-mail address: csjwei@scut.edu.cn (J. Wei).
Neurocomputing 177 (2016) 564–571