Generalized nonlinear discriminant analysis and its small sample size problems
Li Zhang a,b,⁎, Wei Da Zhou b, Pei-Chann Chang c

a Research Center of Machine Learning and Data Analysis, School of Computer Science and Technology, Soochow University, Suzhou 215006, Jiangsu, China
b Institute of Intelligent Information Processing, Xidian University, Xi'an 710071, Shaanxi, China
c Department of Information Management, Yuan Ze University, Taoyuan 32026, Taiwan, China

⁎ Corresponding author at: Research Center of Machine Learning and Data Analysis, School of Computer Science and Technology, Soochow University, Suzhou 215006, Jiangsu, China. E-mail address: zhangliml@suda.edu.cn (L. Zhang).
Article info

Article history:
Received 25 May 2009
Received in revised form 10 September 2010
Accepted 14 September 2010
Available online 27 October 2010
Communicated by Liang Wang

Keywords:
Fisher discriminant analysis
Kernel trick
Small sample size problem
Abstract

This paper develops a generalized nonlinear discriminant analysis (GNDA) method and deals with its small sample size (SSS) problems. GNDA is a nonlinear extension of linear discriminant analysis (LDA), while kernel Fisher discriminant analysis (KFDA) can be regarded as a special case of GNDA. In LDA, an undersampled or small sample size problem occurs when the sample size is less than the sample dimensionality, which results in a singular within-class scatter matrix. Because GNDA involves a high-dimensional nonlinear mapping, small sample size problems arise rather frequently. To tackle this issue, this paper presents five different schemes for GNDA to solve the SSS problems. Experimental results on real-world data sets show that these schemes are very effective in tackling small sample size problems.
1. Introduction
Discriminant analysis has been widely used for feature
extraction and dimensionality reduction in pattern recognition.
Linear discriminant analysis (LDA), also known as Fisher linear discriminant, is one of the most commonly used methods [1]. The goal of LDA is to find an optimal subspace such that the separability of two classes is maximized. That is, LDA maximizes
$$\frac{\mathrm{tr}(W^{T} S_{b} W)}{\mathrm{tr}(W^{T} S_{w} W)} \qquad (1)$$
where $\mathrm{tr}(\cdot)$ denotes the trace of a matrix, $W$ is a linear projection or transformation matrix, $S_b$ is the between-class scatter matrix, and $S_w$ is the within-class scatter matrix. Maximizing (1) results in the following generalized eigenvalue problem:

$$S_{b} W = \lambda S_{w} W \qquad (2)$$
The optimal discriminant subspace is spanned by the generalized eigenvectors. If $S_w$ is nonsingular, the solution to the generalized eigenvalue problem (2) is obtained by applying eigendecomposition to $S_w^{-1} S_b$. However, for a small sample size (SSS) problem the scatter matrix $S_w$ is singular. For example, face recognition is a typical SSS problem, with high-dimensional data and few training samples. Several methods have been proposed to deal with the singularity of $S_w$, such as Fisherface [2], discriminant common vectors [3], dual space [4], LDA-GSVD (generalized singular value decomposition) [5], LDA-QR [6], PCA+NULL [7], and LDA-FKT (Fukunaga–Koontz transform) [8]. In [8], a unifying framework is proposed to understand these different methods. By using the Fukunaga–Koontz transform (FKT), the whole sample space can be decomposed into four subspaces. The discriminant information in the four subspaces differs, and the performance of each method depends on the subspaces it uses. The authors in [8] also report that LDA-GSVD is equivalent to LDA-FKT, and that LDA-FKT has the best performance.
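To make the criterion in (1) and the eigenproblem in (2) concrete, the following is a minimal sketch of LDA via a generalized eigendecomposition, assuming $S_w$ is nonsingular. It is an illustrative implementation rather than the authors' code; the function name lda_fit and the use of scipy.linalg.eigh are our own choices.

```python
import numpy as np
from scipy.linalg import eigh

def lda_fit(X, y, n_components):
    """Illustrative LDA: solve S_b w = lambda * S_w w as in Eq. (2).
    X: (n_samples, n_features); y: class labels.
    Assumes S_w is nonsingular (no small sample size problem)."""
    classes = np.unique(y)
    overall_mean = X.mean(axis=0)
    d = X.shape[1]
    S_w = np.zeros((d, d))
    S_b = np.zeros((d, d))
    for c in classes:
        Xc = X[y == c]
        class_mean = Xc.mean(axis=0)
        S_w += (Xc - class_mean).T @ (Xc - class_mean)   # within-class scatter
        diff = (class_mean - overall_mean)[:, None]
        S_b += Xc.shape[0] * (diff @ diff.T)             # between-class scatter
    # Generalized symmetric eigenproblem; eigh(S_b, S_w) requires S_w positive definite.
    eigvals, eigvecs = eigh(S_b, S_w)
    order = np.argsort(eigvals)[::-1]                    # largest eigenvalues first
    return eigvecs[:, order[:n_components]]              # columns of W span the subspace
```

When $S_w$ is singular, the factorization in the last step fails, which is exactly the small sample size situation the methods above are designed to handle.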
Unfortunately, LDA can only extract linear features from samples; it fails on data whose discriminative structure is nonlinear [9]. Kernel Fisher discriminant analysis (KFDA), one of the nonlinear discriminant methods, has been developed for extracting nonlinear discriminant features [9]. A work similar to KFDA is presented in [10]. Kernel functions are restricted to positive semi-definite symmetric functions, i.e., Mercer kernels, as in [11,12]. KFDA often encounters SSS problems because $S_w$ in a high-dimensional feature space is always singular. To overcome this computational difficulty in KFDA, a perturbation $\mu I$ is added to $S_w$ in [9], where $I$ is an identity matrix of the same size as $S_w$; kernel Fisherface is proposed in [13].
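As a loose illustration of this perturbation idea (a sketch under our own assumptions, not the exact procedure of [9] or [13]), one can regularize the within-class scatter before solving the generalized eigenproblem; the parameter name mu and the function name below are hypothetical.

```python
import numpy as np
from scipy.linalg import eigh

def regularized_discriminant_directions(S_b, S_w, mu=1e-3, n_components=1):
    """Sketch: cope with a singular S_w by adding the perturbation mu * I,
    then solve S_b w = lambda * (S_w + mu * I) w."""
    d = S_w.shape[0]
    S_w_reg = S_w + mu * np.eye(d)          # ridge-style perturbation makes S_w_reg positive definite
    eigvals, eigvecs = eigh(S_b, S_w_reg)   # generalized symmetric eigenproblem
    order = np.argsort(eigvals)[::-1]       # keep directions with the largest criterion values
    return eigvecs[:, order[:n_components]]
```

The choice of mu trades off numerical stability against fidelity to the original criterion: a larger perturbation guarantees invertibility but biases the estimated directions.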
This paper proposes a generalized nonlinear discriminant analysis (GNDA). GNDA consists of two steps. First, data in a sample space are mapped into a nonlinear mapping space by using some nonlinear mapping function. Then LDA is implemented in the nonlinear mapping space. In GNDA, the nonlinear mapping function can be any real-valued nonlinear function, for instance, empirical mapping functions as in [14,15], Mercer kernel mapping as in [11,12], etc. GNDA is identical