in less computation, it still has two limitations when applied to human face recognition. First, DT-CWT does not consider the structural information in human faces. Different facial regions have different degrees of importance; the eyes, mouth, and face contour are especially salient [14]. To distinguish two faces, the differences at these important regions should be emphasized. However, the traditional DT-CWT representation does not consider the relative importance of the different facial regions and makes no distinction between different parts of the face. Second, DT-CWT does not consider the statistical distribution of the transformed features. It treats each element equally and cannot emphasize those elements with high statistical probabilities, which may play an important role in discrimination.
Extracting proper features is crucial for the satisfactory design of any pattern classifier, and how to develop a general procedure for effective feature extraction remains an interesting and challenging problem [15,16]. One usually starts with a given set of features and then attempts to derive an optimal subset (under some criteria) that leads to high classification performance, with the expectation that similar performance will also be achieved on future trials using novel (unseen) test data [17]. Principal
Component Analysis (PCA) [18] is a popular technique used to
derive a starting set of features for both face representation and
recognition. As it is based on the optimal representation criterion in
the sense of mean-square error, PCA does not consider the
classification aspect. To improve the classification performance,
one needs to combine further this optimal representation criterion
with some discrimination criterion. One widely used criterion in
the face recognition community is the Fisher Linear Discriminant
(FLD, a.k.a. Linear Discriminant Analysis, or LDA) [19], which tries
to maximize the ratio
$$ J_{FLD}(W_{opt}) = \arg\max_{W} \frac{\lvert W^{T} S_{B} W \rvert}{\lvert W^{T} S_{W} W \rvert} \qquad (1) $$
where $S_{B}$ is the between-class scatter matrix and $S_{W}$ is the within-class scatter matrix. Thus, by applying LDA, we can find the optimal feature vectors that, on the one hand, maximize the Euclidean distance between the face images of different classes and, on the other, minimize the distance between the face images of the same class. This ratio is maximized when the column vectors of the projection matrix $W$ are the eigenvectors of $S_{W}^{-1} S_{B}$.
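As an illustrative sketch only (not the paper's implementation), the criterion above can be realized numerically by taking the leading eigenvectors of $S_{W}^{-1} S_{B}$; the toy example below assumes low-dimensional synthetic data for which $S_{W}$ is nonsingular:

```python
import numpy as np

def fisher_lda(X, y, n_components):
    """Return the LDA projection: leading eigenvectors of S_W^{-1} S_B.

    X: (n_samples, n_features) data matrix, y: class labels.
    Assumes S_W is nonsingular (enough samples per feature).
    """
    classes = np.unique(y)
    mean_all = X.mean(axis=0)
    n_features = X.shape[1]
    S_W = np.zeros((n_features, n_features))
    S_B = np.zeros((n_features, n_features))
    for c in classes:
        Xc = X[y == c]
        mean_c = Xc.mean(axis=0)
        S_W += (Xc - mean_c).T @ (Xc - mean_c)      # within-class scatter
        d = (mean_c - mean_all).reshape(-1, 1)
        S_B += len(Xc) * (d @ d.T)                  # between-class scatter
    # Eigenvectors of S_W^{-1} S_B, sorted by descending eigenvalue.
    eigvals, eigvecs = np.linalg.eig(np.linalg.solve(S_W, S_B))
    order = np.argsort(-eigvals.real)
    return eigvecs[:, order[:n_components]].real

# Two well-separated Gaussian classes in 2-D.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (50, 2)), rng.normal(5, 1, (50, 2))])
y = np.array([0] * 50 + [1] * 50)
W = fisher_lda(X, y, n_components=1)   # projection matrix, shape (2, 1)
```

Note that with $C$ classes, $S_{B}$ has rank at most $C-1$, so at most $C-1$ discriminant directions are meaningful.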
LDA has two limitations when used for pattern classification. One is the so-called small sample size (SSS) problem. In face recognition tasks, the dimension of the sample space is typically larger than the number of samples in the training set. As a consequence, $S_{W}$ is singular and $S_{W}^{-1} S_{B}$ cannot be computed directly.
In the past few decades, various approaches have been proposed to solve this problem. A common way to deal with the singularity problem is to apply an intermediate dimension-reduction stage, such as PCA, to reduce the dimension of the original data before classical LDA is applied. This algorithm is known as PCA+LDA [19–21]. In this two-stage PCA+LDA algorithm, the discriminant stage is preceded by a dimension-reduction stage using PCA. The dimension of the PCA subspace is chosen such that the ‘‘reduced’’ within-class scatter matrix in the subspace is nonsingular, so that classical LDA can be applied. A limitation is that the optimal value of the reduced PCA dimension is difficult to determine. Moreover, the PCA stage may lose information that is useful for discrimination. Howland and Park [22] solved the singularity problem of LDA by using the Generalized Singular Value Decomposition (GSVD). GSVD aims to find the optimal transformation $W_{opt}$, which preserves the dimension of the spaces spanned by the class centroids in the original and transformed spaces. The drawback is that the optimal solution is obtained by applying the SVD of the data matrix, which is computationally expensive in both time and memory for high-dimensional, large-scale data sets. Ye [23] extended this approach by solving the optimization problem using simultaneous diagonalization of the scatter matrices.
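To make the two-stage PCA+LDA pipeline concrete, the minimal numpy sketch below (an illustration under simplifying assumptions, not the cited algorithms) first projects onto the leading principal components so that the reduced within-class scatter becomes nonsingular, then applies classical LDA in that subspace:

```python
import numpy as np

def pca_then_lda(X, y, pca_dim, lda_dim):
    """Two-stage PCA+LDA: PCA makes the reduced S_W nonsingular, then LDA.

    pca_dim should be at most n_samples - n_classes so S_W is invertible.
    """
    # --- PCA stage: top principal components via SVD of the centered data ---
    Xc = X - X.mean(axis=0)
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    P = Vt[:pca_dim].T                  # (n_features, pca_dim) PCA basis
    Z = Xc @ P                          # data in the PCA subspace
    # --- LDA stage in the reduced space ---
    classes = np.unique(y)
    mu = Z.mean(axis=0)
    d = Z.shape[1]
    S_W = np.zeros((d, d))
    S_B = np.zeros((d, d))
    for c in classes:
        Zc = Z[y == c]
        mc = Zc.mean(axis=0)
        S_W += (Zc - mc).T @ (Zc - mc)
        diff = (mc - mu).reshape(-1, 1)
        S_B += len(Zc) * (diff @ diff.T)
    eigvals, eigvecs = np.linalg.eig(np.linalg.solve(S_W, S_B))
    order = np.argsort(-eigvals.real)
    L = eigvecs[:, order[:lda_dim]].real
    return P @ L                        # combined (n_features, lda_dim) projection

# SSS regime: 20 samples, 100 features, so S_W in the original space is singular.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, 1, (10, 100)), rng.normal(3, 1, (10, 100))])
y = np.array([0] * 10 + [1] * 10)
W = pca_then_lda(X, y, pca_dim=10, lda_dim=1)  # direct LDA would fail here
```

The choice `pca_dim=10` is arbitrary for this toy data; as noted above, selecting this dimension well is the method's main practical difficulty.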
Another limitation of LDA is that it assumes the sample vectors of each class are generated from underlying multivariate normal distributions with a common covariance matrix but different means [24]. Hence, if the data of a class are multimodal, LDA will not generally work; it may even collapse the data samples of different classes into a single cluster. Over the years, several extensions to the basic formulation of LDA have been defined. One approach uses a weighted version of LDA, such as the approximate Pairwise Accuracy Criterion (aPAC) [25] or Penalized Discriminant Analysis (PDA) [26]. In these methods, weights are introduced in the definition of the metrics, which reduce (or penalize) the role of the least stable features and thus make the discriminant-analysis metrics more flexible. He et al. [27] proposed the Locality Preserving Projection (LPP) method, which seeks an embedding transformation such that data pairs that are nearby in the original space remain close in the embedding space. Thus, LPP can reduce the dimensionality of multimodal data without losing the local structure. Zhu et al. [28] proposed the Subclass Discriminant Analysis (SDA) method, which aims to adapt to a large variety of data distributions. In this method, multimodal data are divided into a set of subclasses whose representation can be used to adapt to different types of class distributions. We previously proposed the Neighborhood Preserving Discriminant Analysis (NPDA) method [29], which maximizes between-class separability while preserving the within-class local structure. However, NPDA is still affected by the SSS problem.
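The LPP idea discussed above can be sketched in a few lines: build a neighborhood graph over the samples, weight edges with a heat kernel, and solve the resulting generalized eigenproblem for the smallest eigenvalues. This is a simplified illustration, not He et al.'s implementation; the neighborhood size `k` and kernel width `t` are arbitrary choices, and it assumes `X.T @ D @ X` is nonsingular (few features, many samples):

```python
import numpy as np

def lpp(X, n_components, k=5, t=1.0):
    """Locality Preserving Projections (simplified sketch).

    X: (n_samples, n_features). Builds a k-NN heat-kernel graph, then solves
    (X^T L X) a = lambda (X^T D X) a for the smallest eigenvalues, so that
    nearby samples stay close after projection.
    """
    n = X.shape[0]
    # Pairwise squared distances and k-NN adjacency with heat-kernel weights.
    sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    W = np.zeros((n, n))
    for i in range(n):
        nbrs = np.argsort(sq[i])[1:k + 1]          # skip the point itself
        W[i, nbrs] = np.exp(-sq[i, nbrs] / t)
    W = np.maximum(W, W.T)                         # symmetrize the graph
    D = np.diag(W.sum(axis=1))
    L = D - W                                      # graph Laplacian
    eigvals, eigvecs = np.linalg.eig(np.linalg.solve(X.T @ D @ X, X.T @ L @ X))
    order = np.argsort(eigvals.real)               # smallest eigenvalues first
    return eigvecs[:, order[:n_components]].real

# Bimodal toy data: two tight clusters in 3-D.
rng = np.random.default_rng(2)
X = np.vstack([rng.normal(0, 0.5, (20, 3)), rng.normal(4, 0.5, (20, 3))])
A = lpp(X, n_components=1)   # projection matrix, shape (3, 1)
```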
The points below highlight the contributions of this paper:
1. A new Augmented DT-CWT (ADT-CWT) method is presented to
extract multi-scale facial features. In this approach, a new
mapping function is defined and used to emphasize those
features having higher statistical probabilities and spatial
importance for face images. After this nonlinear mapping, the transformed features can have a higher discriminative power.
2. A new dimensionality reduction method is presented called the
Regularized Neighborhood Projection Discriminant Analysis
(RNPDA). In this method, linear projective functions can be obtained directly using a simple regression framework. Traditional eigen-problem computation is not involved in our approach, and thus it avoids the SSS problem.
3. Extensive experiments have been conducted to compare the face recognition performance of the proposed method with several popular dimensionality reduction methods on the FERET database [30], the Extended Yale B database [31], and the CMU PIE database [32]. The results verify the effectiveness of our method.
This paper is organized as follows. ADT-CWT and RNPDA are
introduced in Sections 2 and 3, respectively. Experimental results are
presented in Section 4. Finally, conclusions are drawn in Section 5.
2. Augmented Dual-Tree Complex Wavelet Transform
In this section, we first briefly review DT-CWT. Then we present the ADT-CWT method, which fully considers the statistical properties of the input features and the spatial information of human faces. For convenience, Table 1 lists the important notations used in the rest of the paper.
2.1. Dual-Tree Complex Wavelet Transform
In DT-CWT, two real discrete wavelet transforms $\psi_{h}(t)$ and $\psi_{g}(t)$ are employed in parallel to generate the real and imaginary parts of
H. Hu / Pattern Recognition 44 (2011) 519–531520