
Pattern Recognition 39 (2006) 1002 – 1006
www.elsevier.com/locate/patcog
Rapid and brief communication
Why direct LDA is not equivalent to LDA
Hui Gao∗, James W. Davis
Computer Vision Laboratory, Department of Computer Science and Engineering, The Ohio State University, 395 Dreese Lab, 2015 Neil Avenue, Columbus, OH 43210, USA
∗ Corresponding author. Tel.: +1 614 247 6095; fax: +1 614 292 2911. E-mail address: gaoh@cse.ohio-state.edu (H. Gao).
Received 26 August 2005; accepted 25 November 2005
Abstract
In this paper, we present counterarguments against the direct LDA algorithm (D-LDA), which was previously claimed to be equivalent
to Linear Discriminant Analysis (LDA). We show from Bayesian decision theory that D-LDA is actually a special case of LDA by directly
taking the linear space of class means as the LDA solution. The pooled covariance estimate is completely ignored. Furthermore, we
demonstrate that D-LDA is not equivalent to traditional subspace-based LDA in dealing with the Small Sample Size problem. As a result,
D-LDA may impose a significant performance limitation in general applications.
© 2005 Pattern Recognition Society. Published by Elsevier Ltd. All rights reserved.
Keywords: Linear discriminant analysis; Direct LDA; Small sample size problem
1. Introduction
Recently, an algorithm called direct Linear Discriminant Analysis (D-LDA) has received considerable interest in Pattern Recognition and Computer Vision. It was first proposed in Ref. [1] to deal with the small sample size (SSS) problem in face recognition and has been followed by several extensions, e.g., fractional direct LDA [2], kernel-based direct LDA [3], and regularized direct discriminant analysis [4]. The key idea in this method is that the null space of the between-class scatter matrix S_b contains no useful information for recognition and is discarded by diagonalization. The within-class scatter matrix S_w is then projected into the linear subspace of S_b and factorized using eigenanalysis to obtain the solution. It was claimed in Ref. [1] that
(1) D-LDA gives the “exact solution for Fisher’s criterion”.
(2) D-LDA is equivalent to subspace-based LDA (e.g.,
PCA + LDA) in dealing with the SSS problem.
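For concreteness, the following is a minimal sketch of the D-LDA procedure outlined above, written in Python/numpy. The function and variable names, the eps threshold for deciding which eigenvalues of S_b are treated as zero, and the omission of any further dimensionality reduction are our own assumptions, not details taken from Ref. [1].

```python
import numpy as np

def d_lda(X, y, eps=1e-10):
    """Sketch of the D-LDA procedure described above (cf. Ref. [1]).

    X: (n, D) data matrix; y: (n,) integer class labels.
    Returns a projection matrix of shape (D, m), m <= c - 1.
    """
    classes = np.unique(y)
    overall_mean = X.mean(axis=0)
    D = X.shape[1]
    Sb = np.zeros((D, D))   # between-class scatter
    Sw = np.zeros((D, D))   # within-class scatter
    for cls in classes:
        Xc = X[y == cls]
        mc = Xc.mean(axis=0)
        Sb += len(Xc) * np.outer(mc - overall_mean, mc - overall_mean)
        Sw += (Xc - mc).T @ (Xc - mc)

    # Step 1: diagonalize Sb and discard its null space
    # (at most c - 1 eigenvalues are nonzero).
    lam_b, V = np.linalg.eigh(Sb)
    keep = lam_b > eps
    Y = V[:, keep]

    # Step 2: whiten Sb in the retained subspace, so Z.T @ Sb @ Z = I.
    Z = Y @ np.diag(lam_b[keep] ** -0.5)

    # Step 3: project Sw into the subspace of Sb and diagonalize it.
    lam_w, U = np.linalg.eigh(Z.T @ Sw @ Z)

    # Final transform; note that Sw enters only after the null space
    # of Sb has already been thrown away.
    return Z @ U
```

Note that in this procedure S_w influences only a rotation and scaling inside the retained subspace of S_b; it never changes which subspace is retained.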
However, we observe that these claims of D-LDA are flawed in theory. Although the null components of S_b do not
influence the projection of S_b in the feature space, they do influence the projection of S_w and hence should not be discarded. Since all “direct” approaches share the same idea (e.g., Refs. [1–4]), we focus on the original work of D-LDA [1] to simplify the discussion. Similar arguments can be made for any of the extensions.
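As a simple numerical illustration of this point (a toy example of our own, not taken from Ref. [1]), consider two classes in two dimensions whose mean difference lies along the first axis, so that the null space of S_b is spanned by the second axis, while the within-class scatter is correlated:

```python
import numpy as np

# Two classes in 2D: the mean difference lies along e1, so the null space
# of Sb is spanned by e2, while the within-class scatter is correlated.
m1, m2 = np.array([1.0, 0.0]), np.array([-1.0, 0.0])
Sw = np.array([[2.0, 1.5],
               [1.5, 2.0]])
d = m1 - m2
Sb = np.outer(d, d)                                   # rank one
assert np.allclose(Sb @ np.array([0.0, 1.0]), 0.0)    # e2 lies in null(Sb)

# Fisher/LDA direction: w = Sw^{-1} (m1 - m2).
w_lda = np.linalg.solve(Sw, d)
w_lda /= np.linalg.norm(w_lda)

# D-LDA direction: constrained to the range of Sb, i.e. span{m1 - m2}.
w_dlda = d / np.linalg.norm(d)

print(w_lda)    # ~[ 0.8, -0.6]: a large component along e2, i.e. along null(Sb)
print(w_dlda)   # [1., 0.]: the null-space component has been discarded
```

The Fisher direction retains a substantial component along null(S_b); restricting the solution to the range of S_b, as D-LDA does, therefore changes the discriminant whenever S_w is not isotropic.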
Our analysis originates from the viewpoint of Bayesian decision theory. It is well known [5] that Fisher’s LDA (the ratio of S_b and S_w in the projection space) is equivalent to a classification problem of c Gaussians with equal covariance when the model parameters are estimated in the maximum-likelihood (ML) fashion. The solution requires a minimum of c − 1 linear features (assuming input dimension D ≫ c) to form a sufficient statistic. However, in D-LDA, because the null space of S_b is first discarded, its solution is constrained to lie in the linear space of S_b (no matter the form of S_w), which is at most c − 1 dimensional. Hence, the complete c − 1 dimensional linear space of S_b must be kept as the D-LDA solution in order for it to possibly be a sufficient statistic. Because it ignores S_w, D-LDA is a special case of LDA.
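To spell out the Bayesian argument (standard equal-covariance Gaussian classifier algebra, cf. Ref. [5]; the symbol Σ for the common covariance, estimated up to scale by S_w, is our notation), the ML classifier assigns x to the class with the largest linear discriminant

```latex
g_i(x) = m_i^{\top} \Sigma^{-1} x
         - \tfrac{1}{2}\, m_i^{\top} \Sigma^{-1} m_i
         + \ln P(\omega_i), \qquad i = 1, \dots, c .
```

Only the differences g_i(x) − g_c(x) affect the decision, and they depend on x solely through the c − 1 linear features (m_i − m_c)^T Σ^{-1} x; this is why c − 1 features can form a sufficient statistic. D-LDA instead constrains the projection to span{m_i − m_c} regardless of Σ, i.e., the subspace LDA would yield only if Σ were proportional to the identity, which is the sense in which it is a special case of LDA.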
We additionally point out one missing assumption in the linear algebra derivation of D-LDA given in Ref. [1]. When any singular matrix (S_b or S_w) is involved in the generalized eigenvector and eigenvalue problem, the