Heteroscedastic Max-min Distance Analysis
Bing Su¹, Xiaoqing Ding¹, Changsong Liu¹, Ying Wu²
¹ Tsinghua University, Beijing, 100084, China
subingats@gmail.com, {dxq,lcs}@ocrserv.ee.tsinghua.edu.cn
² Northwestern University, Evanston, IL, 60208, USA
yingwu@eecs.northwestern.edu
Abstract
Many discriminant analysis methods such as LDA and HLDA actually maximize the average pairwise distance between classes, which often causes the class separation problem. Max-min distance analysis (MMDA) addresses this problem by maximizing the minimum pairwise distance in the latent subspace, but it is developed under the homoscedastic assumption. This paper proposes Heteroscedastic MMDA (HMMDA) methods that exploit the discriminative information in the differences of intra-class scatters for dimensionality reduction. WHMMDA maximizes the minimal pairwise Chernoff distance in the whitened space. OHMMDA incorporates this objective and the minimization of class compactness into a trace quotient formulation and imposes an orthogonality constraint on the final transformation, which can be solved by a bisection search algorithm. Two variants of OHMMDA are further proposed to encode the margin information. Experiments on several UCI Machine Learning datasets and the Yale Face database demonstrate the effectiveness of the proposed HMMDA methods.
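(A hedged note on the optimization machinery mentioned above: the following sketches the generic trace-quotient technique, not the exact HMMDA objective. For a quotient max_{W^T W = I} tr(W^T A W) / tr(W^T B W) with B positive definite, define
f(λ) = max_{W^T W = I} tr(W^T (A − λB) W),
which equals the sum of the d' largest eigenvalues of A − λB, d' being the reduced dimension, and is strictly decreasing in λ. The optimal quotient value λ* is the unique root of f(λ) = 0, so it can be located by a bisection search, with each evaluation of f requiring one eigendecomposition.)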
1. Introduction
Dimensionality reduction (DR) has become a ubiquitous
procedure in many pattern recognition and machine learn-
ing applications. Among a number of DR approaches, a
class of linear supervised techniques referred to as
discrim-
inant analysis (DA)
has received a lot of attention, which
maximizes the separability of classes. Various criteria have
been proposed based on different definitions of separabil-
ity in the literature. Linear discriminant analysis (LDA) is
probably the most widely used method, which was first pro-
posed for two-class problems by Fisher in [6] and extended
to general multi-class problems by Rao in [16]. LDA opti-
mizes the so-called Fisher criterion by maximizing the ratio
of between-class scatter over within-class scatter under the
homoscedastic Gaussian assumption. The Fisher criterion
is extended to handle tensor data in [27] and sequence data
in [19].
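For concreteness, a standard statement of this criterion is sketched below; the notation is assumed here for illustration rather than quoted from [6, 16]. With class priors p_i, class means μ_i, class covariances Σ_i, and overall mean μ = ∑_i p_i μ_i, define the between-class and within-class scatters
S_b = ∑_i p_i (μ_i − μ)(μ_i − μ)^T,   S_w = ∑_i p_i Σ_i.
LDA then seeks a transformation W that maximizes J(W) = tr((W^T S_w W)^{-1} (W^T S_b W)). Note that S_b aggregates all class-mean differences into a single average, which is the root of the class separation problem discussed below.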
However, in practice, the distributions of classes are often non-Gaussian, and the covariances of different classes are not equal. These situations have been studied extensively in the literature. Subclass discriminant analysis [24, 30] handles general distribution types by dividing each class into several subclasses, each described by one Gaussian distribution. Marginal Fisher Analysis [25] uses marginal and neighboring points to construct inter-class separability and intra-class compactness. A locality-preserving property is combined with LDA in [10] to handle multimodal data. The optimal class representation is determined in [20] to replace the class mean vector. The Bayes error is directly minimized in [7]. The heteroscedastic Gaussian model parameters are jointly estimated with DR in the maximum-likelihood framework in [11]. Heteroscedastic LDA (HLDA) [12] extends LDA to heteroscedastic cases by utilizing the Chernoff criterion instead of the Fisher criterion, where the Chernoff distance is employed to generalize the between-class scatter. A theoretical analysis of HLDA is presented in [15]. In [17], the Chernoff distance is maximized in the transformed space by a gradient-based algorithm, and its performance is compared with that of LDA and HLDA in [1].
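For reference, a standard form of the pairwise Chernoff distance between Gaussian classes i and j is sketched below; the exact weighting adopted in [12, 17] may differ. For s ∈ (0, 1) and Σ_s = sΣ_i + (1 − s)Σ_j,
D_C(i, j) = (s(1 − s)/2) (μ_i − μ_j)^T Σ_s^{-1} (μ_i − μ_j) + (1/2) ln(|Σ_s| / (|Σ_i|^s |Σ_j|^{1−s})).
Setting s = 1/2 recovers the Bhattacharyya distance; when Σ_i = Σ_j the logarithmic term vanishes and only a scaled Mahalanobis distance between the means remains, which is why heteroscedastic criteria of this kind reduce to Fisher-like criteria in the homoscedastic case.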
These methods actually maximize the average of all pairwise distances between classes due to the definition of the between-class scatter. This causes the so-called "class separation" problem [13]. Specifically, these methods tend to pay close attention to classes with larger distances but ignore those with smaller distances, resulting in the overlap of "neighbouring" classes (under a specific distance measure) in the projected subspace. An example illustrating the class separation problem is shown in Fig. 1(a), where classes 1 and 2 lie close to each other while class 3 is far away from them. All classes have the same unit covariance. (In this case, HLDA degenerates into LDA.) The average of pairwise distances between classes is maximized when the dominant large distances between class 3 and the