Heteroscedastic Max-min Distance Analysis
Bing Su¹, Xiaoqing Ding¹, Changsong Liu¹, Ying Wu²
¹ Tsinghua University, Beijing, 100084, China
subingats@gmail.com, {dxq,lcs}@ocrserv.ee.tsinghua.edu.cn
² Northwestern University, Evanston, IL, 60208, USA
yingwu@eecs.northwestern.edu
Abstract
Many discriminant analysis methods such as LDA and HLDA actually maximize the average pairwise distance between classes, which often causes the class separation problem. Max-min distance analysis (MMDA) addresses this problem by maximizing the minimum pairwise distance in the latent subspace, but it is developed under the homoscedastic assumption. This paper proposes Heteroscedastic MMDA (HMMDA) methods that exploit the discriminative information in the differences of intra-class scatters for dimensionality reduction. WHMMDA maximizes the minimal pairwise Chernoff distance in the whitened space. OHMMDA incorporates this objective and the minimization of class compactness into a trace quotient formulation and imposes an orthogonality constraint on the final transformation, which can be solved by a bisection search algorithm. Two variants of OHMMDA are further proposed to encode the margin information. Experiments on several UCI Machine Learning datasets and the Yale Face database demonstrate the effectiveness of the proposed HMMDA methods.
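(A hedged note on the optimization machinery mentioned above: the following sketches the generic trace-quotient technique, not the exact HMMDA objective. For a quotient max_{W^T W = I} tr(W^T A W) / tr(W^T B W) with B positive definite, define
f(λ) = max_{W^T W = I} tr(W^T (A − λB) W),
which equals the sum of the d' largest eigenvalues of A − λB, d' being the reduced dimension, and is strictly decreasing in λ. The optimal quotient value λ* is the unique root of f(λ) = 0, so it can be located by a bisection search, with each evaluation of f requiring one eigendecomposition.)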
1. Introduction
Dimensionality reduction (DR) has become a ubiquitous
procedure in many pattern recognition and machine learn-
ing applications. Among a number of DR approaches, a
class of linear supervised techniques referred to as
discrim-
inant analysis (DA)
has received a lot of attention, which
maximizes the separability of classes. Various criteria have
been proposed based on different definitions of separabil-
ity in the literature. Linear discriminant analysis (LDA) is
probably the most widely used method, which was first pro-
posed for two-class problems by Fisher in [6] and extended
to general multi-class problems by Rao in [16]. LDA opti-
mizes the so-called Fisher criterion by maximizing the ratio
of between-class scatter over within-class scatter under the
homoscedastic Gaussian assumption. The Fisher criterion
is extended to handle tensor data in [27] and sequence data
in [19].
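For concreteness, a standard statement of this criterion is sketched below; the notation is assumed here for illustration rather than quoted from [6, 16]. With class priors p_i, class means μ_i, class covariances Σ_i, and overall mean μ = ∑_i p_i μ_i, define the between-class and within-class scatters
S_b = ∑_i p_i (μ_i − μ)(μ_i − μ)^T,   S_w = ∑_i p_i Σ_i.
LDA then seeks a transformation W that maximizes J(W) = tr((W^T S_w W)^{-1} (W^T S_b W)). Note that S_b aggregates all class-mean differences into a single average, which is the root of the class separation problem discussed below.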
However, in practice, the distributions of classes are often non-Gaussian, and the covariances of different classes are not equal. These situations have been studied extensively in the literature. Subclass discriminant analysis [24, 30] handles general distribution types by dividing each class into several subclasses, each described by one Gaussian distribution. Marginal Fisher Analysis [25] uses marginal and neighboring points to construct inter-class separability and intra-class compactness. A locality-preserving property is combined with LDA in [10] to handle multimodal data. The optimal class representation is determined in [20] to replace the class mean vector. The Bayes error is directly minimized in [7]. The heteroscedastic Gaussian model parameters are jointly estimated with DR in the maximum-likelihood framework in [11]. Heteroscedastic LDA (HLDA) [12] extends LDA to heteroscedastic cases by utilizing the Chernoff criterion instead of the Fisher criterion, where the Chernoff distance is employed to generalize the between-class scatter. A theoretical analysis of HLDA is presented in [15]. In [17], the Chernoff distance is maximized in the transformed space by a gradient-based algorithm, and its performance is compared with that of LDA and HLDA in [1].
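For reference, a standard form of the pairwise Chernoff distance between Gaussian classes i and j is sketched below; the exact weighting adopted in [12, 17] may differ. For s ∈ (0, 1) and Σ_s = sΣ_i + (1 − s)Σ_j,
D_C(i, j) = (s(1 − s)/2) (μ_i − μ_j)^T Σ_s^{-1} (μ_i − μ_j) + (1/2) ln(|Σ_s| / (|Σ_i|^s |Σ_j|^{1−s})).
Setting s = 1/2 recovers the Bhattacharyya distance; when Σ_i = Σ_j the logarithmic term vanishes and only a scaled Mahalanobis distance between the means remains, which is why heteroscedastic criteria of this kind reduce to Fisher-like criteria in the homoscedastic case.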
These methods actually maximize the average of all pairwise distances between classes due to the definition of the between-class scatter. This causes the so-called "class separation" problem [13]. Specifically, these methods tend to pay close attention to classes with larger distances but ignore those with smaller distances, resulting in the overlap of "neighbouring" classes (under a specific distance measure) in the projected subspace. An example illustrating the class separation problem is shown in Fig. 1(a), where classes 1 and 2 lie close to each other while class 3 is far away from them. All classes have the same unit covariance. (In this case, HLDA degenerates into LDA.) The average of pairwise distances between classes is maximized when the dominant large distances between class 3 and the