ORIGINAL ARTICLE
An efficient KPCA algorithm based on feature correlation evaluation
Zizhu Fan · Jinghua Wang · Baogen Xu · Pengzhi Tang
Received: 4 December 2012 / Accepted: 20 April 2013
© Springer-Verlag London 2013
Abstract Classic kernel principal component analysis (KPCA) is computationally inefficient when extracting features from large data sets. In this paper, we propose an algorithm, efficient KPCA (EKPCA), that enhances the computational efficiency of KPCA by using a linear combination of a small portion of the training samples, referred to as basic patterns, to approximately express the KPCA feature extractor, that is, the eigenvectors of the covariance matrix used in feature extraction. We show that the feature correlation (i.e., the correlation between different feature components) can be evaluated by the cosine distance between the kernel vectors, which are the column vectors of the kernel matrix. The proposed algorithm is easy to implement: it first uses feature correlation evaluation to determine the basic patterns and then uses these basic patterns to reconstruct the KPCA model, perform feature extraction, and classify the test samples. Since there are usually far fewer basic patterns than training samples, EKPCA feature extraction is much more computationally efficient than that of KPCA. Experimental results on several benchmark data sets show that EKPCA is much faster than KPCA while achieving similar classification performance.
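The basic-pattern selection summarized above can be sketched in a few lines of code. The snippet below is illustrative only: the greedy scan order and the fixed similarity threshold are assumptions made for exposition, not the paper's exact selection criterion, which is developed later in the paper. It measures the correlation between kernel vectors, i.e., the columns of the kernel matrix, by cosine similarity and keeps only weakly correlated columns as basic patterns.

import numpy as np

def select_basic_patterns(K, threshold=0.95):
    """Illustrative basic-pattern selection: keep a training sample only if
    its kernel vector (a column of the kernel matrix K) is not too close, in
    the cosine sense, to the kernel vectors already kept.  The threshold and
    the greedy scan order are assumptions, not the paper's exact criterion."""
    n = K.shape[1]
    # Normalize each column so that dot products become cosine similarities.
    cols = K / np.linalg.norm(K, axis=0, keepdims=True)
    kept = [0]
    for j in range(1, n):
        cos_sim = cols[:, kept].T @ cols[:, j]   # similarity to kept columns
        if np.max(cos_sim) < threshold:
            kept.append(j)
    return kept                                  # indices of basic patterns

A lower threshold keeps fewer, less correlated kernel vectors, trading the fidelity of the approximation for speed.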
Keywords Kernel principal component analysis (KPCA) · Feature extraction · Feature correlation · Cosine distance
1 Introduction
Kernel principal component analysis (KPCA) [1–7] is an effective approach to feature extraction and has been used with success in a variety of pattern recognition problems as well as in numerous image-related machine learning applications [8–16]. KPCA is a nonlinear generalization of classical principal component analysis (PCA), which seeks to extract a certain number of the most representative features from data [17–20]. In KPCA, the input data are first transformed into a high- or even infinite-dimensional feature space F through a nonlinear mapping, and classic PCA is then performed in the feature space F. KPCA employs an appropriately chosen kernel function to represent inner products of sample vectors in the feature space, which means that it does not need to carry out the nonlinear mapping explicitly [21, 22]. KPCA often outperforms classical PCA because it can effectively extract the nonlinear features of a sample, but KPCA-based feature extraction is computationally expensive since its cost grows with the number of training samples [23]. Inevitably, then, the greater the number of training samples, the lower the efficiency of feature extraction.
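To make this cost concrete, the following minimal numpy sketch of standard KPCA (not the proposed EKPCA) assumes an RBF kernel and the usual centering of the kernel matrix; it is a sketch under those assumptions rather than the paper's implementation. The point to notice is that projecting a single test sample requires kernel evaluations against every one of the n training samples.

import numpy as np

def rbf_kernel(A, B, gamma=1.0):
    # Pairwise RBF kernel values between the rows of A and the rows of B.
    sq = np.sum(A**2, axis=1)[:, None] + np.sum(B**2, axis=1)[None, :] - 2 * A @ B.T
    return np.exp(-gamma * sq)

def kpca_fit(X, n_components=2, gamma=1.0):
    n = X.shape[0]
    K = rbf_kernel(X, X, gamma)                          # n x n kernel matrix
    one_n = np.ones((n, n)) / n
    Kc = K - one_n @ K - K @ one_n + one_n @ K @ one_n   # center in feature space
    w, V = np.linalg.eigh(Kc)                            # eigenvalues in ascending order
    idx = np.argsort(w)[::-1][:n_components]             # pick the leading components
    alphas = V[:, idx] / np.sqrt(w[idx])                 # scale expansion coefficients
    return alphas

def kpca_transform(x_new, X_train, alphas, gamma=1.0):
    # Projection needs kernel values against *all* n training samples, so the
    # per-sample cost of feature extraction grows with the training set size.
    # (Centering of the test kernel vector is omitted to keep the sketch short.)
    k = rbf_kernel(np.atleast_2d(x_new), X_train, gamma)  # 1 x n kernel vector
    return k @ alphas

With n training samples, kpca_transform performs n kernel evaluations per test sample; this is precisely the bottleneck that EKPCA targets by replacing the full training set with a much smaller set of basic patterns.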
A number of reformulated algorithms [24–32] have recently been proposed to improve the computational efficiency of KPCA. Rosipal and Girolami [24] proposed to improve the training efficiency of KPCA using an expectation maximization approach. Zheng et al. [25] enhanced the training efficiency of KPCA by grouping the training samples. Schraudolph et al. [26] improved the convergence of the kernel Hebbian algorithm (KHA) for iterative kernel PCA. Moerland described how an expectation–maximization algorithm for classical PCA could be adapted to kernel PCA without having to store the kernel matrix [27]. These methods, however, do not consider the feature extraction
Z. Fan (✉) · B. Xu · P. Tang
School of Basic Science, East China Jiaotong University, Nanchang 330013, China
e-mail: zzfan3@yahoo.com.cn

J. Wang
Department of Computing, The Hong Kong Polytechnic University, Kowloon, Hong Kong, China
Neural Computing & Applications. DOI 10.1007/s00521-013-1424-9