Nonlinear Component Analysis as a Kernel Eigenvalue Problem


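"Nonlinear Component Analysis as a Kernel Eigenvalue Problem" describes a nonlinear generalization of principal component analysis, now generally known as kernel principal component analysis (kernel PCA). It is an effective way to uncover structure in data and is particularly useful for high-dimensional data sets.

Standard principal component analysis (PCA) is computed either by solving an eigenvalue problem or with iterative algorithms. It finds the orthogonal directions of maximal variance and projects the data onto the lower-dimensional subspace they span, which reduces complexity while retaining most of the variance. Its limitation is that it only extracts linear features: each component is a projection onto a direction in input space. As a result, PCA can miss important structure when the relationships in the data are nonlinear.

The paper shows how to perform PCA in a high-dimensional feature space that is related to the input space by a nonlinear map. One example is the space of all possible products of five pixels in 16 × 16 images. By using integral operator kernel functions, the principal components in such a space can be computed efficiently.

The key is the kernel trick, which turns the nonlinear problem into a linear one solved in the feature space rather than in the original input space. A kernel function k(x, y) returns the dot product of the images of x and y in the feature space without computing those images explicitly. Similarity measured in the feature space can therefore reflect nonlinear relationships in the input space. For example, the polynomial kernel k(x, y) = (x · y)^d corresponds to features formed by products of d input entries, capturing second- or higher-order interactions that are useful in tasks such as pattern recognition.

The eigenvalue problem is posed for the matrix of kernel values between the training points. Its size depends on the number of samples rather than on the feature-space dimension, so the principal components can be found even when the feature space has far more dimensions than the original data.

The authors, Bernhard Schölkopf, Alexander Smola, and Klaus-Robert Müller, present the theoretical derivation. They also report experimental results on polynomial feature extraction for pattern recognition, in which the extracted nonlinear features are compared with those of ordinary linear PCA.

By extending PCA beyond linear projections, the method makes complex, nonlinear data tractable, with applications in image processing, pattern recognition, and bioinformatics. It helps researchers and data scientists understand the internal structure of complex data sets, improving the predictive power and interpretability of their models.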
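The eigenvalue problem summarized above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation. It assumes a polynomial kernel and extracts only the leading component. It centers the kernel matrix in feature space, and it uses power iteration (swapped in here for a full eigensolver) so that only the standard library is needed. The function names poly_kernel, centered_gram, and kernel_pca_top are hypothetical.

```python
import math
import random

def poly_kernel(x, y, d=2):
    """Polynomial kernel k(x, y) = (x . y)^d."""
    return sum(a * b for a, b in zip(x, y)) ** d

def centered_gram(X, kernel):
    """Gram matrix K_ij = k(x_i, x_j), centered in feature space.

    Equivalent to K - 1K - K1 + 1K1, with 1 the l-by-l matrix of entries 1/l.
    """
    n = len(X)
    K = [[kernel(a, b) for b in X] for a in X]
    mean = [sum(row) / n for row in K]  # row means; K is symmetric
    grand = sum(mean) / n               # mean of all entries
    return [[K[i][j] - mean[i] - mean[j] + grand for j in range(n)]
            for i in range(n)]

def kernel_pca_top(X, kernel, iters=500, seed=0):
    """Leading kernel principal component (sketch: power iteration).

    Returns (mu, proj): mu is the largest eigenvalue of the centered Gram
    matrix, proj[j] the projection of training point j onto the component.
    """
    K = centered_gram(X, kernel)
    n = len(K)
    rng = random.Random(seed)
    v = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    for _ in range(iters):
        w = [sum(K[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(t * t for t in w))
        if norm == 0.0:  # no variance in feature space
            return 0.0, [0.0] * n
        v = [t / norm for t in w]
    Kv = [sum(K[i][j] * v[j] for j in range(n)) for i in range(n)]
    mu = sum(a * b for a, b in zip(v, Kv))  # Rayleigh quotient, |v| = 1
    # alpha = v / sqrt(mu) gives the feature-space eigenvector unit length;
    # the projection of x_j is then (K alpha)_j = sqrt(mu) * v_j.
    return mu, [math.sqrt(mu) * t for t in v]

X = [(-1.0, 0.0), (1.0, 0.0), (0.0, 0.1), (0.0, -0.1)]
mu, proj = kernel_pca_top(X, poly_kernel)  # degree-2 (nonlinear) component
```

With d = 1 the kernel is the ordinary dot product, so the result reduces to the leading component of linear PCA. With d > 1 the same code extracts a nonlinear component without ever forming the monomial features.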

Nonlinear Component Analysis 1303

Before we proceed to the next section, which more closely investigates the role of the map Φ, the following observation is essential: Φ can be an arbitrary nonlinear map into the possibly high-dimensional space F, for example, the space of all d-th order monomials in the entries of an input vector. In that case, we need to compute dot products of input vectors mapped by Φ, at a possibly prohibitive computational cost. The solution to this problem, described in the following section, builds on the fact that we exclusively need to compute dot products between mapped patterns (in equations 2.10 and 2.15); we never need the mapped patterns explicitly.
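To see how quickly the computational cost of an explicit map grows, one can count the coordinates of a monomial feature space. The following is a small sketch that assumes N input entries and monomials of degree exactly d. The function name monomial_dims is hypothetical.

```python
from math import comb

def monomial_dims(n_inputs, d):
    """Count coordinates of degree-d monomial feature spaces.

    Returns (ordered, distinct): "ordered" counts every index tuple
    (j1, ..., jd), so x1*x2 and x2*x1 are separate coordinates;
    "distinct" counts each monomial once (multisets of size d).
    """
    return n_inputs ** d, comb(n_inputs + d - 1, d)

print(monomial_dims(2, 2))    # (4, 3)
print(monomial_dims(256, 5))  # (1099511627776, 9525431552)
```

For 16 × 16 images (N = 256) and d = 5, this gives about 1.1 × 10^12 ordered and roughly 9.5 × 10^9 distinct monomials. At that scale, mapping each pattern explicitly is impractical.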

3 Computing Dot Products in Feature Spaces

In order to compute dot products of the form (Φ(x) · Φ(y)), we use kernel representations,

$$k(x, y) = (\Phi(x) \cdot \Phi(y)), \tag{3.1}$$

which allow us to compute the value of the dot product in F without having to carry out the map Φ. This method was used by Boser et al. (1992) to extend the Generalized Portrait hyperplane classifier of Vapnik and Chervonenkis (1974) to nonlinear support vector machines. To this end, they substitute a priori chosen kernel functions k for all occurrences of dot products, obtaining decision functions

$$f(x) = \operatorname{sgn}\Big(\sum_{i=1}^{\ell} \nu_i \, k(x, x_i) + b\Big). \tag{3.2}$$
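A decision function of the form in equation 3.2 can be evaluated once the expansion points, coefficients, and offset are known. This sketch uses a polynomial kernel and hand-picked, hypothetical coefficients rather than a trained support vector machine. The names poly_kernel, sgn, and decision are illustrative only.

```python
def poly_kernel(x, y, d=2):
    """Polynomial kernel k(x, y) = (x . y)^d."""
    return sum(a * b for a, b in zip(x, y)) ** d

def sgn(s):
    """Sign function returning +1, -1, or 0."""
    return (s > 0) - (s < 0)

def decision(x, points, nus, b, kernel=poly_kernel):
    """Evaluate f(x) = sgn(sum_i nu_i * k(x, x_i) + b); the kernel replaces every dot product."""
    return sgn(sum(nu * kernel(x, xi) for nu, xi in zip(nus, points)) + b)

# Hypothetical expansion (not a trained SVM): with k = (x . y)^2 this is
# f(x) = sgn(x1^2 - x2^2), a boundary that is nonlinear in input space.
points = [(1.0, 0.0), (0.0, 1.0)]
nus = [1.0, -1.0]
print(decision((2.0, 1.0), points, nus, b=0.0))  # prints 1
print(decision((1.0, 2.0), points, nus, b=0.0))  # prints -1
```

Although f is linear in the kernel values, the resulting boundary |x1| = |x2| is not a hyperplane in input space.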

Aizerman et al. (1964) call F the linearization space, and use it in the context of the potential function classification method to express the dot product between elements of F in terms of elements of the input space. If F is high-dimensional, we would like to be able to find a closed-form expression for k that can be efficiently computed. Aizerman et al. (1964) consider the possibility of choosing k a priori, without being directly concerned with the corresponding mapping Φ into F. A specific choice of k might then correspond to a dot product between patterns mapped with a suitable Φ. A particularly useful example, which is a direct generalization of a result proved by Poggio (1975, lemma 2.1) in the context of polynomial approximation, is

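$$(x \cdot y)^d = \Big(\sum_{j=1}^{N} x_j \cdot y_j\Big)^d = \sum_{j_1, \dots, j_d = 1}^{N} x_{j_1} \cdot \ldots \cdot x_{j_d} \cdot y_{j_1} \cdot \ldots \cdot y_{j_d} = (C_d(x) \cdot C_d(y)), \tag{3.3}$$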
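Equation 3.3 can be checked numerically on small integer vectors, where the arithmetic is exact. This sketch assumes that C_d maps x to the vector of all d-th degree ordered products of its entries, with one coordinate per index tuple (j_1, ..., j_d), matching the sum in 3.3. The helper names dot and ordered_monomials are hypothetical.

```python
from itertools import product
from math import prod

def dot(u, v):
    """Ordinary dot product of two equal-length sequences."""
    return sum(a * b for a, b in zip(u, v))

def ordered_monomials(x, d):
    """Hypothetical C_d: one entry x[j1]*...*x[jd] for every index tuple."""
    return [prod(x[j] for j in idx) for idx in product(range(len(x)), repeat=d)]

x, y, d = [1, 2, 3], [4, -1, 2], 3
lhs = dot(x, y) ** d                                         # kernel side: O(N) work
rhs = dot(ordered_monomials(x, d), ordered_monomials(y, d))  # explicit side: N**d terms
print(lhs, rhs)  # 512 512
```

Both sides agree, but the kernel side touches only N = 3 products while the explicit side builds 3^3 = 27 coordinates per vector. This gap is what makes the kernel representation worthwhile for large N and d.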

