Study on Nonlinear Variable Selection based on False Nearest Neighbours in KPLS Subspace

1 Yingying Su, *2 Shan Liang, 3 Cheng Zeng, 4 Kesheng Yan, 5 Jun Peng

1, 3 College of Automation, Chongqing University, Chongqing, China, yy_su2000@yahoo.com.cn, zengcheng1290@163.com
*2 College of Automation, Chongqing University, Chongqing, China, lightsun@cqu.edu.cn
4 School of Mathematics and Statistics, Chongqing University of Technology, Chongqing, China, yan_kesheng@126.com
1, 5 Dept. of Electric and Electronic Information Engineering, Chongqing Univ. of Sci. & Tech., Chongqing, China, pengjun70@126.com
Abstract
Variable selection is one of the most significant model selection problems in classification. A novel approach based on False Nearest Neighbours (FNN) in the Kernel Partial Least Squares (KPLS) subspace is proposed to select a parsimonious set of variables as nonlinear modeling inputs. First, the nonlinear inputs are projected onto the principal components of the KPLS subspace. The variables are then ranked by importance according to a distance measure inspired by FNN in the KPLS subspace, so that unimportant variables can be recognized and removed. Finally, variable selection for three typical classification problems is studied with different parametric models. The results show that the method is valid and effective for nonlinear model reduction, and it can therefore be applied to variable selection for nonlinear systems.
Keywords: Kernel Partial Least Squares, False Nearest Neighbours, Variable Selection, Nonlinear
1. Introduction
Regression/classification tasks involve mapping n-dimensional continuous inputs X onto an m-dimensional output vector Y. Variable selection is often referred to as the problem of subset selection of the inputs X: it arises when one wants to model the relationship between a response variable Y and a subset of potential explanatory variables or predictors X, but there is uncertainty about which subset to use [1-3]. The simplest approach is to examine all 2^n possible subsets of variables, which is optimal but exhaustive. In practice, however, a suboptimal model is often sufficient, especially when the model faces the curse of dimensionality and time constraints.
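For concreteness, a minimal Python sketch of this exhaustive search is given below. It is purely illustrative: score_subset is a hypothetical placeholder for any model-quality criterion, and the nested loop makes the 2^n cost explicit.

    from itertools import combinations

    def exhaustive_selection(X, y, score_subset):
        # Evaluate every non-empty subset of the n input columns of X.
        # score_subset(X_subset, y) is a hypothetical placeholder for any
        # model-quality criterion (e.g. cross-validated accuracy); with n
        # inputs there are 2^n - 1 candidates, so this only scales to small n.
        n = X.shape[1]
        best_subset, best_score = None, float("-inf")
        for k in range(1, n + 1):
            for subset in combinations(range(n), k):
                s = score_subset(X[:, list(subset)], y)
                if s > best_score:
                    best_subset, best_score = subset, s
        return best_subset, best_score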
At first, the Forward, Backward, and Stepwise methods were widely used; they are conceptually simple but computationally expensive for nonlinear variable selection [4-6]. Many selection criteria were then proposed, e.g. the Distance Measure, Topology Measure, and Dependence Measure [7-9]. Although these criteria broaden the range of solution methods, the selected subset varies with the measure used. Heuristic methods such as the Genetic Algorithm and Simulated Annealing have also been applied to variable selection, but they lack guidance in the search [10]. With the development of Manifold Learning, nonlinear dimension reduction methods have emerged, e.g. Self-organizing Feature Mapping, Principal Curves, and Generative Topographic Mapping [11-13]. Meanwhile, with the emergence of kernel functions, many linear variable selection methods have been extended to nonlinear ones, including nonlinear feature extraction methods such as Kernel Partial Least Squares, Kernel Principal Component Analysis, and Kernel Independent Component Analysis [14-15]. These methods are useful for feature extraction and information compression, obtaining a reduced matrix by projecting the original variables onto a feature subspace. However, they cannot distinguish the correlated variables in the primal space, much less delete the redundant input variables. Therefore, a variable selection method is needed that directly reduces the number of redundant inputs and accounts for the change in accuracy as inputs are added or deleted.
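The remainder of the paper develops such a method. As a rough, hypothetical illustration of the underlying idea only (not the algorithm proposed here), the Python sketch below ranks each input by an FNN-style measure: a variable is deemed important if removing it changes the nearest-neighbour structure of the data in a kernel feature subspace. KernelPCA is used purely as a stand-in for the KPLS projection developed in this paper, and the RBF kernel parameters are assumptions.

    import numpy as np
    from sklearn.decomposition import KernelPCA  # stand-in for a KPLS projection

    def fnn_variable_ranking(X, n_components=3, gamma=1.0):
        # Illustrative sketch only: for each variable j, the data are
        # re-embedded with column j removed; variables whose removal most
        # disturbs the nearest-neighbour structure of the full embedding
        # score as important (their absence creates "false" neighbours).
        def embed(Z):
            return KernelPCA(n_components=n_components, kernel="rbf",
                             gamma=gamma).fit_transform(Z)

        def nearest_neighbour(F):
            d = np.linalg.norm(F[:, None, :] - F[None, :, :], axis=2)
            np.fill_diagonal(d, np.inf)
            return d.argmin(axis=1)

        full_nn = nearest_neighbour(embed(X))
        scores = []
        for j in range(X.shape[1]):
            reduced = np.delete(X, j, axis=1)
            nn_j = nearest_neighbour(embed(reduced))
            # fraction of points whose nearest neighbour becomes false
            scores.append(np.mean(nn_j != full_nn))
        return np.argsort(scores)[::-1]  # most important variable first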