Study on Nonlinear Variable Selection based on False Nearest Neighbours in KPLS Subspace

1 Yingying Su, *2 Shan Liang, 3 Cheng Zeng, 4 Kesheng Yan, 5 Jun Peng

1, 3 College of Automation, Chongqing University, Chongqing, China, yy_su2000@yahoo.com.cn, zengcheng1290@163.com
*2 College of Automation, Chongqing University, Chongqing, China, lightsun@cqu.edu.cn
4 School of Mathematics and Statistics, Chongqing University of Technology, Chongqing, China, yan_kesheng@126.com
1, 5 Dept. of Electric and Electronic Information Engineering, Chongqing Univ. of Sci. & Tech., Chongqing, China, pengjun70@126.com
Abstract
Variable selection is one of the most significant model selection problems in classification. A novel approach based on False Nearest Neighbours (FNN) in the Kernel Partial Least Squares (KPLS) subspace is proposed to select a parsimonious set of variables as nonlinear modeling inputs. First, the nonlinear inputs are projected onto the principal components of the KPLS subspace. The variables are then ranked by importance according to a distance measure inspired by FNN in the KPLS subspace, so that unimportant variables can be recognized and removed. Finally, variable selection for three typical classification problems is studied with different parametric models. The results show that the method is valid and effective for nonlinear model reduction, and it can therefore be applied to variable selection for nonlinear systems.
Keywords: Kernel Partial Least Squares, False Nearest Neighbours, Variable Selection, Nonlinear
1. Introduction
Regression/classification tasks involve mapping n-dimensional continuous inputs X onto an m-dimensional output vector Y. Variable selection is often referred to as the problem of subset selection of the inputs X: it arises when one wants to model the relationship between a response variable Y and a subset of potential explanatory variables or predictors X, but there is uncertainty about which subset to use [1-3]. The simplest approach is to examine all 2^n possible subsets of variables, which is optimal but exhaustive. In practice, however, a suboptimal model is often sufficient, especially when the model faces the curse of dimensionality and time constraints.
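For concreteness, a minimal Python sketch of this exhaustive search is given below. It is purely illustrative: score_subset is a hypothetical placeholder for any model-quality criterion, and the nested loop makes the 2^n cost explicit.

    from itertools import combinations

    def exhaustive_selection(X, y, score_subset):
        # Evaluate every non-empty subset of the n input columns of X.
        # score_subset(X_subset, y) is a hypothetical placeholder for any
        # model-quality criterion (e.g. cross-validated accuracy); with n
        # inputs there are 2^n - 1 candidates, so this only scales to small n.
        n = X.shape[1]
        best_subset, best_score = None, float("-inf")
        for k in range(1, n + 1):
            for subset in combinations(range(n), k):
                s = score_subset(X[:, list(subset)], y)
                if s > best_score:
                    best_subset, best_score = subset, s
        return best_subset, best_score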
At first, the Forward, Backward, and Stepwise methods were widely used; they are conceptually simple but computationally expensive for nonlinear variable selection [4-6]. Many selection criteria were then proposed, e.g. the Distance Measure, Topology Measure, and Dependence Measure [7-9]. Although these criteria broaden the range of solution methods, the selected subset varies with the measure used. Heuristic methods such as the Genetic Algorithm and Simulated Annealing have also been applied to variable selection, but they lack guidance in the search [10]. With the development of Manifold Learning, nonlinear dimension reduction methods have emerged, e.g. Self-organizing Feature Mapping, Principal Curves, and Generative Topographic Mapping [11-13]. Meanwhile, with the emergence of kernel functions, many linear variable selection methods have been extended to nonlinear ones, including nonlinear feature extraction methods such as Kernel Partial Least Squares, Kernel Principal Component Analysis, and Kernel Independent Component Analysis [14-15]. These methods are useful for feature extraction and information compression, obtaining a reduced matrix by projecting the original variables onto a feature subspace. However, they cannot distinguish the correlated variables in the primal space, much less delete the redundant input variables. Therefore, a variable selection method is needed that directly reduces the number of redundant inputs and accounts for the change in accuracy as inputs are added or deleted.
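The remainder of the paper develops such a method. As a rough, hypothetical illustration of the underlying idea only (not the algorithm proposed here), the Python sketch below ranks each input by an FNN-style measure: a variable is deemed important if removing it changes the nearest-neighbour structure of the data in a kernel feature subspace. KernelPCA is used purely as a stand-in for the KPLS projection developed in this paper, and the RBF kernel parameters are assumptions.

    import numpy as np
    from sklearn.decomposition import KernelPCA  # stand-in for a KPLS projection

    def fnn_variable_ranking(X, n_components=3, gamma=1.0):
        # Illustrative sketch only: for each variable j, the data are
        # re-embedded with column j removed; variables whose removal most
        # disturbs the nearest-neighbour structure of the full embedding
        # score as important (their absence creates "false" neighbours).
        def embed(Z):
            return KernelPCA(n_components=n_components, kernel="rbf",
                             gamma=gamma).fit_transform(Z)

        def nearest_neighbour(F):
            d = np.linalg.norm(F[:, None, :] - F[None, :, :], axis=2)
            np.fill_diagonal(d, np.inf)
            return d.argmin(axis=1)

        full_nn = nearest_neighbour(embed(X))
        scores = []
        for j in range(X.shape[1]):
            reduced = np.delete(X, j, axis=1)
            nn_j = nearest_neighbour(embed(reduced))
            # fraction of points whose nearest neighbour becomes false
            scores.append(np.mean(nn_j != full_nn))
        return np.argsort(scores)[::-1]  # most important variable first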