Eigenvalues Perturbation of Integral Operator
for Kernel Selection
Yong Liu, Shali Jiang, Shizhong Liao∗
School of Computer Science and Technology
Tianjin University, Tianjin 300072, P. R. China
szliao@tju.edu.cn
ABSTRACT
Kernel selection is one of the key issues in both recent research
on and applications of kernel methods. This is usually
done by minimizing either an estimate of generalization error
or some other related performance measure. It is well known
that a kernel matrix can be interpreted as an empirical ver-
sion of a continuous integral operator, and its eigenvalues
converge to those of the integral operator. In this paper,
we introduce new kernel selection criteria based on the
eigenvalues perturbation of the integral operator. This per-
turbation quantifies the difference between the eigenvalues
of the kernel matrix and those of the integral operator. We
establish the connection between eigenvalues perturbation
and generalization error. By minimizing the derived gener-
alization error bounds, we propose our kernel selection criteria.
Therefore, the kernels chosen by our criteria can guarantee
good generalization performance. To compute
the values of our criteria, we present a method to obtain the
eigenvalues of integral operator via the Fourier transform.
Experiments on benchmark datasets demonstrate that our
kernel selection criteria are sound and effective.
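The convergence the abstract refers to can be illustrated numerically: the eigenvalues of the scaled kernel matrix K/m approach those of the integral operator as the sample size m grows. The sketch below is illustrative only and not code from the paper; the Gaussian kernel, standard normal data distribution, and sample sizes are arbitrary choices.

```python
import numpy as np

def gaussian_kernel_matrix(X, sigma=1.0):
    """Gaussian kernel K[i, j] = exp(-||x_i - x_j||^2 / (2 sigma^2))."""
    sq = np.sum(X**2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * X @ X.T
    return np.exp(-d2 / (2.0 * sigma**2))

rng = np.random.default_rng(0)
eigs = {}
for m in (100, 400, 1600):
    X = rng.normal(size=(m, 1))   # i.i.d. draws from the data distribution
    K = gaussian_kernel_matrix(X)
    # The eigenvalues of K/m are the empirical counterparts of the
    # integral operator's eigenvalues; the leading ones stabilize as m grows.
    lam = np.linalg.eigvalsh(K / m)[::-1]
    eigs[m] = lam[:5]
    print(m, np.round(lam[:5], 4))
```

The gap between the eigenvalues printed at successive sample sizes is exactly the kind of perturbation the proposed criteria quantify.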
Categories and Subject Descriptors
I.2.6 [Artificial Intelligence]: Learning—Parameter Learn-
ing; I.5.2 [Pattern Recognition]: Design Methodology—
Classifier Design and Evaluation; H.2.8 [Database Man-
agement]: Database Applications—Data Mining
General Terms
Algorithms, Theory, Experimentation
Keywords
Kernel Selection, Eigenvalues Perturbation, Integral Opera-
tor, Generalization Error.
∗Corresponding author.
CIKM’13, October 27–November 01, 2013, San Francisco, CA, USA
Copyright 2013 ACM 978-1-4503-2263-8/13/10 ...$15.00.
http://dx.doi.org/10.1145/2505515.2505584.
1. INTRODUCTION
Kernel methods [33, 29, 10, 30, 31] have been widely used
in pattern recognition and machine learning. Because the
performance of kernel methods greatly depends on the choice
of the kernel function, kernel selection has become an important
topic in kernel methods. A related problem is the evaluation
of the generalization ability of learning algorithms. In fact,
it is common to select the optimal kernel function
by choosing the one with the lowest generalization error.
Obviously, the generalization error is not directly computable,
as the probability distribution generating the data is unknown,
so it is necessary to resort to estimates of its value. The
generalization error can be estimated ei-
ther via theoretical bounds or testing on some unused data
(hold-out testing or cross validation). To estimate the upper
bounds of the generalization error, several complexity measures
have been introduced, such as VC dimension [33], Rademacher
complexity [3], maximal discrepancy [2], regularized risk [29],
the radius-margin bound [33] and the compression coefficient
[24]. However, most of these complexity measures, proposed to
derive theoretical generalization error bounds, are difficult to
compute [25, 26], which makes them hard to use for kernel
selection in practice. Min-
imizing the empirical estimate of the generalization error is
an alternative approach to kernel selection. K-fold cross-validation
(KCV) and leave-one-out cross-validation (LOO) [9, 23] are
two popular empirical estimates. Although KCV and LOO
are widely used in many fields, they have two drawbacks:
(a) the overall learning problem may overfit the cross-validation
error [6, 7]; (b) they incur a high computational cost. For
efficiency, several approximate KCV and LOO criteria have been
proposed, such as generalized cross-validation (GCV) [19],
generalized comparative Kullback-Leibler distance (GCKL)
[34], generalized approximate cross-validation (GACV) [35],
span bound [8, 9] and influence function [15].
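The KCV procedure discussed above can be sketched for kernel (bandwidth) selection. The Gaussian kernel, the kernel ridge regression learner, the candidate bandwidths, and the regularization parameter below are illustrative assumptions, not details from the paper.

```python
import numpy as np

def gaussian_kernel(A, B, sigma):
    """Gaussian kernel matrix between row-sample matrices A and B."""
    d2 = (np.sum(A**2, axis=1)[:, None]
          + np.sum(B**2, axis=1)[None, :] - 2.0 * A @ B.T)
    return np.exp(-d2 / (2.0 * sigma**2))

def kfold_cv_error(X, y, sigma, lam=1e-2, k=5):
    """Mean squared validation error of kernel ridge regression over k folds."""
    idx = np.arange(len(y))
    errs = []
    for fold in np.array_split(idx, k):
        tr = np.setdiff1d(idx, fold)
        K = gaussian_kernel(X[tr], X[tr], sigma)
        alpha = np.linalg.solve(K + lam * np.eye(len(tr)), y[tr])
        pred = gaussian_kernel(X[fold], X[tr], sigma) @ alpha
        errs.append(np.mean((pred - y[fold]) ** 2))
    return float(np.mean(errs))

rng = np.random.default_rng(1)
X = rng.uniform(-3.0, 3.0, size=(200, 1))
y = np.sin(X[:, 0]) + 0.1 * rng.normal(size=200)

# Pick the bandwidth with the smallest 5-fold CV error.
scores = {s: kfold_cv_error(X, y, s) for s in (0.01, 0.5, 20.0)}
best = min(scores, key=scores.get)
print(scores, "-> best sigma:", best)
```

The high computational cost noted in (b) is visible here: each candidate kernel requires solving k regularized linear systems, which is what the approximate criteria above try to avoid.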
Based on the similarity between the kernel matrix and the
target labels, Cristianini et al. [14] present a kernel selection
criterion called kernel target alignment (KTA). Nguyen and
Ho [25, 26] point out several drawbacks
of the KTA, and propose a surrogate measure (called FSM)
to evaluate the goodness of a kernel function via the data
distribution in the feature space. Similar to KTA, Cortes et
al. [12] present a centered kernel target alignment criterion
(CKTA) based on a centered kernel matrix. Although KTA,
CKTA and FSM are widely used, the connection between
these criteria and generalization error for specific learning
algorithms has not been established, so the kernels chosen
by these criteria may not guarantee good generalization per-
formance.
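A minimal sketch of the two alignment criteria just mentioned, assuming their standard definitions (KTA as the Frobenius-inner-product cosine between K and the ideal kernel yy^T [14], CKTA the same after centering both matrices [12]); the toy labels are chosen here purely for illustration:

```python
import numpy as np

def kta(K, y):
    """Kernel target alignment: <K, yy^T>_F / (||K||_F ||yy^T||_F)."""
    Y = np.outer(y, y)
    return float(np.sum(K * Y) / (np.linalg.norm(K) * np.linalg.norm(Y)))

def centered_kta(K, y):
    """Centered kernel target alignment: align centered K with centered yy^T."""
    m = len(y)
    H = np.eye(m) - np.ones((m, m)) / m        # centering matrix
    Kc, Yc = H @ K @ H, H @ np.outer(y, y) @ H
    return float(np.sum(Kc * Yc) / (np.linalg.norm(Kc) * np.linalg.norm(Yc)))

# Sanity check on toy labels: the ideal kernel yy^T aligns perfectly.
y = np.array([1.0, 1.0, -1.0, -1.0, 1.0])
K_ideal = np.outer(y, y)
print(kta(K_ideal, y))           # 1.0
print(centered_kta(K_ideal, y))
```

Both quantities are cheap to compute from the kernel matrix alone, which explains their popularity; the missing piece, as noted above, is a proven link to the generalization error of a specific learning algorithm.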