[29], [30] in the third category learns $G$ in the neighborhood of $K_0$ by imposing a penalty term $\|G - K_0\|_F^2$, where $\|\cdot\|_F$ denotes the Frobenius norm. The optimization problem becomes
$$\min_{G \succeq 0}\ \min_{f \in \mathcal{H}_G}\ \frac{1}{2}\|f\|^2 + C\sum_{i=1}^{n} \ell\big(y_i f(x_i)\big) + \frac{\rho}{2}\|G - K_0\|_F^2 \qquad (4)$$
where $\rho \ge 0$ is a regularization parameter and $G \succeq 0$ means that $G$ is a positive semi-definite matrix. As shown in [29], the classifier $f$ obtained in this way can achieve higher classification accuracy than the classifier trained with $K_0$. The performance of ONKL depends on the goodness of the pre-specified kernel $K_0$. However, how to identify an appropriate $K_0$ remains unsolved. Moreover, in [29] the setting of the pre-specified kernel $K_0$ is separate from the learning of the optimal kernel $G$.
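To make the role of the neighborhood penalty concrete, the following NumPy sketch (with toy matrices of our own, not values from the paper) evaluates the coupling term $\frac{\rho}{2}\|G - K_0\|_F^2$ in (4):

import numpy as np

def neighborhood_penalty(G, K0, rho):
    # (rho / 2) * ||G - K0||_F^2, the term that keeps G close to K0 in (4)
    return 0.5 * rho * np.linalg.norm(G - K0, ord="fro") ** 2

# Toy positive semi-definite matrices, for illustration only.
K0 = np.array([[1.0, 0.2, 0.1],
               [0.2, 1.0, 0.3],
               [0.1, 0.3, 1.0]])
G = K0 + 0.05 * np.eye(3)   # a kernel in the neighborhood of K0
print(neighborhood_penalty(G, K0, rho=1.0))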
III. OUR ADAPTIVE APPROACH TO LEARNING OPTIMAL NEIGHBORHOOD KERNELS
A. Problem Formulation
To address the above issues, we treat the pre-specified kernel as an extra variable and jointly learn it with the optimal kernel and the structure parameters of SVMs. We call our approach optimal neighborhood joint kernel learning (ONJKL) for short. Mathematically, we define the objective of ONJKL as
$$\min_{G,\,K_\nu,\,f}\ \frac{1}{2}\|f\|^2 + C\sum_{i=1}^{n} \ell\big(y_i f(x_i)\big) + \frac{\rho}{2}\|G - K_\nu\|_F^2$$
$$\text{s.t.}\ K_\nu \in \Omega,\ G \succeq 0,\ f \in \mathcal{H}_G,\ \nu \in \mathbb{R}^m. \qquad (5)$$
Note that to make this approach work, we constrain the pre-specified kernel with a parameterized model and denote it by $K_\nu \in \Omega$, where $\nu$ is the parameter vector and $\Omega$ is its domain, a subset of all positive semi-definite matrices. Applying such a model is necessary because 1) constraining the pre-specified kernel within an appropriate domain avoids the trivial solution of $K_\nu \equiv G$, which always minimizes the last term in (5); and 2) a parameterized model allows prior knowledge on the pre-specified kernel to be conveniently incorporated. Given different models, the above optimization will produce different $K_\nu$ and $G$ to fit classification tasks.
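As an illustration only, one plausible choice of such a parameterized model is a nonnegative combination of fixed base kernels, as is common in multiple kernel learning; the instantiations actually used in this paper are described later. A minimal sketch under this assumption:

import numpy as np

def combined_kernel(nu, base_kernels):
    # K_nu = sum_m nu_m * K_m; with nu_m >= 0 and PSD base kernels,
    # K_nu stays positive semi-definite, so Omega is a strict subset of
    # the PSD cone rather than the whole of it.
    nu = np.asarray(nu, dtype=float)
    if np.any(nu < 0):
        raise ValueError("nonnegative weights keep K_nu positive semi-definite")
    return sum(w * K for w, K in zip(nu, base_kernels))

# Two toy base kernels on three samples.
K1 = np.array([[1.0, 0.5, 0.2],
               [0.5, 1.0, 0.4],
               [0.2, 0.4, 1.0]])
K2 = np.eye(3)
K_nu = combined_kernel([0.7, 0.3], [K1, K2])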
We consider our approach as an adaptive way for ONKL and
highlight the adaptivity of our approach as follows:
• It adaptively sets the pre-specified kernel for a given
classification task via optimization;
• In this approach, the optimal neighborhood kernel and the
pre-specified kernel are adaptively adjusted with respect to
each other through the joint learning process;
• This approach adaptively produces the optimal neighbor-
hood kernel based on the parameterized model chosen by
a user.
In the following, we analyze the properties of this formulation, provide two instantiations, and discuss how to solve this optimization problem efficiently.
B. Properties of the Optimal Neighborhood Kernel G
This subsection first shows that the optimal neighborhood kernel $G$ can be obtained once the pre-specified kernel $K_\nu$ and the structure parameters of SVMs are optimized. It then proves that $G$ always achieves a higher kernel-alignment value than $K_\nu$, providing theoretical support for the better classification performance obtained by using $G$.
We derive the dual problem of (5) as follows:
$$\min_{G,\,K_\nu}\ \max_{\alpha \in \mathbb{R}^n}\ \alpha^\top \mathbf{1} - \frac{1}{2}(\alpha \otimes y)^\top G\,(\alpha \otimes y) + \frac{\rho}{2}\|G - K_\nu\|_F^2$$
$$\text{s.t.}\ \alpha^\top y = 0,\ \mathbf{0} \le \alpha \le C\mathbf{1},\ K_\nu \in \Omega \qquad (6)$$
where $\alpha$ denotes the structure parameters of SVMs (i.e., the Lagrange multipliers), $y$ is a column vector consisting of the labels of all training samples, and $\otimes$ denotes component-wise multiplication between two vectors.
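For concreteness, a minimal sketch that evaluates the dual objective in (6) for fixed $\alpha$, $G$, and $K_\nu$ (the variable names are ours):

import numpy as np

def dual_objective(alpha, y, G, K_nu, rho):
    # alpha^T 1 - 0.5 * (alpha (x) y)^T G (alpha (x) y)
    #           + (rho / 2) * ||G - K_nu||_F^2, as in (6)
    ay = alpha * y                       # component-wise product alpha (x) y
    quad = 0.5 * ay @ (G @ ay)
    penalty = 0.5 * rho * np.linalg.norm(G - K_nu, ord="fro") ** 2
    return alpha.sum() - quad + penalty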
Following Theorem 1 in [29], we can easily derive the analytical form of $G$ that minimizes (6), as stated in Theorem 1.
Theorem 1: The optimal neighborhood kernel $G^\star$ that minimizes (6) has the following analytical form:
$$G^\star = K_\nu + \frac{1}{2\rho}(\alpha \otimes y)(\alpha \otimes y)^\top. \qquad (7)$$
The result in Theorem 1 indicates that in our approach, we only need to optimize the pre-specified kernel $K_\nu$ (or, more precisely, its model parameters $\nu$) and the SVMs' structure parameters $\alpha$, instead of optimizing the matrix $G$.
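A minimal sketch of the closed form in (7), assuming $\alpha$ (e.g., from a standard SVM solver), $y$, $\rho$, and $K_\nu$ are already available:

import numpy as np

def optimal_neighborhood_kernel(alpha, y, K_nu, rho):
    # G* = K_nu + (1 / (2 rho)) * (alpha (x) y)(alpha (x) y)^T, as in (7)
    ay = (alpha * y).reshape(-1, 1)      # column vector alpha (x) y
    return K_nu + (ay @ ay.T) / (2.0 * rho)

Since the rank-one term $(\alpha \otimes y)(\alpha \otimes y)^\top$ is positive semi-definite, $G^\star$ computed this way remains positive semi-definite whenever $K_\nu$ is.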
Now, we evaluate the goodness of $G^\star$ and $K_\nu$ by using the kernel alignment criterion [34]. The alignment between two kernel matrices $K_1$ and $K_2$ is defined as $A(K_1, K_2) = \langle K_1, K_2\rangle / \sqrt{\langle K_1, K_1\rangle \langle K_2, K_2\rangle}$, where $\langle\cdot,\cdot\rangle$ is the inner product between two matrices.$^1$
Recall that $y$ is the column vector of the labels of all training samples. In the literature, $yy^\top$ is called the ideal kernel for a given classification task. The alignment of a kernel with the ideal one can be used to measure its goodness. As proven in [34], the estimate of the alignment is concentrated, which means that if a kernel achieves high alignment on the training set, it is expected to obtain high alignment on the test set, too. Higher alignment implies that the resulting classifier will have better generalization performance according to Theorem 4 in [34]. Specifically, the generalization error of a classifier is upper bounded by $1 - \hat{A}(S)$, where $\hat{A}(S)$ is the kernel alignment on the data set $S$. For more details on generalization capability, we refer the reader to the recent work in [35], [36], where new refinement techniques that maximize the uncertainty or combine multiple reducts have been proposed to improve the generalization capability of a learning system. For our approach, we can prove the following theorem.
Theorem 2: The kernel alignment of the optimal neighborhood kernel $G^\star$ to $yy^\top$ is higher than that of $K_\nu$.
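As a quick numerical illustration of Theorem 2 (toy labels and multipliers of our own, not the paper's experiments), one can compare the alignments of $K_\nu$ and the closed-form $G^\star$ from (7) with the ideal kernel $yy^\top$:

import numpy as np

def alignment(K1, K2):
    # A(K1, K2) = <K1, K2> / sqrt(<K1, K1> <K2, K2>), Frobenius inner products
    return np.sum(K1 * K2) / np.sqrt(np.sum(K1 * K1) * np.sum(K2 * K2))

# Toy labels, multipliers, and pre-specified kernel (not from the paper).
y = np.array([1.0, -1.0, 1.0])
alpha = np.array([0.5, 0.5, 0.0])
K_nu = np.eye(3)
ay = (alpha * y).reshape(-1, 1)
G_star = K_nu + (ay @ ay.T) / (2.0 * 1.0)    # rho = 1, closed form (7)
ideal = np.outer(y, y)                       # the ideal kernel y y^T
print(alignment(K_nu, ideal), alignment(G_star, ideal))
# about 0.577 vs 0.618: the alignment of G* is higher, consistent with Theorem 2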
$^1$ $\langle A, B\rangle = \sum_i \sum_j a_{ij} b_{ij}$, where $a_{ij}$ and $b_{ij}$ are the entries of $A$ and $B$.