Bagging-like metric learning for support vector regression
Peng-Cheng Zou *, Jiandong Wang, Songcan Chen *, Haiyan Chen
College of Computer Science and Technology, Nanjing University of Aeronautics and Astronautics, Nanjing 210016, China
Article info
Article history:
Received 26 December 2013
Received in revised form 28 February 2014
Accepted 2 April 2014
Available online 19 April 2014
Keywords:
Distance metric learning
Support vector regression
Ensemble learning
Bagging
Distance-based kernel
Abstract
Metrics play an important role in machine learning and pattern recognition. Though many off-the-shelf metrics are available for learning tasks at hand, such as k-nearest neighbor classification and k-means clustering, such a selection is not necessarily appropriate because it is made independently of the data itself. It has been shown that a task-dependent metric learned from the given data can yield better learning performance. Inspired by this success, we focus on learning an embedded metric specifically for support vector regression and present a corresponding learning algorithm, termed SVRML, which minimizes the error on a validation dataset while simultaneously enforcing sparsity on the learned metric matrix. Further, taking the learned metric (a positive semi-definite matrix) as a base learner, we develop an effective bagging-like ensemble metric learning framework in which the resampling mechanism of original bagging is specially modified for SVRML. Experiments on various datasets demonstrate that our method outperforms both single and bagging-based ensemble metric learning for support vector regression.
© 2014 Elsevier B.V. All rights reserved.
1. Introduction
Metric learning plays an important role in many learning tasks
including k-nearest neighbor classification, k-means clustering and
kernel-based algorithms such as support vector machines [1–5]. In
recent years, many studies have demonstrated empirically and
theoretically that it is often beneficial for a learning task to learn
a metric from the given data, instead of using an off-the-shelf
one such as Euclidean distance metric.
Depending on the availability of supervision in the given data, these methods roughly fall into two main categories: unsupervised metric learning and supervised metric learning. Unsupervised metric learning methods essentially learn a distance metric without any supervised information [6,7]. In supervised metric learning, by contrast, additional information about the data, such as label information, is used to learn the metric, which is therefore better able to capture the idiosyncrasies of the data of interest [8,9]. We pay particular attention to supervised methods in this paper.
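Throughout this line of work, the learned metric is commonly parameterized as a Mahalanobis distance; the following standard formulation (a generic statement, not specific to any single cited method) makes the learning target explicit:

d_M(x_i, x_j) = \sqrt{(x_i - x_j)^\top M (x_i - x_j)}, \qquad M \succeq 0,

where the positive semi-definite matrix M is what is actually learned from the data; taking M = I recovers the ordinary Euclidean distance.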
Supervised distance metric learning can be further divided into task-independent and task-dependent metric learning. The task-independent methods usually consist of two separate learning steps: in the first step, a metric is learned by solving an optimization problem with the supervised information; the second step then uses the learned metric to solve a subsequent task. Classical Linear Discriminant Analysis (LDA), though a dimensionality reduction method, can also be viewed as a pseudo-metric learning method [10], and the metric learned by LDA can be used in many subsequent tasks such as k-nearest neighbor classification. In addition, MMC by Xing et al. learns a metric by minimizing the distances between points under equivalence constraints while maximizing the distances between points under inequivalence constraints; the learned metric is then used in different clustering tasks [1].
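To illustrate the two-step, task-independent pipeline just described, here is a minimal sketch (our own illustration, assuming scikit-learn is available, not code from the cited works): step 1 learns an LDA projection from the labels alone, and step 2 reuses that fixed metric for k-nearest neighbor classification, since Euclidean distance in the projected space is a Mahalanobis distance in the original space induced by the learned projection.

from sklearn.datasets import load_iris
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# Step 1: learn the (pseudo-)metric from the supervised data alone,
# without any reference to the subsequent task.
lda = LinearDiscriminantAnalysis().fit(X_tr, y_tr)

# Step 2: hand the fixed, already-learned metric to the subsequent task
# (here k-NN in the LDA-projected space).
knn = KNeighborsClassifier(n_neighbors=3).fit(lda.transform(X_tr), y_tr)
print("k-NN accuracy with the LDA metric:", knn.score(lda.transform(X_te), y_te))

Note that nothing in step 1 is informed by the k-NN decision rule of step 2, which is exactly the limitation discussed next.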
Though the task-independent methods do use the supervised information when learning the metric, such a two-step procedure cannot guarantee that the learned metric is optimal for the subsequent task. A more desirable approach is therefore to learn the metric directly by incorporating the specific subsequent task, as task-dependent distance metric learning does. The situation is analogous to feature selection, where embedded methods can usually achieve better performance than filter methods [11]: task-independent metric learning corresponds to the filter approach, while task-dependent metric learning corresponds to the embedded approach. One of the most representative works is Large Margin Nearest Neighbor (LMNN) [2], in which the learned metric is tailored specifically to k-nearest neighbor classification and leads to significant improvement over k-NN with task-independent metrics. Several related methods have also been proposed, such as Neighborhood Components Analysis (NCA) [4], multi-task LMNN [12] and non-linear LMNN [13].
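To make the notion of task-dependence concrete, the LMNN objective of [2] can be written (in our paraphrase, using the Mahalanobis notation above with squared distances d_M^2, target neighbors j ⇝ i, and trade-off constant c) as

\varepsilon(M) = \sum_{j \rightsquigarrow i} d_M^2(x_i, x_j) + c \sum_{j \rightsquigarrow i} \sum_{l : y_l \neq y_i} \big[ 1 + d_M^2(x_i, x_j) - d_M^2(x_i, x_l) \big]_+ ,

so the metric is optimized directly against the k-NN decision rule, pulling target neighbors close while pushing differently labeled points beyond a unit margin, rather than against a task-agnostic criterion.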
It should be noted that most of the existing task-dependent metric learning methods are designed for classification tasks, especially k-NN. Similar to classification, regression is another important task in machine learning, and its performance likewise depends on the metric adopted.
* Corresponding authors. Tel.: +86 15850685790. E-mail addresses: zou_pc@163.com (P.-C. Zou), s.chen@nuaa.edu.cn (S. Chen).