
Variational Relevance Vector Machines
Christopher M. Bishop Michael E. Tipping
Microsoft Research
7 J. J. Thomson Avenue, Cambridge CB3 0FB, U.K.
{cmbishop,mtipping}@microsoft.com
http://research.microsoft.com/{~cmbishop,~mtipping}
In Uncertainty in Artificial Intelligence 2000, C. Boutilier and M. Goldszmidt (Eds), 46–53, Morgan Kaufmann.
Abstract
The Support Vector Machine (SVM) of Vapnik [9] has become widely established as one of the leading approaches to pattern recognition and machine learning. It expresses predictions in terms of a linear combination of kernel functions centred on a subset of the training data, known as support vectors.

Despite its widespread success, the SVM suffers from some important limitations, one of the most significant being that it makes point predictions rather than generating predictive distributions. Recently Tipping [8] has formulated the Relevance Vector Machine (RVM), a probabilistic model whose functional form is equivalent to the SVM. It achieves comparable recognition accuracy to the SVM, yet provides a full predictive distribution, and also requires substantially fewer kernel functions.

The original treatment of the RVM relied on the use of type II maximum likelihood (the ‘evidence framework’) to provide point estimates of the hyperparameters which govern model sparsity. In this paper we show how the RVM can be formulated and solved within a completely Bayesian paradigm through the use of variational inference, thereby giving a posterior distribution over both parameters and hyperparameters. We demonstrate the practicality and performance of the variational RVM using both synthetic and real world examples.
1 RELEVANCE VECTORS
Many problems in machine learning fall under the heading of supervised learning, in which we are given a set of input vectors X = {x_n}_{n=1}^N together with corresponding target values T = {t_n}_{n=1}^N. The goal is to use this training data, together with any pertinent prior knowledge, to make predictions of t for new values of x. We can distinguish two distinct cases: regression, in which t is a continuous variable, and classification, in which t belongs to a discrete set.
Here we consider models in which the prediction y(x, w) is expressed as a linear combination of basis functions φ_m(x) of the form

    y(x, w) = Σ_{m=0}^{M} w_m φ_m(x) = w^T φ        (1)

where the {w_m} are the parameters of the model, and are generally called weights.
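As a concrete illustration of (1), the short Python sketch below evaluates y(x, w) for a given weight vector and a list of basis functions. The particular basis functions and function names are chosen purely for illustration and are not part of the original formulation.

```python
import numpy as np

def predict(x, w, basis_fns):
    """Evaluate y(x, w) = sum_m w_m * phi_m(x) = w^T phi for a single input x."""
    phi = np.array([phi_m(x) for phi_m in basis_fns])  # basis vector phi(x)
    return w @ phi

# Illustrative model with M = 2 (plus a constant phi_0 acting as a bias term).
basis_fns = [lambda x: 1.0,             # phi_0: bias
             lambda x: x,               # phi_1: linear term
             lambda x: np.exp(-x**2)]   # phi_2: a localized basis function
w = np.array([0.5, -1.0, 2.0])
print(predict(0.3, w, basis_fns))
```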
One of the most popular approaches to machine learn-
ing to emerge in recent years is the Support Vector Ma-
chine (SVM) of Vapnik [9]. The SVM uses a particular
specialization of (1) in which the basis functions take the form of kernel functions, one for each data point x_m in the training set, so that φ_m(x) = K(x, x_m), where K(·, ·) is the kernel function. The framework which we develop in this paper is much more general and applies to any model of the form (1). However, in order to facilitate direct comparisons with the SVM, we focus primarily on the use of kernels as the basis functions.
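To make the kernel specialization concrete, the following sketch builds the design matrix whose columns are kernel functions centred on the training points, together with a constant bias column for w_0. The Gaussian kernel and the inclusion of a bias column are assumptions made here for illustration; the framework itself places no restriction on the choice of kernel.

```python
import numpy as np

def gaussian_kernel(x, x_m, length_scale=1.0):
    """K(x, x_m) = exp(-||x - x_m||^2 / (2 * length_scale^2)) -- an illustrative choice."""
    return np.exp(-np.sum((x - x_m) ** 2) / (2.0 * length_scale ** 2))

def design_matrix(X, kernel=gaussian_kernel):
    """Phi[n, 0] = 1 (bias) and Phi[n, m + 1] = K(x_n, x_m) for training inputs X."""
    N = X.shape[0]
    Phi = np.ones((N, N + 1))
    for n in range(N):
        for m in range(N):
            Phi[n, m + 1] = kernel(X[n], X[m])
    return Phi

X = np.random.randn(5, 2)   # five 2-dimensional training inputs
Phi = design_matrix(X)      # shape (5, 6): one bias column plus one kernel per data point
```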
Point estimates for the weights are determined in the
SVM by optimization of a criterion which simultane-
ously attempts to fit the training data while at the
same time minimizing the ‘complexity’ of the function
y(x, w). The result is that some proportion of the
weights are set to zero, leading to a sparse model in
which predictions, governed by (1), depend only on a
subset of the kernel functions.
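The practical consequence of this sparsity is that prediction only touches the kernels attached to the retained data points (support vectors for the SVM, relevance vectors for the RVM). Below is a minimal sketch of such a sparse evaluation of (1); the tolerance, names, and kernel used are illustrative assumptions rather than part of either model's specification.

```python
import numpy as np

def sparse_predict(x_new, X_train, w, kernel, tol=1e-8):
    """Evaluate (1) using only the kernels whose weights are effectively non-zero.

    w[0] is treated as the bias weight w_0; w[m + 1] multiplies K(x_new, x_m).
    """
    active = np.flatnonzero(np.abs(w[1:]) > tol)   # indices of retained training points
    y = w[0]
    for m in active:
        y += w[m + 1] * kernel(x_new, X_train[m])
    return y

# Illustrative usage: most weights are zero, so only two kernels are evaluated.
rbf = lambda a, b: np.exp(-np.sum((a - b) ** 2))
X_train = np.random.randn(100, 2)
w = np.zeros(101)
w[0], w[4], w[57] = 0.1, 1.3, -0.7
print(sparse_predict(np.zeros(2), X_train, w, rbf))
```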