Intelligent Systems Conference 2017
7-8 September 2017 | London, UK

Generalized Relevance Vector Machine

Yuheng Jia∗, Sam Kwong†, Wenhui Wu∗, Wei Gao∗ and Ran Wang‡
∗Department of Computer Science, City University of Hong Kong, Hong Kong, 3442–9704
Email: yhjia3-c@my.cityu.edu.hk; wenhuiwu3-c@my.cityu.edu.hk; weigao5-c@my.cityu.edu.hk
†Department of Computer Science, City University of Hong Kong, Hong Kong, 3442–2907
Email: cssamk@cityu.edu.hk
‡College of Mathematics and Statistics, Shenzhen University, Shenzhen 518060, China
Email: wangran@szu.edu.cn
Abstract—This paper considers a generalized version of the relevance vector machine (RVM), a sparse Bayesian kernel machine for classification and ordinary regression. The generalized RVM (GRVM) follows the work on generalized linear models (GLM), a natural generalization of the ordinary linear regression model in which all models share a common approach to parameter estimation. GRVM inherits the advantages of GLM, i.e., a unified model structure, a single training algorithm, and convenient task-specific model design. It also inherits the advantages of RVM, i.e., probabilistic output, an extremely sparse solution, and automatic hyperparameter estimation. Moreover, GRVM extends RVM to a wider range of learning tasks beyond classification and ordinary regression by assuming that the conditional output follows an exponential family distribution (EFD). Since the EFD makes Bayesian inference intractable, this paper adopts the Laplace approximation, a common approach in Bayesian inference, to solve this problem. Further, several task-specific models are designed based on GRVM, including models for ordinary regression, count data regression, classification, and ordinal regression. In addition, the relationship between GRVM and traditional RVM models is discussed. Finally, experimental results show the efficiency of the proposed GRVM model.
Keywords—Relevance vector machine; Generalized linear models; Laplace approximation; Bayesian analysis; Exponential family distribution.
I. INTRODUCTION
Generalized linear models (GLM) [1] are a class of models that naturally generalize the ordinary linear regression (OLR) model. GLMs include OLR, logistic regression, linear count data regression, linear ordinal regression, etc. The term "generalized" in the title of this paper has the same meaning as in GLM.
From a statistical perspective, the conditional output distribution of OLR is Gaussian, i.e.,
\[
p(y_* \mid x_*) = \mathcal{N}(\beta^{T}x_*,\, \sigma^{2})
= \frac{1}{\sqrt{2\pi\sigma^{2}}}\exp\!\left(-\frac{1}{2\sigma^{2}}\left(y_* - \beta^{T}x_*\right)^{2}\right) \tag{1}
\]
where $x \in \mathbb{R}^{M}$ is the input vector, $y_*$ is the predictive output for the input vector $x_*$, $\beta \in \mathbb{R}^{M}$ is the OLR model parameter, $\sigma^{2}$ is the noise variance, and $\mathcal{N}$ denotes the Gaussian distribution.
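As a quick numerical illustration of Eq. (1) (not from the paper; the values of $\beta$, $x_*$, and $\sigma^{2}$ below are made up), the predictive density can be evaluated directly:

```python
import numpy as np

def olr_predictive_density(y_star, x_star, beta, sigma2):
    """Gaussian predictive density of Eq. (1) for a single test point."""
    mean = beta @ x_star  # linear predictor beta^T x_*
    return np.exp(-(y_star - mean) ** 2 / (2.0 * sigma2)) \
        / np.sqrt(2.0 * np.pi * sigma2)

# Made-up example values, purely for illustration.
beta = np.array([0.5, -1.2])
x_star = np.array([1.0, 2.0])
print(olr_predictive_density(y_star=-1.5, x_star=x_star, beta=beta, sigma2=0.25))
```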
GLM generalizes OLR by replacing the Gaussian distribution with an exponential family distribution (EFD) [2], [3], [4], [5]. The EFD is a class of distributions in exponential form that includes several common distributions, e.g., the Gaussian, Poisson, Bernoulli, Binomial, and Gamma distributions. By specializing the EFD to a particular distribution, a number of different linear models can be obtained under GLM. For example, if the conditional output distribution is Bernoulli, GLM becomes the logistic regression model; if the conditional output is Poisson, GLM becomes the linear count data regression model.
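For concreteness, the standard single-parameter canonical form of the EFD (a textbook parameterization; the paper's exact notation may differ) and the specializations just mentioned are
\[
p(y \mid \theta) = h(y)\exp\bigl(\theta y - A(\theta)\bigr), \qquad \theta = \beta^{T}x,
\]
\[
\text{Gaussian }(\sigma^{2}=1):\ \theta = \mu,\ A(\theta) = \theta^{2}/2; \qquad
\text{Poisson}:\ \theta = \log\lambda,\ A(\theta) = e^{\theta}; \qquad
\text{Bernoulli}:\ \theta = \log\tfrac{p}{1-p},\ A(\theta) = \log(1 + e^{\theta}).
\]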
Considering these linear models from a unified perspective, GLM has been found useful in statistical analysis, offering several advantages:
• Unified model structure: the output of each model in GLM is based on a linear combination of the input vector and the model parameter, such as the $\beta^{T}x$ term in Eq. (1).
• Same learning algorithm: the parameters of all models under GLM can be estimated by the same learning algorithm, which shows the elegance of mathematics in machine learning.
• Efficient task-specific model design: thanks to the unified model structure and identical learning algorithm, designing a model for a specific task is very efficient. For example, if the output is count data, a linear Poisson regression model can be designed to model it (see the sketch after this list).
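To make the "same learning algorithm" point concrete, the following is a minimal, hypothetical sketch (not the authors' code): a single gradient-ascent trainer fits any canonical-link GLM, and only the mean function $A'(\theta)$ changes with the assumed output distribution.

```python
import numpy as np

# Mean functions A'(theta) for a few exponential family members.
MEAN_FUNCTIONS = {
    "gaussian":  lambda t: t,                         # OLR (identity link)
    "poisson":   lambda t: np.exp(t),                 # count data regression
    "bernoulli": lambda t: 1.0 / (1.0 + np.exp(-t)),  # logistic regression
}

def fit_glm(X, y, family, lr=0.1, n_iter=5000):
    """Maximize the canonical GLM log-likelihood
    sum_i [theta_i * y_i - A(theta_i)], theta_i = beta^T x_i,
    by plain gradient ascent; swapping `family` swaps the model."""
    mean_fn = MEAN_FUNCTIONS[family]
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        theta = X @ beta
        # Averaged gradient of the log-likelihood: X^T (y - A'(theta)) / n.
        beta += lr * X.T @ (y - mean_fn(theta)) / len(y)
    return beta
```

For count data, fit_glm(X, y, "poisson") yields a linear Poisson regression, while the very same routine with "bernoulli" yields logistic regression.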
Because of these advantages, GLM has attracted increasing attention. Generalized kernel machines (GKM) [2], the kernel version of GLM, were proposed to enhance the nonlinear modeling power of GLM. Bayesian generalized kernel models (BGKM) [5] are the fully Bayesian extension of GLM in the feature space induced by a reproducing kernel. Generalized Gaussian process models (GGPM) [3] generalize the Gaussian process (GP) [6] and encompass many existing GP models. Since inference in GGPM is intractable, Taylor approximation was used for inference in [3]. Besides, variational inference was also adopted to solve the intractable inference problem in GGPM, which leads to a sparse solution [7].
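The Laplace approximation adopted in this paper follows the same generic recipe for intractable Bayesian inference: replace the posterior with a Gaussian centered at its mode, with covariance given by the inverse Hessian there. A minimal sketch follows, assuming hypothetical model-specific callables neg_log_posterior and hessian:

```python
import numpy as np
from scipy.optimize import minimize

def laplace_approximation(neg_log_posterior, hessian, w0):
    """Approximate p(w | D) ∝ exp(-E(w)) by N(w_map, H^{-1}),
    where w_map minimizes E(w) and H is the Hessian of E at the mode."""
    res = minimize(neg_log_posterior, w0, method="BFGS")  # find the posterior mode
    w_map = res.x
    cov = np.linalg.inv(hessian(w_map))  # local curvature -> Gaussian covariance
    return w_map, cov
```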
In this paper, a generalized relevance vector machine (GRVM) is proposed. The relevance vector machine (RVM) [8], [9] is a sparse Bayesian kernel machine, which can