Nonparametric Bayesian Multi-Task
Large-margin Classification
Changying Du 1,2, Jia He 1, Fuzhen Zhuang 1, Yuan Qi 3, Qing He 1

1 Key Lab of Intelligent Information Processing of Chinese Academy of Sciences (CAS), Institute of Computing Technology, CAS, Beijing 100190, China
2 University of Chinese Academy of Sciences, Beijing 100049, China, email: ducy@ics.ict.ac.cn
3 Departments of CS and Statistics, Purdue University, IN, USA
Abstract. In this paper, we present a nonparametric Bayesian multi-task large-margin classification model which can cluster tasks into the most appropriate number of groups and induce flexible model sharing within each task group simultaneously. Specifically, we first show a very simple method to integrate large-margin learning with hierarchical Bayesian models by employing an important variant of the standard SVM, i.e., the proximal SVM (PSVM), whose loss function is used to define a novel likelihood function. We then assume that the model parameter of each task consists of two parts: one is shared within each task group (the group-level parameter) while the other is specific to each distinct task (the task rescaling parameter). A Dirichlet process prior is imposed on the group-level parameter, while the task rescaling parameter is assigned a one-mean Laplace prior. The parameter of a task is then the corresponding group-level parameter times its task-specific rescaling parameter. We develop an efficient Markov chain Monte Carlo (MCMC) algorithm for model inference. Experiments on the Landmine detection data and the UCI Yeast data demonstrate the effectiveness of our method.
1 INTRODUCTION
Machine learning lies at the heart of artificial intelligence and has been extensively studied over the past decades. While traditional machine learning is approaching its performance limit, a new learning scenario called multi-task learning (MTL) [6] has attracted more and more attention in the machine learning and data mining communities [25, 2, 7, 26, 8, 15, 9, 21]. Multi-task learning learns multiple related tasks together so as to improve the performance of each task relative to learning them separately. Over the past decade, MTL has been successfully applied to many important areas including computer vision [24, 15], natural language processing [1], bioinformatics [20, 26] and landmine detection [25, 14].
It has been shown that the performance-boosting merit of MTL is mainly due to information sharing among tasks, which is the key aspect in the design of MTL algorithms. To uncover latent task structure and alleviate harmful information sharing, task grouping is a common practice in MTL [3, 25, 15, 16, 19]. Existing methods typically assume that tasks in the same cluster share the same model [3, 25], though it is more reasonable to allow some flexibility within each task group.
Meanwhile, large-margin classifiers such as SVMs are among the most popular classification models in traditional learning scenarios, yet there are still few successful multi-task large-margin classification models, especially ones capable of finding latent task groups automatically.
In this paper, we present a nonparametric Bayesian multi-task large-margin classification model which can cluster tasks into the most appropriate number of groups and induce flexible model sharing within each group simultaneously. Specifically, we first show a very simple method to integrate large-margin learning with hierarchical Bayesian models by employing an important variant of the standard SVM, i.e., the proximal SVM (PSVM) [11], whose empirical loss function can be used to define a novel likelihood function.
We then assume that the model parameter of each task consists of two parts: one is shared within each task group (the group-level parameter) while the other is specific to each distinct task (the task rescaling parameter). A Dirichlet process (DP) [10, 23] prior is imposed on the group-level parameter, while each dimension of the task rescaling parameter is assumed to have a one-mean Laplace prior. Due to the nonparametric clustering nature of the DP, we can automatically cluster the tasks into separate groups without pre-specifying the number of groups, which is hard to determine in advance. In each group all tasks share the same group-level parameter, while each task has its own small task-specific rescaling over the group parameter. By imposing a one-mean Laplace prior, the rescaling is sparse, and the parameter of a task is finally the group parameter times its task-specific rescaling parameter. This means that, within each task group, the task models are identical in most dimensions but may differ in a few, which constitutes a flexible model sharing scheme.
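In symbols, the construction can be sketched as follows; the notation here (theta_t for the group-level parameter drawn for task t, s_t for its rescaling vector, b for the Laplace scale) is our own shorthand for illustration rather than the paper's definitions:

% A sketch of the hierarchical prior under assumed notation:
G \sim \mathrm{DP}(\alpha, G_0), \qquad \boldsymbol{\theta}_t \mid G \sim G
  % tasks that draw the same atom of G form a group
s_{t,d} \sim \mathrm{Laplace}(1, b), \qquad d = 1, \ldots, D
  % one-mean prior: rescaling factors concentrate at 1
\mathbf{w}_t = \boldsymbol{\theta}_t \circ \mathbf{s}_t
  % task parameter: group parameter times elementwise rescaling

Here \circ denotes the elementwise (Hadamard) product, so a task's model agrees with its group's model exactly in every dimension where the rescaling factor equals one.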
We develop an efficient Markov chain Monte Carlo (MCMC) algorithm for model inference. Experiments on the Landmine detection data set and the UCI Yeast data set demonstrate that our method can not only outperform state-of-the-art MTL algorithms but also discover the task-clustering structure very well.
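The sampler itself is presented later, but for DP mixtures the group assignments are commonly resampled with a collapsed Gibbs step that follows the Chinese restaurant process. The Python sketch below illustrates this generic step; the helper task_log_marginal is a hypothetical placeholder for the marginal likelihood of task t's data under group k (with k = -1 denoting a fresh group drawn from the base measure), not a function from the paper:

import numpy as np

def sample_group_assignment(t, z, alpha, task_log_marginal):
    """One collapsed Gibbs step: resample the group label of task t.

    t                  index of the task being updated
    z                  integer array of current group labels, one per task
    alpha              DP concentration parameter
    task_log_marginal  hypothetical callable returning log p(data_t | group k);
                       k == -1 stands for a brand-new group
    """
    # Groups that remain non-empty once task t is taken out.
    groups = [k for k in np.unique(z) if np.sum(z == k) - (z[t] == k) > 0]

    log_probs = []
    for k in groups:
        n_k = np.sum(z == k) - (z[t] == k)   # other tasks currently in group k
        log_probs.append(np.log(n_k) + task_log_marginal(t, k))
    # Under the CRP, a new group is opened with prior weight alpha.
    log_probs.append(np.log(alpha) + task_log_marginal(t, -1))

    log_probs = np.asarray(log_probs)
    probs = np.exp(log_probs - log_probs.max())   # stabilized normalization
    probs /= probs.sum()

    choice = np.random.choice(len(probs), p=probs)
    z[t] = groups[choice] if choice < len(groups) else z.max() + 1
    return z

Sweeping this update over all tasks, interleaved with updates of the group-level and rescaling parameters, lets the number of groups grow and shrink automatically during inference.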
The remainder of this paper is organized as follows. Section 2 briefly covers the necessary preliminaries. In Section 3 we propose our nonparametric Bayesian multi-task large-margin classification model, starting from the definition of a novel likelihood function. Experimental results are presented in Section 4 and related work is discussed in Section 5. Finally, we conclude the paper in Section 6.