learning. Thus they proposed a regularized convex formulation to learn the relationships between different tasks, where the proposed formulation was viewed as a novel generalization of single-task learning.
2.2. The family of HK algorithms
2.2.1. MHKS
The original HK algorithm was expected to obtain a good classification performance, but it was sensitive to outliers [16]. In order to solve this problem, Leski proposed a modified HK algorithm named MHKS [16]. MHKS is based on regularized least squares and tries to maximize the separation margin [27–29]. To be more specific, MHKS constrains its separating hyperplane as follows:
\[ Yw \geq \mathbf{1}_{N\times 1}. \tag{1} \]
Consequently, the criterion function of MHKS is changed to
\[ \min_{w \in \mathbb{R}^{d+1},\, b \geq 0} L(w, b) = (Yw - \mathbf{1}_{N\times 1} - b)^{T}(Yw - \mathbf{1}_{N\times 1} - b) + c\,\tilde{w}^{T}\tilde{w}, \tag{2} \]
where c ≥ 0 is the regularized hyper-parameter that adjusts the tradeoff between the model complexity and the training error.
The procedure of MHKS is almost the same as that of the original HK classifier. The difference between MHKS and HK is that the augmented weight vector $w_k$ in MHKS becomes
\[ w_k = (Y^{T}Y + c\tilde{I})^{-1} Y^{T}(b_k + \mathbf{1}_{N\times 1}), \tag{3} \]
where $\tilde{I}$ is an identity matrix with the last element on the main diagonal set to zero.
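To make the update (3) concrete, the following is a minimal sketch of the MHKS training loop in Python/NumPy. It adopts the usual Ho–Kashyap convention of updating the margin vector b with the positive part of the residual; the function and parameter names (e.g. rho, n_iter) and default values are illustrative and not taken from [16].

```python
import numpy as np

def mhks_fit(X, y, c=0.1, rho=0.5, n_iter=100, tol=1e-4):
    """Minimal sketch of MHKS training (Eqs. (2)-(3)).

    X: (N, d) data matrix; y: (N,) labels in {+1, -1}.
    The rows of the label-signed augmented matrix Y are phi_i * [x_i, 1].
    Parameter names and defaults are illustrative only.
    """
    N, d = X.shape
    Y = y[:, None] * np.hstack([X, np.ones((N, 1))])   # (N, d+1)
    I_tilde = np.eye(d + 1)
    I_tilde[-1, -1] = 0.0             # identity with last diagonal entry zeroed
    P = np.linalg.inv(Y.T @ Y + c * I_tilde) @ Y.T     # fixed part of Eq. (3)
    b = np.zeros(N)                   # margin vector, kept non-negative
    w = P @ (b + 1.0)
    for _ in range(n_iter):
        w = P @ (b + 1.0)             # Eq. (3): w_k = (Y^T Y + c*I~)^{-1} Y^T (b_k + 1)
        e = Y @ w - 1.0 - b           # residual of the constraint Yw >= 1 + b
        b_new = b + rho * (e + np.abs(e))   # Ho-Kashyap-style update keeps b >= 0
        if np.linalg.norm(b_new - b) < tol:
            break
        b = b_new
    return w                          # predict with the sign of [x, 1] @ w
```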
2.2.2. MatMHKS
Since the vector representation of patterns fails in some image-based learning tasks, matrix-based algorithms were proposed in terms of both feature extraction [30–32] and classifier design [9]. MatMHKS is a typical matrixized classifier and can directly classify patterns represented as matrices. As a consequence, MatMHKS can be viewed as a matrixized improvement of MHKS.
In the matrix case, suppose that there is a binary-class classification problem with N matrix samples $(A_i, \varphi_i)$, $i = 1, \ldots, N$, where $A_i \in \mathbb{R}^{m\times n}$ and its corresponding class label $\varphi_i \in \{+1, -1\}$. The decision function of MatMHKS for the binary problem is given as
\[ g(A_i) = u^{T} A_i \tilde{v} \;\begin{cases} > 0, & \text{if } \varphi_i = +1, \\ < 0, & \text{if } \varphi_i = -1, \end{cases} \tag{4} \]
where both $u \in \mathbb{R}^{m}$ and $\tilde{v} \in \mathbb{R}^{n}$ are the weight vectors. The corresponding optimization function of MatMHKS is defined as follows:
\[ \min_{u \in \mathbb{R}^{m},\, \tilde{v} \in \mathbb{R}^{n},\, v_0,\, b \geq 0} J(u, \tilde{v}, v_0, b) = \sum_{i=1}^{N} \left( \varphi_i (u^{T} A_i \tilde{v} + v_0) - 1 - b_i \right)^{2} + c\left( u^{T} S_1 u + \tilde{v}^{T} S_2 \tilde{v} \right), \tag{5} \]
where $S_1 = m I_{m\times m}$ and $S_2 = n I_{n\times n}$ are the two regularized matrices corresponding to the weight vectors $u$ and $\tilde{v}$ respectively, and the regularized parameter $c$ ($c \in \mathbb{R}$, $c \geq 0$) controls the generalization ability of the classifier by making a tradeoff between the classifier complexity and the training error. The vectors $u$, $\tilde{v}$, and the bias $v_0$ can be obtained by gradient optimization of the formulation (5) with respect to $u$, $\tilde{v}$, and $v_0$ respectively. The detailed optimization procedure can be found in the literature [9].
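As an illustration of Eqs. (4) and (5), the sketch below computes the per-pattern MatMHKS output and one closed-form update of u obtained by setting the gradient of (5) with respect to u to zero for fixed $\tilde{v}$, $v_0$ and $b$, assuming $S_1 = mI$ as stated above; the analogous updates of $\tilde{v}$, $v_0$ and $b$ are detailed in [9]. Function and variable names are ours, not from the paper.

```python
import numpy as np

def matmhks_decision(A, u, v_tilde, v0):
    """Per-pattern output u^T A v~ + v0 (cf. Eqs. (4)-(5)); its sign gives the class."""
    return float(u @ A @ v_tilde + v0)

def matmhks_update_u(A_list, labels, v_tilde, v0, b, c):
    """Closed-form update of u for fixed (v~, v0, b), obtained by zeroing the
    gradient of Eq. (5) in u; uses S1 = m * I as in the text."""
    m = A_list[0].shape[0]
    lhs = c * m * np.eye(m)                    # c * S1
    rhs = np.zeros(m)
    for A, phi, b_i in zip(A_list, labels, b):
        Av = A @ v_tilde                       # A_i v~, an (m,) vector
        lhs += np.outer(Av, Av)
        rhs += Av * (phi * (1.0 + b_i) - v0)   # uses phi_i^2 = 1
    return np.linalg.solve(lhs, rhs)
```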
2.2.3. MultiV-MHKS
In the literature [8], MHKS was regarded as a single-view classifier that could be multiviewized into multiple matrixized MatMHKSs. We then adopted a joint learning of the different MatMHKSs and proposed a multi-view learning machine, MultiV-MHKS. Mathematically, suppose that there is an original vector pattern $x_i \in \mathbb{R}^{d}$. The $x_i$ can be represented by different matrices $A_i^{q} \in \mathbb{R}^{m_q \times n_q}$, $q = 1, \ldots, M$, where $d$ is equal to $m_q \times n_q$.
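In practice, the different matrix representations can be obtained simply by reshaping the original vector, as in the following sketch; the concrete shapes used here (1x64, 8x8, 64x1) are only an illustrative choice and are not prescribed by the text.

```python
import numpy as np

def matrix_views(x, shapes):
    """Build M matrix views A^q of one vector pattern x by reshaping,
    with d = m_q * n_q required for every view q."""
    views = []
    for (m_q, n_q) in shapes:
        assert m_q * n_q == x.size, "each view must satisfy d = m_q * n_q"
        views.append(x.reshape(m_q, n_q))
    return views

# e.g. a 64-dimensional pattern viewed as 1x64, 8x8 and 64x1 matrices:
A1, A2, A3 = matrix_views(np.arange(64.0), [(1, 64), (8, 8), (64, 1)])
```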
In MultiV-MHKS, we set $Y^{q} = [y_1^{q}, \ldots, y_N^{q}]^{T}$, $y_i^{q} = \varphi_i [u^{qT} A_i^{q}, 1]^{T}$, $i = 1, \ldots, N$, $b^{q} = [b_1^{q}, \ldots, b_N^{q}]^{T}$, and $v^{q} = [\tilde{v}^{qT}, v_0^{q}]^{T}$, where $q$ denotes the index number of the view in MultiV-MHKS. Then the criterion function of MultiV-MHKS is given as follows:
\[ \min_{\substack{u^{q} \in \mathbb{R}^{m_q},\, v^{q} \in \mathbb{R}^{n_q+1} \\ q = 1, \ldots, M}} J' = \sum_{q=1}^{M} \Big( (Y^{q} v^{q} - \mathbf{1}_{N\times 1} - b^{q})^{T} (Y^{q} v^{q} - \mathbf{1}_{N\times 1} - b^{q}) + c^{q} \big( u^{qT} S_1 u^{q} + v^{qT} \tilde{S}_2 v^{q} \big) \Big) + \gamma \sum_{q=1}^{M} \left( Y^{q} v^{q} - \frac{1}{M} \sum_{p=1}^{M} Y^{p} v^{p} \right)^{T} \left( Y^{q} v^{q} - \frac{1}{M} \sum_{p=1}^{M} Y^{p} v^{p} \right), \tag{6} \]
where $S_1 = m_q I_{m_q\times m_q}$ and $S_2 = n_q I_{n_q\times n_q}$, $\tilde{S}_2 = \begin{pmatrix} S_2 & 0 \\ 0 & 0 \end{pmatrix}$ is a matrix with a dimensionality of $(n_q+1)\times(n_q+1)$, $c^{q}$ is the regularized parameter for each view, and $\gamma$ is the coupling parameter. In the formulation (6), the weight value of each view is simply set to $1/M$. In this case, each MatMHKS plays an equal role in the whole classification. Then, in order to optimize the criterion function (6), we set the gradient of $J'$ with respect to both $u^{q}$ and $v^{q}$ to zero. Therefore we can get the following optimal results:
\[ u^{q} = \left( \left( 1 + \gamma \frac{M-1}{M^{2}} \right) \sum_{i=1}^{N} A_i^{q} \tilde{v}^{q} (A_i^{q} \tilde{v}^{q})^{T} + c^{q} S_1 \right)^{-1} \sum_{i=1}^{N} A_i^{q} \tilde{v}^{q} \left( \varphi_i (b_i^{q} + 1) - \left( 1 + \gamma \frac{M-1}{M^{2}} \right) v_0^{q} + \gamma \frac{M-1}{M^{2}} \sum_{p=1, p\neq q}^{M} \big( u^{pT} A_i^{p} \tilde{v}^{p} + v_0^{p} \big) \right), \tag{7} \]
\[ v^{q} = \left( \left( 1 + \gamma \frac{M-1}{M^{2}} \right) Y^{qT} Y^{q} + c^{q} \tilde{S}_2 \right)^{-1} Y^{qT} \left( \mathbf{1}_{N\times 1} + b^{q} + \gamma \frac{M-1}{M^{2}} \sum_{p=1, p\neq q}^{M} Y^{p} v^{p} \right). \tag{8} \]
The iteration for both $u^{q}$ and $v^{q}$ is the same as that in MatMHKS. Since MultiV-MHKS is a joint learning of multiple views, its decision function integrates the multiple MatMHKSs and is given as follows:
\[ g(z) = \frac{1}{M} \sum_{q=1}^{M} \left( u^{qT} Z^{q} \tilde{v}^{q} + v_0^{q} \right) \;\begin{cases} > 0, & \text{then } z \in \text{class } +1, \\ < 0, & \text{then } z \in \text{class } -1, \end{cases} \tag{9} \]
where $z$ is the test sample and $Z^{q}$ is the $q$th matrix representation of $z$.
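A minimal sketch of the decision rule (9) is given below. It assumes that each view $q$ is obtained by reshaping the test vector $z$, as in the reshaping sketch of Section 2.2.3, and that the per-view parameters $(u^{q}, \tilde{v}^{q}, v_0^{q})$ have already been trained; all names are illustrative.

```python
import numpy as np

def multiv_mhks_decision(z, shapes, params):
    """Decision rule of Eq. (9): average the M per-view MatMHKS outputs.

    z      : (d,) test vector, reshaped into Z^q of shape (m_q, n_q) per view.
    shapes : list of (m_q, n_q) pairs with d = m_q * n_q.
    params : list of trained per-view triples (u_q, v_tilde_q, v0_q).
    """
    g = 0.0
    for (m_q, n_q), (u_q, v_tilde_q, v0_q) in zip(shapes, params):
        Z_q = z.reshape(m_q, n_q)              # q-th matrix representation of z
        g += u_q @ Z_q @ v_tilde_q + v0_q      # q-th MatMHKS output
    g /= len(shapes)                           # equal weight 1/M per view
    return +1 if g > 0 else -1                 # the sign of g(z) gives the class
```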
3. Proposed regularized multi-view learning machine
(RMultiV-MHKS)
MultiV-MHKS was expected to make full use of the advantages of the different matrix representations. But the equal weight of $1/M$ was a simple setting for each MatMHKS in MultiV-MHKS, which might not be sensible in some real-world cases. For example, a certain matrix representation may supply little or even no useful information for discrimination, yet the decision function (9) would still take this less useful matrix representation into the final classification just like the other, more useful ones. This urges us to assign a different weight to each matrix representation. In order to realize such an