concepts: evidential clustering and constraint-based clustering. EV-
CLUS uses the Dempster–Shafer theory to assign a mass function to
each instance. It provides a credal partition, which subsumes the
notions of crisp, fuzzy and possibilistic partitions. CEVCLUS extends this by taking advantage of prior information: such background knowledge is integrated as an additional term in the cost function.
Given a dataset $X = [x_i]_{i=1}^{n}$ and the number of clusters $K$, CEVCLUS aims to obtain a clustering partition by minimizing a loss function consisting of two terms:
$$
L_{CEVCLUS}(M, a, b) = L_{EVCLUS}(M, a, b) + \alpha \frac{1}{2(|ML| + |CL|)} L_{CONST}(M) \tag{2}
$$
where $L_{EVCLUS}(M, a, b)$ represents the loss of conventional evidential clustering (EVCLUS), and is expressed as:
$$
L_{EVCLUS}(M, a, b) = \frac{1}{\sum_{i<j} d_{ij}} \sum_{i<j} \frac{\left( a\, CF_{ij} + b - d_{ij} \right)^2}{d_{ij}} \tag{3}
$$
In Eq. (3) the matrix of mass functions $M = (m_{ik})$ corresponds to a credal partition. The mass $m_{ik}$ represents the degree of belief that the instance $x_i$ is assigned to the cluster $C_k$, which is determined by the distance between $x_i$ and the representative point $\mathrm{mean}(C_k)$ of cluster $C_k$. $CF_{ij} = 1 - pl_{i \times j}(\theta)$ represents the degree of conflict between two instances, where $pl_{i \times j}(\theta)$ is the plausibility that the instances $x_i$ and $x_j$ are in the same cluster and $pl_{i \times j}(\bar{\theta})$ is the plausibility that the two instances are in different clusters. $a$ and $b$ are two coefficients.
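To make these quantities concrete, the following minimal Python sketch computes the pairwise conflict and evaluates the stress of Eq. (3). It assumes the simplified case in which each mass function assigns belief to singleton clusters only (each row of `masses` sums to one); in that case the plausibility that two instances share a cluster reduces to the inner product of their mass vectors. General credal partitions with non-singleton focal sets require the full Dempster–Shafer combination and are not covered here, and the names `masses`, `dissim`, `a`, `b` are illustrative rather than taken from the original papers.

```python
import numpy as np

def pairwise_conflict(masses):
    """CF_ij = 1 - pl_{i x j}(theta), assuming each mass function assigns
    belief to singleton clusters only (masses is an n x K matrix whose
    rows sum to one); then pl_{i x j}(theta) = sum_k m_ik * m_jk."""
    return 1.0 - masses @ masses.T

def evclus_stress(masses, dissim, a, b):
    """Stress of Eq. (3) for a credal partition and a dissimilarity matrix."""
    cf = pairwise_conflict(masses)
    iu = np.triu_indices(dissim.shape[0], k=1)     # pairs with i < j
    residual = a * cf[iu] + b - dissim[iu]
    return np.sum(residual ** 2 / dissim[iu]) / np.sum(dissim[iu])
```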
In the second term of Eq. (2), $\alpha \frac{1}{2(|ML| + |CL|)} L_{CONST}(M)$ represents the loss of violating pairwise constraints, where $\alpha \geq 0$ is a hyperparameter that controls the importance of the constraints, namely the Must-Link (ML) and Cannot-Link (CL) sets. $L_{CONST}(M)$ can be formulated as follows:
$$
L_{CONST}(M) = \sum_{(x_i, x_j) \in ML} \left[ pl_{i \times j}(\bar{\theta}) + 1 - pl_{i \times j}(\theta) \right] + \sum_{(x_i, x_j) \in CL} \left[ pl_{i \times j}(\theta) + 1 - pl_{i \times j}(\bar{\theta}) \right] \tag{4}
$$
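A minimal sketch of the constraint penalty in Eq. (4) is given below, assuming the plausibilities $pl_{i \times j}(\theta)$ and $pl_{i \times j}(\bar{\theta})$ have already been computed as matrices; the names `pl_same` and `pl_diff` are illustrative. The normalization by $2(|ML| + |CL|)$ and the weighting by $\alpha$ are applied in Eq. (2), not inside this term.

```python
def constraint_loss(pl_same, pl_diff, must_link, cannot_link):
    """L_CONST(M) of Eq. (4): penalize must-link pairs that appear separated
    and cannot-link pairs that appear merged."""
    loss = 0.0
    for i, j in must_link:
        loss += pl_diff[i, j] + 1.0 - pl_same[i, j]
    for i, j in cannot_link:
        loss += pl_same[i, j] + 1.0 - pl_diff[i, j]
    return loss
```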
2.4. Constrained clustering via spectral regularization (CCSR)
In this approach [25], a regularization framework has been developed for spectral clustering [26] that makes use of constraint information. It adapts the spectral embedding to the pairwise constraints through optimization: it constructs a smooth data representation space, parameterized by a spectral embedding, that is most consistent with the pairwise constraints.
Given a dataset $X = [x_i]_{i=1}^{n}$ and the number of clusters $K$, it initially forms a sparse symmetric similarity matrix $W = [w_{ij}]$, where $w_{ij}$ denotes the similarity between $x_i$ and $x_j$. $W$ is assumed to be non-negative and symmetric. Then the normalized graph Laplacian is obtained as $\bar{L} = I - D^{-1/2} W D^{-1/2}$, where $I$ is the identity matrix of size $n \times n$ and $D = \mathrm{diag}(d_1, \ldots, d_n)$ with $d_i = \sum_{j=1}^{n} w_{ij}$.
Next, the $m$ eigenvectors $v_1, \ldots, v_m$ of $\bar{L}$ corresponding to the $m$ smallest eigenvalues are computed. Given $F_m = (v_1, \ldots, v_m)$, the new data representation is written as $F = F_m A$. This indicates that obtaining a representation consistent with the prior information of pairwise constraints is equivalent to determining a suitable coefficient matrix $A$. In fact, the coefficient matrix $A$ can be obtained by minimizing the following loss function:
$$
L(A) = \sum_{(i, j, t_{ij}) \in S} \left( u_i^{T} A A^{T} u_j - t_{ij} \right)^2 \tag{5}
$$
where $u_i^{T}$ denotes the $i$th row of $F_m$, i.e. $F_m = (u_1, \ldots, u_n)^{T}$, and $S = \{(i, j, t_{ij})\}$ is a set of pairwise constraints in which $t_{ij}$ is a binary indicator taking the value 1 or 0 according to whether $x_i$ and $x_j$ belong to the same cluster or not. Let $M = A A^{T}$; it is positive semi-definite. Then the objective function in Eq. (5) can be re-written as:
$$
L(M) = \sum_{(i, j, t_{ij}) \in S} \left( u_i^{T} M u_j - t_{ij} \right)^2 \tag{6}
$$
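The objective of Eq. (6) can be evaluated directly from the spectral embedding and the constraint set; the sketch below uses illustrative variable names and is not part of the solver of [25]:

```python
import numpy as np

def ccsr_objective(U, M, constraints):
    """L(M) of Eq. (6). U is the n x m spectral embedding whose i-th row is
    u_i^T; constraints is a list of (i, j, t_ij) with t_ij in {0, 1}."""
    return sum((U[i] @ M @ U[j] - t) ** 2 for i, j, t in constraints)
```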
$M$ can be obtained by solving the semi-definite programming problem derived from Eq. (6); the details are given in [25]. Once $M$ is obtained, the clustering partition can finally be produced by applying any conventional clustering algorithm, such as K-means, to the new data representation constructed from the rows of $F_m M^{1/2}$.
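Once $M$ has been obtained (the semidefinite program of [25] is not reproduced here), the final clustering step described above might look as follows; $M$ is assumed to be positive semi-definite so that its square root is well defined, and scikit-learn's K-means is used only as one possible choice of conventional clustering algorithm.

```python
import numpy as np
from sklearn.cluster import KMeans

def ccsr_final_partition(U, M, K):
    """Cluster the rows of F_m M^{1/2} with K-means, where U = F_m."""
    # Symmetric square root of the PSD matrix M via its eigendecomposition.
    vals, vecs = np.linalg.eigh(M)
    M_sqrt = vecs @ np.diag(np.sqrt(np.clip(vals, 0.0, None))) @ vecs.T
    F = U @ M_sqrt                      # new data representation
    return KMeans(n_clusters=K, n_init=10).fit_predict(F)
```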
2.5. Semi-supervised Naïve Bayesian classifier using EM
In this approach, the semi-supervised learning task is performed by applying naive Bayes to the case of labeled and unlabeled data with the Expectation-Maximization (EM) technique [27]. First, a naive Bayes classifier with generative model parameters $\theta$ is built in the standard supervised fashion from the limited amount of labeled data $x_j \in X_L$:
$$
\hat{\theta} = \arg\max_{\theta} P(X_L \mid \theta) = \arg\max_{\theta} \prod_{x_j \in X_L} \sum_{i} P(c_i \mid \theta)\, P(x_j \mid c_i; \theta) \tag{7}
$$
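For this supervised initialization, the parameter estimates reduce to class priors and per-class feature likelihoods. The sketch below assumes discrete count features (as in text classification) and Laplace smoothing, which are common but not stated choices; the function and variable names are illustrative.

```python
import numpy as np

def fit_naive_bayes(X_labeled, y_labeled, n_classes, alpha=1.0):
    """Estimate theta = (class priors, per-class feature probabilities)
    from labeled count data (n_docs x n_features), with Laplace smoothing."""
    n_features = X_labeled.shape[1]
    priors = np.zeros(n_classes)
    likelihoods = np.zeros((n_classes, n_features))
    for c in range(n_classes):
        Xc = X_labeled[y_labeled == c]
        priors[c] = (len(Xc) + 1.0) / (len(X_labeled) + n_classes)
        counts = Xc.sum(axis=0) + alpha
        likelihoods[c] = counts / counts.sum()
    return priors, likelihoods
```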
Then, we perform the E-step of the Expectation-Maximization (EM) technique by classifying the unlabeled data $x_j \in X_U$ with the naive Bayes model $\hat{\theta}$. The class of an unlabeled instance is determined by finding the generative model component with the highest probability of having generated it, as shown in Eq. (8):
$$
c_i = \arg\max_{c_i} P(c_i \mid x_j; \hat{\theta}) = \arg\max_{c_i} \frac{P(c_i \mid \hat{\theta})\, P(x_j \mid c_i; \hat{\theta})}{P(x_j \mid \hat{\theta})} \tag{8}
$$
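This E-step classification is a direct application of Bayes' rule; the sketch below reuses the hypothetical `priors` and `likelihoods` from the previous snippet and works in log space. The denominator $P(x_j \mid \hat{\theta})$ of Eq. (8) is omitted because it does not depend on the class and therefore does not affect the argmax.

```python
import numpy as np

def classify(X, priors, likelihoods):
    """Return argmax_c P(c | x_j; theta) for each row of the count matrix X,
    computed in log space for numerical stability (Eq. (8))."""
    log_joint = np.log(priors) + X @ np.log(likelihoods).T   # shape (n, K)
    return np.argmax(log_joint, axis=1)
```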
As the M-step of the Expectation-Maximization (EM) technique, a new naive Bayes classifier is built by re-estimating the parameters $\hat{\theta}$ with the updated labeled dataset according to Eq. (7). The classifier parameters $\hat{\theta}$ are iteratively improved by repeating the above process of classifying the unlabeled data and rebuilding the naive Bayes model on the labeled data, until the procedure converges to a stable classifier and a stable set of labels for the data. In fact, convergence is indicated by no change in the log probability of the full dataset $X$ with labels $Y$ in Eq. (9):
$$
l(\theta \mid X, Y) = \log P(\theta) + \sum_{x_j \in X_U} \log \sum_{i} P(c_i \mid \theta)\, P(x_j \mid c_i; \theta) + \sum_{x_j \in X_L} \log \left[ P(y_j = c_i \mid \theta)\, P(x_j \mid y_j = c_i; \theta) \right] \tag{9}
$$
3. Description of our approach

In this section, we describe our approach, shown in Algorithm 1, in two steps. First, the density-based parameters, namely the minimum number of data points MinPts and a radius Eps, are determined using both labeled and unlabeled data for each class. Next, local clusters are constructed by applying DBSCAN to the target dataset with the different parameter sets, and the final clustering result is obtained by integrating these local clusters.
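As a high-level illustration only, the second step might be sketched as follows; the per-class parameter pairs are assumed to come from the procedure of Section 3.1, scikit-learn's DBSCAN is used as a stand-in implementation, and the integration of the local clusters into the final partition is deliberately left out because it is described later in this section rather than here.

```python
import numpy as np
from sklearn.cluster import DBSCAN

def build_local_clusters(X, param_sets):
    """Run DBSCAN once per (eps, min_pts) pair and collect the resulting
    local clusters as index arrays; param_sets is assumed to be the
    per-class output of the parameter-determination step (Section 3.1)."""
    local_clusters = []
    for eps, min_pts in param_sets:
        labels = DBSCAN(eps=eps, min_samples=min_pts).fit_predict(X)
        for k in set(labels) - {-1}:          # -1 marks noise in scikit-learn
            local_clusters.append(np.where(labels == k)[0])
    # Integrating these local clusters into the final partition is the
    # subject of the following subsections and is not reproduced here.
    return local_clusters
```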
3.1. Determination of density-based parameter sets
For density-based clustering approaches, different sets of parameters (the minimum number of data points MinPts and a radius Eps) can lead to different clustering results. We assume that each class of instances differs in size, shape and density; thus it is necessary to define an individual set of parameters for each cluster.