Uncertain Information Clustering Based on Distance Between BPAs
Ya Li^1, Yajuan Zhang^1, Daijun Wei^{1,2}, Yong Deng^1
1. School of Computer and Information Science, Southwest University, Chongqing, 400715, China
E-mail: ydeng@swu.edu.cn
2. School of Science, Hubei Institute for Nationalities, Enshi, 445000, China
Abstract: When analyzing multi-source information, it is necessary to cluster the information according to its source.
In this paper, a new evidential clustering method is proposed. In the proposed method, pairwise distances between BPAs
are used to form a distance matrix for clustering. The clustering is based on vectors obtained by transforming
the distance matrix. Illustrative examples with several data sets demonstrate the validity of the proposed method
compared with other methods.
Key Words: Clustering, Dempster-Shafer theory, Distance between BPAs
1 INTRODUCTION
In recent years, a great deal of attention has been paid to
the analysis of imprecise or fuzzy data. Several references
may be found in the literature focusing on inferential statistics,
regression or classification. As a growing variety of
sensors is used to detect objects, a great deal of information
is being collected. When analyzing this kind of multi-source
information, it is necessary to cluster the information
according to its source. However, the wide variety of
forms in which information is expressed makes the analysis
more difficult. Under such circumstances, all the information
can be unified into one form, namely evidence, for
later use in information fusion.
In [1], the problem of clustering multi-source information
represented as evidence is investigated, and an evidence clustering
standard is given. In addition, the idea of transforming
the evidence interspace to the Euclidean interspace
is presented, and the HCM clustering algorithm is then used to
cluster the multi-source information. We consider that the
transformation in [1] is not entirely reasonable, for the reasons
given below.
The method presented in this paper uses a different approach.
Since there is no bijection between BPAs and pignistic
probabilities, the method in [1] might not make use of
all the information in the BPAs. It therefore appears useful to apply
pairwise distances between BPAs for clustering rather than
to transform the evidence interspace to the Euclidean interspace.
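The lack of a bijection can be illustrated with a small sketch. It assumes the usual pignistic transformation, BetP(x) = sum of m(A)/|A| over all focal sets A containing x, for a normalized BPA with no mass on the empty set. The two-element frame {a, b}, the mass values, and the function name `pignistic` are illustrative assumptions and do not come from the paper or from [1].

```python
from collections import defaultdict


def pignistic(bpa):
    """Return BetP as a dict {element: probability}.

    `bpa` maps each focal set (a non-empty frozenset) to its mass.
    Each mass is split equally among the elements of its focal set.
    """
    betp = defaultdict(float)
    for focal_set, mass in bpa.items():
        share = mass / len(focal_set)
        for element in focal_set:
            betp[element] += share
    return dict(betp)


# Two different BPAs on the hypothetical frame {a, b}.
m1 = {frozenset("a"): 0.5, frozenset("b"): 0.5}  # precise, Bayesian evidence
m2 = {frozenset("ab"): 1.0}                      # total ignorance

print(pignistic(m1))
print(pignistic(m2))
print(m1 == m2)
```

Both BPAs map to the same pignistic probability although they encode very different states of knowledge, so a clustering that only sees the pignistic probabilities cannot tell them apart.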
The work is partially supported by the National Natural Science Foundation
of China, Grant No. 60874105, 61174022, Program for New Century Excellent
Talents in University, Grant No. NCET-08-0345, Chongqing Natural
Science Foundation, Grant No. CSCT, 2010BA2003, the Fundamental
Research Funds for the Central Universities, Grant No. XDJK2010C030,
Grant No. XDJK2011D002, Doctor Funding of Southwest University,
Grant No. SWU110021. The first author also greatly appreciates the support
by the School of Computer and Information Sciences of Southwest
University Scientific and Technological Innovation Fund for Students.
*Corresponding author: Yong Deng, School of Computer and Information
Sciences, Southwest University, Chongqing, 400715, E-mail:
ydeng@swu.edu.cn.
Previous related work addressing distances between BPAs
deserves to be mentioned here. Zouhal and Denoeux
[2] introduced a distance based on the mean square error
between pignistic probabilities to improve a classification
algorithm based on the k-nearest neighbor rule and
Dempster-Shafer theory. Jousselme and Grenier [3] introduced
a principled distance between two basic probability
assignments (BPAs), or two bodies of evidence, based on
a quantification of the similarity between sets. They gave a
geometrical interpretation of BPAs and showed that the proposed
distance satisfies all the requirements for a metric.
The distance function in this paper is adapted from [3].
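As a rough sketch of how such a distance can feed a clustering step, the code below assumes the form in which the distance of [3] is commonly written, d(m1, m2) = sqrt(0.5 (m1 - m2)^T D (m1 - m2)) with D(A, B) = |A ∩ B| / |A ∪ B|. The adapted distance actually used in this paper may differ in detail. The function names, the dict-of-frozensets representation, and the example masses are assumptions made for illustration only.

```python
import math
from itertools import combinations


def jousselme_distance(m1, m2):
    """Distance between two BPAs given as {frozenset: mass} dicts.

    Only sets that are focal in m1 or m2 contribute, because the
    difference vector is zero everywhere else. Focal sets must be
    non-empty so that |A ∪ B| is never zero.
    """
    focal = set(m1) | set(m2)
    diff = {s: m1.get(s, 0.0) - m2.get(s, 0.0) for s in focal}
    total = 0.0
    for a in focal:
        for b in focal:
            jaccard = len(a & b) / len(a | b)  # similarity between sets
            total += diff[a] * diff[b] * jaccard
    # Guard against tiny negative values caused by rounding.
    return math.sqrt(max(0.5 * total, 0.0))


def distance_matrix(bpas):
    """Symmetric matrix of pairwise distances between a list of BPAs."""
    n = len(bpas)
    dist = [[0.0] * n for _ in range(n)]
    for i, j in combinations(range(n), 2):
        dist[i][j] = dist[j][i] = jousselme_distance(bpas[i], bpas[j])
    return dist


# Hypothetical bodies of evidence on the frame {a, b}.
m_a = {frozenset("a"): 1.0}
m_b = {frozenset("b"): 1.0}
m_ab = {frozenset("ab"): 1.0}

for row in distance_matrix([m_a, m_b, m_ab]):
    print([round(value, 4) for value in row])
```

Unlike a comparison of pignistic probabilities, this distance separates total ignorance on {a, b} from evidence that is certain of a single element, and the resulting matrix can be passed to any clustering routine that accepts pairwise distances.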
The proposed method deals with the data in a natural way
and makes full use of the information in the BPAs, as will
be shown by experimental results. The rest of the paper is
organized as follows.
First, the necessary background on Dempster-Shafer theory
is recalled in Section 2. Section 3 states the conditions
that a metric space should satisfy and then presents the distance
between two BPAs. Hierarchical clustering is
briefly introduced in Section 4, where illustrative examples
with synthetic data sets are also described.
Section 5 concludes the paper.
2 DEMPSTER-SHAFER THEORY
The Dempster-Shafer theory, first proposed by Dempster
[4] and then developed by Shafer [5], is often regarded
as an extension of the Bayesian theory of probability. As
a theory of reasoning under uncertainty,
Dempster-Shafer theory has the advantage of assigning
probability to subsets of a set composed of 𝑁 objects,
rather than to each of the individual objects. The probability
assigned to each subset is bounded by a lower bound and
an upper bound, which respectively measure the total belief
and the total plausibility for the objects in the subset.
Furthermore, Dempster-Shafer theory is able to
combine pairs of bodies of evidence or belief functions to
derive a new body of evidence or belief function. At present, some
978-1-4577-2074-1/12/$26.00 © 2012 IEEE