Pattern Recognition Letters 80 (2016) 113–120
Contents lists available at ScienceDirect
Pattern Recognition Letters
journal homepage: www.elsevier.com/locate/patrec
Improving data field hierarchical clustering using Barnes–Hut
algorithm
R
Zhongliu Zhuo
∗
, Xiaosong Zhang , Weina Niu , Guowu Yang , Jingzhong Zhang
University of Electronic Science and Technology of China, School of Computer Science and Engineering, No. 2006, Xiyuan Ave, West Hi-Tech Zone,
Chengdu 61117 31, PR China
a r t i c l e i n f o
Article history:
Received 2 November 2015
Available online 24 June 2016
Keywords:
Barnes–Hut algorithm
Data field
Hierarchical clustering
Computation efficiency
a b s t r a c t
Traditional Data Field Hierarchical Clustering Algorithm (DFHCA) uses brute force method to compute the
forces exert on each object. The computation complexity increases as O ( n
2
). In this study, we improve the
force computation efficiency of DFHCA to O ( n log n ). We use the Barnes–Hut tree to reduce the number
of force computation by approximating far away particles with their center of mass. And compared with
traditional method, our method does not need to tune the parameters. In our implementation, we discuss
two different merging strategies. Experimental results show that the proposed method could improve the
computation efficiency under the same settings. We also find that DFHCA-M merging strategy converges
faster than DFHCA-S merging strategy. Finally, we compare and analyze the time complexity and space
complexity of our algorithm.
©2016 Elsevier B.V. All rights reserved.
1. Introduction
Clustering plays a significant role in data analysis, and can be
applied to many fields, including machine learning, image analy-
sis and bioinformatics. Previous works [5,7,17] have designed and
implemented various new clustering algorithms, and considerable
effort s have been put into improving the performance of existing
ones.
One type of clustering algorithm which generates a hierarchical
dendrogram as return is classified as hierarchical clustering algo-
rithm. Hierarchical dendrogram is desirable in many applications,
due to the need to construct taxonomies [6] . But the problem with
hierarchical clustering algorithm is parameter tuning. That is to say
the clustering result highly depends on how certain parameters are
set. Because the optimal parameter setting is data dependent, tun-
ing parameter would be unfriendly for people to use.
The other kind of clustering algorithm uses field theory from
physics, i.e. nuclear field (Data field clustering algorithm, DFCA)
and gravitational field (Gravitational clustering algorithm, GCA).
Unlike other clustering algorithms which require the number of
clusters to be specified. GCA and DFCA can automatically deter-
mine the number of clusters, also parameter tuning is essentially
not required. In addition, GCA and DFCA do not have a rigid “simi-
R
This paper has been recommended for acceptance by Dr. S. Wang.
∗
Corresponding author. Tel.: +86 15928426678.
E-mail addresses: zhuozhongliu@126.com , johnsonzxs@uestc.edu.cn (Z. Zhuo),
johnsonzxs@uestc.edu.cn (X. Zhang), guowu@uestc.edu.cn (G. Yang).
larity” measure. However, GCA and DFCA are not able to show den-
drogram. To solve this problem, the data filed hierarchical cluster-
ing algorithm (DFHCA) was proposed by Wang et al. [15] . It has ad-
vantages both from traditional agglomerate hierarchical clustering
algorithm (Agglom. Hiera) and from traditional gravitational clus-
tering algorithm (GCA). But all the above algorithms have a high
force computation complexity increases as O ( n
2
).
Inspired by cosmological simulations and N-body questions in
the past research [1,12] . In this study, we propose the Barnes–Hut
based Data Field Hierarchical Clustering Algorithm Multiple ver-
sion, and we name it BH-DFHCA-M. Our method (BH-DFHCA-M)
improves DFHCA in terms of the force calculation time for cluster-
ing on the whole data set.
The highlights of this paper are threefold:
1. We propose Barnes–Hut based data field hierarchical clustering
algorithm.
2. Our algorithm does not need to tune the parameters.
3. We improve the efficiency of traditional data field hierarchical
clustering algorithm averagely 9.5% and 13.4% in 2D and 3D sce-
narios, respectively.
The rest of the paper is organized as follows, Section 2 dis-
cusses the related works. Section 3 describes the Barnes–Hut al-
gorithm in detail. Section 4 first introduces data field clustering
and then we illustrate our algorithm. Section 5 analyzes the exper-
imental results, and then we compare the computation and space
complexity with other algorithms. Finally, conclusions are drawn
and future works are highlighted in Section 6 .
http://dx.doi.org/10.1016/j.patrec.2016.06.008
0167-8655/© 2016 Elsevier B.V. All rights reserved.