Optimizing Data Field Hierarchical Clustering with the Barnes-Hut Algorithm


"使用Barnes-Hut算法改善数据字段分层聚类" 在"Pattern Recognition Letters"期刊的一篇研究论文中，作者 Zhongliu Zhuo、Xiaosong Zhang、Weina Niu、Guowu Yang 和 Jingzhong Zhang 来自中国电子科技大学计算机科学与工程学院，探讨了如何通过Barnes-Hut算法来优化数据字段的层次聚类。该研究关注于解决传统数据字段层次聚类算法（DFHCA）计算效率低下的问题。传统的DFHCA方法采用暴力计算法来计算每个对象之间的力，这导致了计算复杂度随着数据量n的平方增加，即O(n^2)。这种高复杂度限制了算法在大规模数据集上的应用。为了改进这一情况，研究人员引入了Barnes-Hut算法，这是一种用于模拟物理系统中多体问题的高效算法，尤其适用于减少远程粒子间的相互作用计算。 Barnes-Hut算法的核心是通过构建一棵八叉树（或称为 octree）来组织数据。这棵树将空间划分为多个子区域，并在树的节点上存储子区域内的粒子信息。在计算力时，如果一个粒子远离另一个粒子，那么可以使用其质心（中心质量）近似代替粒子群，大大减少了需要计算的力的数量，从而降低了计算复杂度到O(n log n)。论文中，作者比较了改进后的算法与传统方法，结果显示，他们的方法不仅在计算效率上有所提升，而且不需要进行任何额外的参数调整。这意味着新算法在保持聚类效果的同时，能更好地适应各种规模的数据集，尤其对于大数据集的处理，优势更为明显。此外，Barnes-Hut算法的引入还可能有助于解决层次聚类中的并行化问题，因为它允许在不同树节点之间并行计算。这进一步提高了算法在现代多核处理器或分布式计算环境中的性能。总结来说，这篇研究论文提出了一种利用Barnes-Hut算法改进数据字段层次聚类的方法，通过减少计算量和提高计算效率，为大规模数据集的层次聚类提供了一种实用且高效的解决方案。这种方法对于数据挖掘、机器学习以及需要处理大量数据的其他领域具有重要的实际应用价值。

Pattern Recognition Letters 80 (2016) 113–120

Contents lists available at ScienceDirect

Pattern Recognition Letters

journal homepage: www.elsevier.com/locate/patrec

Improving data field hierarchical clustering using Barnes–Hut algorithm

Zhongliu Zhuo∗, Xiaosong Zhang, Weina Niu, Guowu Yang, Jingzhong Zhang

University of Electronic Science and Technology of China, School of Computer Science and Engineering, No. 2006, Xiyuan Ave, West Hi-Tech Zone, Chengdu 611731, PR China

Article info

Article history:

Received 2 November 2015

Available online 24 June 2016

Keywords:

Barnes–Hut algorithm

Data field

Hierarchical clustering

Computation efficiency

Abstract

The traditional Data Field Hierarchical Clustering Algorithm (DFHCA) uses a brute-force method to compute the forces exerted on each object, so the computational complexity grows as O(n^2). In this study, we improve the force computation efficiency of DFHCA to O(n log n). We use the Barnes–Hut tree to reduce the number of force computations by approximating far-away particles with their center of mass. Compared with the traditional method, our method does not need parameter tuning. In our implementation, we discuss two different merging strategies. Experimental results show that the proposed method improves computational efficiency under the same settings. We also find that the DFHCA-M merging strategy converges faster than the DFHCA-S merging strategy. Finally, we compare and analyze the time complexity and space complexity of our algorithm.
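The center-of-mass approximation mentioned in the abstract is usually written as follows. This is the standard Barnes–Hut formulation rather than notation taken from the paper: the symbols m_i (object mass), s_C (width of tree cell C) and θ (opening threshold) are assumptions introduced here, and the O(log n) count per object is the usual estimate for a fixed θ > 0 on reasonably balanced data.

```latex
% Aggregate mass and center of mass of a tree cell C
M_C = \sum_{i \in C} m_i, \qquad
\mathbf{x}_C = \frac{1}{M_C} \sum_{i \in C} m_i \, \mathbf{x}_i

% Acceptance test: treat C as a single pseudo-particle for an object at \mathbf{x} when
\frac{s_C}{\lVert \mathbf{x}_C - \mathbf{x} \rVert} < \theta

% Cost of one round of force evaluation over n objects
\underbrace{n(n-1)}_{\text{brute force}} = O(n^2)
\quad\longrightarrow\quad
\underbrace{n \cdot O(\log n)}_{\text{tree approximation}} = O(n \log n)
```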

1. Introduction

Clustering plays a significant role in data analysis and can be applied to many fields, including machine learning, image analysis and bioinformatics. Previous works [5,7,17] have designed and implemented various new clustering algorithms, and considerable effort has been put into improving the performance of existing ones.

One type of clustering algorithm, which returns a hierarchical dendrogram, is classified as a hierarchical clustering algorithm. A hierarchical dendrogram is desirable in many applications because of the need to construct taxonomies [6]. The problem with hierarchical clustering algorithms, however, is parameter tuning: the clustering result depends heavily on how certain parameters are set. Because the optimal parameter setting is data dependent, parameter tuning makes these algorithms hard to use.

The other kind of clustering algorithm uses field theory from physics, i.e. the nuclear field (data field clustering algorithm, DFCA) and the gravitational field (gravitational clustering algorithm, GCA). Unlike other clustering algorithms, which require the number of clusters to be specified, GCA and DFCA can determine the number of clusters automatically, and parameter tuning is essentially not required. In addition, GCA and DFCA do not have a rigid "similarity" measure. However, GCA and DFCA cannot produce a dendrogram. To solve this problem, the data field hierarchical clustering algorithm (DFHCA) was proposed by Wang et al. [15]. It combines advantages of the traditional agglomerative hierarchical clustering algorithm (Agglom. Hiera) and the traditional gravitational clustering algorithm (GCA). But all of the above algorithms have a high force computation complexity, which increases as O(n^2).

Inspired by cosmological simulations and past research on N-body problems [1,12], in this study we propose the Barnes–Hut based Data Field Hierarchical Clustering Algorithm, Multiple version, which we name BH-DFHCA-M. Our method (BH-DFHCA-M) improves DFHCA in terms of the force calculation time for clustering on the whole data set.

The highlights of this paper are threefold:

1. We propose a Barnes–Hut based data field hierarchical clustering algorithm.

2. Our algorithm does not need parameter tuning.

3. We improve the efficiency of the traditional data field hierarchical clustering algorithm by an average of 9.5% and 13.4% in 2D and 3D scenarios, respectively.

The rest of the paper is organized as follows. Section 2 discusses related work. Section 3 describes the Barnes–Hut algorithm in detail. Section 4 first introduces data field clustering and then presents our algorithm. Section 5 analyzes the experimental results and compares the computation and space complexity with other algorithms. Finally, conclusions are drawn and future work is highlighted in Section 6.

This paper has been recommended for acceptance by Dr. S. Wang.

∗ Corresponding author. Tel.: +86 15928426678.

E-mail addresses: zhuozhongliu@126.com, johnsonzxs@uestc.edu.cn (Z. Zhuo), johnsonzxs@uestc.edu.cn (X. Zhang), guowu@uestc.edu.cn (G. Yang).

http://dx.doi.org/10.1016/j.patrec.2016.06.008
