OPE-HCA: An Optimal Probabilistic Estimation Method for Hierarchical Clustering

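
"OPE-HCA: an optimal probabilistic estimation method for hierarchical clustering algorithms"

In data mining, the hierarchical clustering algorithm (HCA) is a common unsupervised learning technique for discovering the intrinsic structure and patterns in a dataset. HCA builds a hierarchical tree (dendrogram) that displays the similarity or dissimilarity between data items; the tree can be cut to yield different numbers of groups, that is, clusters. HCA follows one of two basic strategies: agglomerative or divisive. Agglomerative HCA merges the most similar individuals from the bottom up, whereas divisive HCA splits groups from the top down.

Traditional hierarchical clustering, however, faces two main challenges. First, it depends on a particular distance measure, which can handle data in non-Euclidean spaces poorly. Second, cluster integration is complex: it is difficult to decide when to merge groups and how to assess accurately whether a merge is justified. To address these problems, the researchers proposed OPE-HCA (Optimal Probabilistic Estimation for Hierarchical Clustering Algorithm), which introduces the "Survival of the Fittest" principle as a probability-based optimal estimation strategy.

The core idea of OPE-HCA is to use a probabilistic model to estimate the similarity between data points and the probability of clusters. The approach aims to overcome the limitations of distance-based measures and to give more flexible and robust clustering results. By incorporating optimization techniques, OPE-HCA can adjust cluster centers dynamically during clustering, which improves the quality and stability of the result. Experimental results show that OPE-HCA performs well on evaluation measures such as Normalized Mutual Information (NMI) and clustering accuracy, can search for and identify patterns effectively at different description levels, and outperforms many other clustering algorithms. The article also carries copyright and usage terms: self-archiving must observe specific time limits and citation guidelines, with acknowledgement of and a link to the original publication.

OPE-HCA is an innovative improvement to hierarchical clustering. Through probabilistic estimation and optimization it raises the efficiency and accuracy of clustering and offers a new solution for complex and diverse datasets. The method contributes to data mining and machine learning and suggests new directions for research and applications in related fields.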

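The bottom-up (agglomerative) strategy described in the summary above can be illustrated with a minimal Python sketch. It is a conventional distance-based baseline using Euclidean distance and average linkage, not the OPE-HCA procedure, whose probabilistic estimation step is not specified in this excerpt. The function names `average_linkage` and `agglomerative`, the linkage choice, and the sample points are assumptions made for illustration.

```python
import math
from itertools import combinations


def average_linkage(a, b, points):
    """Mean Euclidean distance over all cross-cluster pairs of points."""
    total = sum(math.dist(points[i], points[j]) for i in a for j in b)
    return total / (len(a) * len(b))


def agglomerative(points, n_clusters):
    """Bottom-up clustering: start from singletons, merge the closest pair.

    Returns the final clusters (lists of point indices) and the merge history.
    Illustrative baseline only; this is NOT the OPE-HCA method.
    """
    clusters = [[i] for i in range(len(points))]
    history = []
    while len(clusters) > n_clusters:
        # Find the two clusters with the smallest average-linkage distance.
        x, y = min(
            combinations(range(len(clusters)), 2),
            key=lambda p: average_linkage(clusters[p[0]], clusters[p[1]], points),
        )
        history.append((clusters[x], clusters[y]))
        merged = clusters[x] + clusters[y]
        # combinations() guarantees x < y, so delete the higher index first.
        del clusters[y]
        del clusters[x]
        clusters.append(merged)
    return clusters, history


# Hypothetical sample data: two tight pairs and one isolated point.
points = [(0.0, 0.0), (0.1, 0.1), (5.0, 5.0), (5.2, 4.9), (9.0, 0.0)]
clusters, history = agglomerative(points, n_clusters=3)
print(sorted(sorted(c) for c in clusters))
```

Recording the merge history is what makes the result hierarchical: replaying the merges in order reconstructs the dendrogram, and stopping at a different `n_clusters` cuts it at a different level.
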
THEORY AND APPLICATIONS OF SOFT COMPUTING METHODS

OPE-HCA: an optimal probabilistic estimation approach for hierarchical clustering algorithm

Jiancong Fan 1,2,3

Received: 26 February 2015 / Accepted: 21 July 2015

© The Natural Computing Applications Forum 2015

Abstract Survival of the Fittest is a principle that selects the superior and eliminates the inferior in nature. This principle has been used in many fields, especially in optimization problem-solving. Clustering, in the data mining community, endeavors to discover unknown representations or patterns hidden in datasets. The hierarchical clustering algorithm (HCA) is a method of cluster analysis that searches for the optimal distribution of clusters through a hierarchical structure. Strategies for hierarchical clustering are generally of two types: agglomerative, with a bottom-up procedure, and divisive, with a top-down procedure. However, most clustering approaches have two disadvantages: the use of distance-based measurement and the difficulty of integrating clusters. In this paper, we propose an optimal probabilistic estimation (OPE) approach that exploits the Survival of the Fittest principle. We devise a hierarchical clustering algorithm based on OPE, called OPE-HCA. OPE-HCA combines optimization with probability and agglomerative HCA. Experimental results show that OPE-HCA can search for and discover patterns at different description levels and can also achieve better performance than many clustering algorithms according to the NMI and clustering accuracy measures.
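
As an illustration of the NMI measure named in the abstract, the following Python sketch computes normalized mutual information between a reference labelling and a clustering result. The excerpt does not state which normalization the paper uses; dividing by the geometric mean of the two entropies is an assumption here, as are the function names.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in nats) of a label assignment."""
    n = len(labels)
    return -sum((c / n) * math.log(c / n) for c in Counter(labels).values())


def mutual_information(true_labels, pred_labels):
    """Mutual information (in nats) between two labellings of the same items."""
    n = len(true_labels)
    joint = Counter(zip(true_labels, pred_labels))
    count_true = Counter(true_labels)
    count_pred = Counter(pred_labels)
    mi = 0.0
    for (t, p), c in joint.items():
        # p(t, p) * log( p(t, p) / (p(t) * p(p)) ), written with raw counts.
        mi += (c / n) * math.log(c * n / (count_true[t] * count_pred[p]))
    return mi


def nmi(true_labels, pred_labels):
    """NMI normalized by sqrt(H(true) * H(pred)); the normalization is assumed."""
    h_true, h_pred = entropy(true_labels), entropy(pred_labels)
    if h_true == 0.0 or h_pred == 0.0:
        # Degenerate case: at least one labelling puts everything in one cluster.
        return 1.0 if h_true == h_pred else 0.0
    return mutual_information(true_labels, pred_labels) / math.sqrt(h_true * h_pred)
```

NMI does not depend on how clusters are named, which suits the unsupervised setting: `nmi([0, 0, 1, 1], [1, 1, 0, 0])` evaluates to 1 up to floating-point rounding because the two partitions are identical, while statistically independent partitions such as `[0, 0, 1, 1]` and `[0, 1, 0, 1]` give 0.
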

Keywords Clustering · Hierarchical clustering algorithm · Data mining · Probabilistic estimation

1 Introduction

The phrase “Survival of the Fittest” originated in evolutionary theory as a way of describing the mechanism of natural selection. Generally, it means that individuals that are fit for their natural environment have a high probability of surviving. It is therefore more commonly used today to refer to a supposedly greater probability that “fit”, as opposed to “unfit”, individuals will survive in some context.

Clustering is a general task in data analysis and mining. The clusters obtained by different clustering algorithms differ significantly according to their tasks and objectives. So far, however, it remains hard to predict what constitutes a cluster hidden in a dataset and how to discover such clusters efficiently and with high accuracy. One of the main reasons is that clustering is an unsupervised analysis process: neither the number of clusters nor their names are known in advance. This problem arises in most practical applications, such as web data mining and big data analysis, because it is difficult to foresee exactly the patterns hidden in a black box or events that have not yet occurred. A rich set of clustering strategies and algorithms has emerged that attempt to overcome this blindness in the clustering process. However, most of them are specialized: a given algorithm is suited to only one particular dataset.

Among the numerous clustering methods, hierarchical clustering [1, 2], also called connectivity-based clustering,

✉ Jiancong Fan
fanjiancong@sdust.edu.cn

1 State Key Laboratory of Mining Disaster Prevention and Control Co-founded by Shandong Province and the Ministry of Science and Technology, Shandong University of Science and Technology, Qingdao 266590, China

2 College of Information Science and Engineering, Shandong University of Science and Technology, Qingdao 266590, China

3 State Key Laboratory for Novel Software Technology, Nanjing University, Nanjing 210023, China

Neural Comput & Applic

DOI 10.1007/s00521-015-1998-5
