2011 Enterprise Computing Systems Paper: A Survey of Partition Clustering Algorithms


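This document is a survey paper published in the International Journal of Enterprise Computing and Business Systems, titled "A Survey on Partition Clustering Algorithms". It was written by S. Anitha Elavarasi, Dr. J. Akilandeswari, and Dr. B. Sathiyabhama of the Department of Computer Science and Engineering at Sona College of Technology, Salem, India. The paper examines the importance and applications of cluster analysis as an unsupervised learning method against the backdrop of the big-data era. Clustering is a key task in data mining: it automatically organizes objects or data points into groups according to their similarity, so that members of the same group are highly similar while members of different groups are comparatively dissimilar. The technique is widely used in areas such as market segmentation, social network analysis, bioinformatics, and image processing, where it helps reveal latent structure and patterns in a dataset.

The authors first give an overview of clustering algorithms and highlight two main types: hierarchical clustering and partition clustering. Partition clustering, also called non-hierarchical clustering, is described here as covering methods such as K-means and K-medoids, alongside spectral clustering and the density-based DBSCAN. Each has its own characteristics: K-means iteratively optimizes an assignment of the data to a predetermined number of clusters, whereas DBSCAN can find clusters of arbitrary shape and does not need the number of clusters in advance. The authors study the working principles, strengths, weaknesses, and suitable use cases of the partition clustering algorithms. They may discuss how to choose evaluation measures (such as the silhouette coefficient or the Calinski-Harabasz index) to assess clustering quality, and the challenges of large datasets, such as computational efficiency, memory consumption, and high dimensionality. The paper may also discuss improvements to the algorithms, for example through ensemble learning or new heuristic strategies.

Finally, the paper may cover recent research progress and future directions, in particular how partition clustering can meet growing data-analysis demands given rapid developments in big data, cloud computing, and artificial intelligence. Overall it is a systematic survey of clustering algorithms, partition clustering in particular, intended to give researchers and practitioners a framework for applying these techniques to practical problems. It is a useful resource for readers working in data analysis, machine learning, or data mining.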

International Journal of Enterprise Computing and Business Systems

(Online)

http://www.ijecbs.com

Vol. 1 Issue 1 January 2011

2.1 Hierarchical Clustering Algorithm

Hierarchical clustering algorithms group data objects into a tree-shaped structure. They can be broadly classified into agglomerative and divisive hierarchical clustering. In the agglomerative approach, also called the bottom-up approach, each data point is initially treated as a separate cluster, and on each iteration clusters are merged according to some criterion. The merging can be done using the single-link, complete-link, centroid, or Ward's method. In the divisive approach, also called the top-down approach, all data points start in a single cluster, which is then split into a number of clusters according to some criterion. Examples of such algorithms are LEGCLUST [23], BIRCH [20] (Balanced Iterative Reducing and Clustering using Hierarchies), CURE (Clustering Using REpresentatives) [21], and Chameleon [1].
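
The agglomerative (bottom-up) procedure described above can be illustrated with a minimal sketch. It is not taken from the paper or from any of the cited algorithms: the function names `single_link_distance` and `agglomerative`, the use of Euclidean distance, the single-link merge criterion, and the stopping rule (a target number of clusters) are all assumptions chosen for illustration.

```python
import math


def single_link_distance(points, a, b):
    """Single link: distance between the closest pair of members of a and b."""
    return min(math.dist(points[i], points[j]) for i in a for j in b)


def agglomerative(points, num_clusters):
    """Start with one cluster per point and repeatedly merge the two
    closest clusters until only num_clusters remain (assumed stop rule)."""
    clusters = [[i] for i in range(len(points))]
    while len(clusters) > max(num_clusters, 1):
        best = None  # (distance, index of first cluster, index of second)
        for x in range(len(clusters)):
            for y in range(x + 1, len(clusters)):
                d = single_link_distance(points, clusters[x], clusters[y])
                if best is None or d < best[0]:
                    best = (d, x, y)
        _, x, y = best
        clusters[x] = clusters[x] + clusters[y]  # merge y into x
        del clusters[y]
    return clusters


# Hypothetical toy data: two tight pairs and one outlier.
points = [(0, 0), (0, 1), (5, 5), (5, 6), (20, 20)]
result = agglomerative(points, 3)
print(result)  # each cluster is a list of indices into points
```

Replacing `min` with `max` in `single_link_distance` would give the complete-link criterion instead. The sketch recomputes every pairwise distance on each pass, so it is only suitable for small inputs.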

2.2 Spectral Clustering Algorithm

Spectral clustering refers to a class of techniques that rely on the eigenstructure of a similarity matrix. Clusters are formed by partitioning the data points using the similarity matrix. Any spectral clustering algorithm has three main stages [24]:

1. Preprocessing: construction of the similarity matrix.

2. Spectral mapping: construction of the eigenvectors of the similarity matrix.

3. Post-processing: grouping of the data points.
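
The three stages can be sketched for the simplest case of splitting data into two groups. This is a simplified illustration under several assumptions and is not the SM, KVV, or NJW algorithm: it uses a Gaussian similarity with a hypothetical width `sigma`, the unnormalized graph Laplacian L = D - W, a plain power iteration in place of a full eigensolver, and a sign-based split. It also assumes at least two points with some nonzero similarity.

```python
import math


def similarity_matrix(points, sigma=1.0):
    """Stage 1 (preprocessing): Gaussian similarity, zero on the diagonal."""
    n = len(points)
    w = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d2 = sum((a - b) ** 2 for a, b in zip(points[i], points[j]))
            w[i][j] = w[j][i] = math.exp(-d2 / (2.0 * sigma ** 2))
    return w


def fiedler_vector(w, iterations=200):
    """Stage 2 (spectral mapping): approximate the eigenvector for the
    second-smallest eigenvalue of L = D - W by power iteration."""
    n = len(w)
    degree = [sum(row) for row in w]
    shift = 2.0 * max(degree)  # upper bound on the largest eigenvalue of L
    v = [float(i) for i in range(n)]  # deterministic start vector
    for _ in range(iterations):
        mean = sum(v) / n
        v = [x - mean for x in v]  # project out the constant eigenvector
        norm = math.sqrt(sum(x * x for x in v))
        v = [x / norm for x in v]
        # Multiply by (shift * I - L) so small eigenvalues of L dominate.
        lv = [degree[i] * v[i] - sum(w[i][j] * v[j] for j in range(n))
              for i in range(n)]
        v = [shift * v[i] - lv[i] for i in range(n)]
    return v


def spectral_bipartition(points, sigma=1.0):
    """Stage 3 (post-processing): group points by the sign of each entry."""
    v = fiedler_vector(similarity_matrix(points, sigma))
    return [0 if x < 0 else 1 for x in v]


# Hypothetical toy data: two well-separated groups of three points.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
print(spectral_bipartition(points))
```

For more than two clusters, the usual approach is to keep several eigenvectors and run a grouping step such as k-means on the rows of the resulting matrix; a numerical library would normally replace the hand-written power iteration.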

The advantages of spectral clustering are:

1. It makes no strong assumptions about cluster shape.

2. It is simple to implement.

3. Its objective does not suffer from local optima.

4. It is statistically consistent.

5. Works faster.

The major drawback of this approach is its high computational complexity. For larger datasets it requires O(n^3) time, where n is the number of data points [17]. Examples of such algorithms are the SM (Shi and Malik) algorithm, the KVV (Kannan, Vempala and Vetta) algorithm, and the NJW (Ng, Jordan and Weiss) algorithm [23].
