Visually Assessing Cluster Structure: Hierarchical Clustering Analysis Based on a Refined Co-association Matrix


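"This research paper studies how to assess hierarchical cluster structure visually, focusing on visual assessment of cluster tendency (VAT) built on a refined co-association matrix. The authors are Caiming Zhong, Xiaodong Yue, and Jingsheng Lei, affiliated with Ningbo University, Shanghai University of Electric Power, and Shanghai University. Published in Pattern Recognition Letters, the paper discusses the limitations of hierarchical clustering algorithms, Single-linkage in particular, and proposes an improved visual assessment method for revealing the cluster structure of a data set. Keywords: hierarchical clustering, visual assessment of cluster tendency, co-association matrix, ensemble."

Details: The paper addresses hierarchical clustering in data mining, and in particular how to assess cluster structure visually. Traditional hierarchical algorithms such as Single-linkage can show the hierarchical relationship among clusters, but the quality of the result depends largely on the similarity measure chosen. The authors therefore propose a VAT method based on a refined co-association matrix.

VAT is an unsupervised technique that reorders a similarity matrix so that latent cluster structure in the data becomes visible as an image. It helps users understand and check clustering results, especially when no prior knowledge is available. Its output, however, also depends on the similarity matrix it is given, so a poorly chosen matrix can give a misleading picture of the cluster structure. To address this, the paper takes a refined co-association matrix, originally used in ensemble clustering, as the initial similarity matrix, transforms it with a path-based measure, and applies VAT to the result. This lets analysts identify tight groups in the data and the hierarchical relationships among them.

The co-association matrix comes from ensemble clustering, which combines multiple clustering results to make the assessment more stable and reliable.

In summary, the paper offers a visual tool for assessing and understanding the hierarchical cluster structure of complex data sets. According to the abstract, the method can handle data sets with some complex cluster structures and reveals the relationships among clusters hierarchically, which the authors demonstrate on synthetic and real data sets.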

Pattern Recognition Letters 59 (2015) 48–55

Contents lists available at ScienceDirect

Pattern Recognition Letters

journal homepage: www.elsevier.com/locate/patrec

Visual hierarchical cluster structure: A refined co-association matrix based visual assessment of cluster tendency ✩

Caiming Zhong a,∗, Xiaodong Yue, Jingsheng Lei

College of Science and Technology, Ningbo University, 315211 Ningbo, PR China

School of Computer Science and Technology, Shanghai University of Electric Power, 200090 Shanghai, China

Department of Computer Science and Technology, Shanghai University, 200444 Shanghai, China

article info

Article history:

Received 14 July 2014

Available online 20 March 2015

Keywords:

Hierarchical clustering

Visual assessment of cluster tendency

Co-association matrix

Ensemble

abstract

A hierarchical clustering algorithm, such as Single-linkage, can depict the hierarchical relationship of clusters, but its clustering quality mainly depends on the similarity measure used. Visual assessment of cluster tendency (VAT) reorders a similarity matrix to reveal the cluster structure of a data set, and a VAT-based clustering discovers clusters by image segmentation techniques. Although VAT can visually present the cluster structure, its performance also relies on the similarity matrix employed. In this paper, we take a refined co-association matrix, which was originally used in ensemble clustering, as an initial similarity matrix, transform it with a path-based measure, and then apply VAT to it. The final clustering is achieved by directly analyzing the transformed and reordered similarity matrix. The proposed method can deal with data sets with some complex cluster structures and reveal the relationship of clusters hierarchically. The experimental results on synthetic and real data sets demonstrate the above-mentioned properties.
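As a rough illustration of the starting point named in the abstract, the sketch below builds a plain co-association matrix from several base clusterings: entry (i, j) is the fraction of clusterings that put points i and j in the same cluster. This is the standard, unrefined construction from ensemble clustering; the refinement step used by the authors is not described in this excerpt and is not reproduced here. The function name `co_association` and the three toy partitions are assumptions made for the example.

```python
def co_association(partitions):
    """Plain (unrefined) co-association matrix.

    partitions: list of label lists, one per base clustering, all of length n.
    Returns an n x n matrix C where C[i][j] is the fraction of clusterings
    in which points i and j share a cluster label.
    """
    n = len(partitions[0])
    counts = [[0] * n for _ in range(n)]
    for labels in partitions:
        for i in range(n):
            for j in range(n):
                if labels[i] == labels[j]:
                    counts[i][j] += 1
    m = len(partitions)
    return [[c / m for c in row] for row in counts]


# Three hypothetical base clusterings of five points (label values are arbitrary).
partitions = [
    [0, 0, 0, 1, 1],
    [0, 0, 1, 1, 1],
    [1, 1, 1, 0, 0],
]
C = co_association(partitions)

# A similarity in [0, 1] can be turned into a dissimilarity for VAT-style methods.
D = [[1.0 - c for c in row] for row in C]
```

Points 0 and 1 are grouped together by every base clustering, so their similarity is 1, while points 0 and 3 are never grouped together, so their similarity is 0.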

1. Introduction

Hierarchical clustering is a typical family of methods in cluster analysis. A hierarchical clustering algorithm not only detects the cluster structure of a data set but also reveals the hierarchical relationship of the clusters. Hierarchical clustering can be further grouped into two categories: agglomerative and divisive. The former takes each data point as a cluster initially and iteratively combines the most similar cluster pair until the pre-specified number of clusters is obtained, while the latter takes the whole data set as a cluster and repeatedly divides a selected cluster into two clusters. Single-linkage [7] is a well-known hierarchical clustering algorithm and produces a dendrogram of the clusters. Since it focuses only on the connectedness of the data set, Single-linkage is sensitive to noisy data.
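The agglomerative procedure described above can be sketched in a few lines. This is a naive illustration, assuming a precomputed symmetric distance matrix and a target cluster count `k`; the function name `single_linkage` and the toy data are made up for the example, and practical implementations use much faster algorithms.

```python
def single_linkage(dist, k):
    """Naive agglomerative clustering with the single-linkage criterion.

    dist: symmetric n x n distance matrix (list of lists).
    k:    number of clusters at which merging stops.
    Returns (clusters, merges); merges records each merge as
    (members_a, members_b, distance), which is the information a dendrogram shows.
    """
    clusters = [[i] for i in range(len(dist))]  # every point starts alone
    merges = []
    while len(clusters) > k:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # Single linkage: cluster distance is the smallest pairwise distance.
                d = min(dist[i][j] for i in clusters[a] for j in clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        d, a, b = best
        merges.append((sorted(clusters[a]), sorted(clusters[b]), d))
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return [sorted(c) for c in clusters], merges


# Five points on a line: three near 0 and two near 10.
points = [0.0, 1.0, 1.5, 10.0, 11.5]
dist = [[abs(p - q) for q in points] for p in points]
clusters, merges = single_linkage(dist, 2)
print(clusters)  # [[0, 1, 2], [3, 4]]
```

Because the cluster distance is the minimum over all cross-cluster pairs, a single noisy point lying between two groups can chain them together, which is the sensitivity to noise mentioned above.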

✩ This paper has been recommended for acceptance by Andrea Torsello.

∗ Corresponding author: Tel.: +86 138 198 78682; fax: +86 574 876 00842. E-mail address: zhongcaiming@nbu.edu.cn, charman@163.com (C. Zhong).

For a given data set $X = \{x_1, \ldots, x_N\}$, visual assessment of cluster tendency (VAT) [2] reorders the pairwise dissimilarity matrix $D$ of $X$ so that the cluster structure information can be presented by a digital image $I(D^*)$ with $N \times N$ pixels, where $D^*$ is the reordered version of $D$. VAT-based clustering algorithms in the literature [3,8–12,20,23,24] usually partition the data set by segmenting $I(D^*)$. Huband et al. [12] presented bigVAT, which can handle large data sets or relational data sets. Hathaway et al. [8] proposed a scalable, sample-based version of VAT, which can also deal with large data sets. Bezdek et al. [3] extended VAT to a rectangular dissimilarity matrix so that it can analyze relational data, which are generally presented by an $m \times n$ dissimilarity matrix. Havens et al. [11] defined an objective function to detect the boundaries that are related to the cluster structure in image $I(D^*)$ and employed particle swarm optimization to optimize the objective function. Although this method can automatically detect the number of clusters, the optimization process is computationally intensive. Sledge et al. [20] proposed a method to find the number of clusters by constructing a correlation matrix to filter $D^*$. Compared with other VAT-based methods, this algorithm deals directly with $D^*$ rather than $I(D^*)$. The algorithm of [23] is based on image segmentation techniques. To handle complex data sets, Wang et al. transformed the dissimilarity matrix with some manifold learning methods. This is an effective way to improve the quality of VAT-based clustering results, even if the transformation is not related to the VAT technique itself. Wang et al. [24] presented an improved VAT (iVAT), in which a path-based distance measure is used to transform $D$ so that the corresponding $I(D^*)$ can clearly describe the hierarchical structure of clusters. Havens and Bezdek [9] proposed an efficient iVAT algorithm, whose computational complexity is $O(N^2)$. Havens and Bezdek [10] proposed a new formulation of the matrix reordering algorithm (co-VAT) to handle all cluster problems of rectangular relational data.
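To make the reordering and the path-based transformation concrete, here is a minimal sketch under stated assumptions. `vat_order` follows the commonly described VAT ordering (start at an endpoint of the largest dissimilarity, then repeatedly append the unvisited point closest to the visited set), and `path_based` computes a minimax path distance with a Floyd-Warshall-style triple loop. That loop is cubic in the number of points and is not the efficient iVAT formulation of [9]; all function names and the toy data are hypothetical.

```python
def vat_order(D):
    """Return a VAT-style ordering of the indices of dissimilarity matrix D."""
    n = len(D)
    # Start from one endpoint of the largest dissimilarity in D.
    start = max(range(n), key=lambda i: max(D[i]))
    order = [start]
    remaining = set(range(n)) - {start}
    while remaining:
        # Append the unvisited point nearest to any visited point (Prim-like step).
        nxt = min(sorted(remaining), key=lambda j: min(D[i][j] for i in order))
        order.append(nxt)
        remaining.remove(nxt)
    return order


def reorder(D, order):
    """Permute rows and columns of D by the same ordering, giving D*."""
    return [[D[p][q] for q in order] for p in order]


def path_based(D):
    """Minimax path distance: the smallest achievable 'largest hop'
    over all paths between two points (simple cubic-time version)."""
    n = len(D)
    P = [row[:] for row in D]
    for k in range(n):
        for i in range(n):
            for j in range(n):
                via_k = max(P[i][k], P[k][j])
                if via_k < P[i][j]:
                    P[i][j] = via_k
    return P


# Two groups on a line, stored in interleaved order so D shows no block structure.
points = [0.0, 10.0, 1.0, 11.0, 2.0, 12.0]
D = [[abs(p - q) for q in points] for p in points]

P = path_based(D)
order = vat_order(P)
D_star = reorder(P, order)
print(order)  # [0, 2, 4, 1, 3, 5]
```

Rendering `D_star` as a grayscale image would show two dark 3 x 3 blocks on the diagonal: every within-group path distance is 1, while every between-group path distance equals the gap of 8 separating the two groups.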

In general, the dissimilarity or similarity measure is one of the most important components of a clustering algorithm. VAT-based cluster-

http://dx.doi.org/10.1016/j.patrec.2015.03.007
