Pattern Recognition Letters 59 (2015) 48–55
Contents lists available at ScienceDirect
Pattern Recognition Letters
journal homepage: www.elsevier.com/locate/patrec
Visual hierarchical cluster structure: A refined co-association matrix
based visual assessment of cluster tendency
✩
Caiming Zhong
a,∗
, Xiaodong Yue
b
, Jingsheng Lei
c
a
College of Science and Technology, Ningbo University, 315211 Ningbo, PR China
b
School of Computer Science and Technology, Shanghai University of Electric Power, 200090 Shanghai, China
c
Department of Computer Science and Technology, Shanghai University, 200444 Shanghai, China
article info
Article history:
Received 14 July 2014
Available online 20 March 2015
Keywords:
Hierarchical clustering
Visual assessment of cluster tendency
Co-association matrix
Ensemble
abstract
A hierarchical clustering algorithm, such as Single-linkage, can depict the hierarchical relationship of clus-
ters, but its clustering quality mainly depends on the similarity measure used. Visual assessment of cluster
tendency (VAT) reorders a similarity matrix to reveal the cluster structure of a data set, and a VAT-based
clustering discovers clusters by image segmentation techniques. Although VAT can visually present the clus-
ter structure, its performance also relies on the similarity matrix employed. In this paper, we take a refined
co-association matrix, which is originally used in ensemble clustering, as an initial similarity matrix and
transform it by path-based measure, and then apply it to VAT. The final clustering is achieved by directly
analyzing the transformed and reordered similarity matrix. The proposed method can deal with data sets
with some complex cluster structures and reveal the relationship of clusters hierarchically. The experimental
results on synthetic and real data sets demonstrate the above mentioned properties.
© 2015 Elsevier B.V. All rights reserved.
1. Introduction
Hierarchical clustering is a type of typical method in cluster anal-
ysis. A hierarchical clustering algorithm not only detects the cluster
structure of a data set but also reveals the hierarchical relationship of
the clusters. Hierarchical clustering can be further grouped into two
categories: Agglomerative and divisive. The former takes each data
point as a cluster initially, and iteratively combines the most similar
cluster pair until the pre-specified number of clusters are obtained,
while the later takes the whole data as a cluster and repeatedly di-
vides a selected cluster into two clusters. Single-linkage [7] is a well
known hierarchical clustering algorithm and produces a dendrogram
of the clusters. Since it only focuses on the connectedness of the data
set, Single-linkage is sensitive to noise data.
For a given data set X =
{x
1
,...,x
N
}, Visual assessment of cluster
tendency (VAT) [2] reorders the pairwise dissimilarity matrix D of X
so that the cluster structure information can be presented by a digital
image I
(D
∗
)with N × N pixels, where D
∗
is the reordered version of D.
VAT-based clustering algorithms in the literature [3,8–12,20,23,24]
usually partition the data set by segmenting I
(D
∗
). Huband et al. [12]
presented a bigVAT that can handle large data sets or relational data
✩
This paper has been recommended for acceptance by Andrea Torsello.
∗
Corresponding author: Tel.: +86 138 198 78682; fax: +86 574 876 00842.
E-mail address: zhongcaiming@nbu.edu.cn, charman@163.com (C. Zhong).
sets. Hathaway et al. [8] proposed a scalable and sample-based version
of VAT, which can also deal with large data sets. Bezdek et al. [3] ex-
tended VAT to a rectangular dissimilarity matrix so that it can analyze
relational data which are generally presented by an m × n dissimilar-
ity matrix. Havens et al. [11] defined an objective function to detect
the boundaries that are related to the cluster structure in image I
(D
∗
),
and employed particle swarm optimization technique to optimize the
objective function. Although this method can automatically detect the
number of clusters, the optimization process is computationally in-
tensive. Sledge et al. [20] proposed a method to find the number of
clusters by constructing a correlation matrix to filter D
∗
. Compared
with other VAT-based methods, this algorithm directly deals with D
∗
rather than I(D
∗
). The algorithm of [23] is based on image segmenta-
tion techniques. To handle complex data sets, Wang et al. transformed
the dissimilarity matrix with some manifold learning methods. This
is an effective way to improve the quality of VAT-based clustering
results, even if the transformation is not related to VAT technique
itself. Wang et al. [24] presented an improved VAT (iVAT), in which
a path-based distance measure is used to transform D so that the
corresponding I
(D
∗
) can describe clearly the hierarchical structure of
clusters. Havens and Bezdek[9] proposed an efficient iVAT algorithm,
of which the computational complexity is O
(N
2
). Havens and Bezdek
[10] proposed a new formulation of matrix reordering algorithm (co-
VAT) to handle all cluster problems of rectangular relational data.
In general, dissimilarity or similarity measure is one of the most
important components in a clustering algorithm. VAT-based cluster-
http://dx.doi.org/10.1016/j.patrec.2015.03.007
0167-8655/© 2015 Elsevier B.V. All rights reserved.