Preface
Chapter 3. Scale conversion. Scale conversion is concerned with the transformation
between different types of variables. For example, one may convert a continuous measured
variable to an interval variable. In this chapter, we first review several scale conversion
techniques and then discuss several approaches for categorizing numerical data.
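As a small illustration of one such categorization approach, the sketch below (illustrative, not taken from the chapter; the function name and data are hypothetical) discretizes a continuous variable into equal-width intervals:

```python
def equal_width_bins(values, k):
    """Discretize a continuous variable into k equal-width intervals.

    Returns, for each value, the index (0..k-1) of the bin it falls in.
    """
    lo, hi = min(values), max(values)
    width = (hi - lo) / k
    # The maximum value would fall into bin k, so clamp it into bin k-1.
    return [min(int((v - lo) / width), k - 1) for v in values]

ages = [23, 35, 41, 58, 62, 70]
print(equal_width_bins(ages, 3))  # three categories: "young", "middle", "old"
```

Other categorization schemes (equal-frequency bins, cluster-based bins) follow the same pattern but choose the interval boundaries differently.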
Chapter 4. Data standardization and transformation. In many situations, raw data
should be normalized and/or transformed before a cluster analysis. One reason to do this is
that objects in raw data may be described by variables measured with different scales; another
reason is to reduce the size of the data to improve the effectiveness of clustering algorithms.
Therefore, we present several data standardization and transformation techniques in this
chapter.
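For instance, two common standardization techniques, z-score standardization and min-max scaling, can be sketched as follows (an illustrative sketch, not code from the book):

```python
from statistics import mean, stdev

def z_score(values):
    """Standardize a variable to zero mean and unit (sample) standard deviation."""
    m, s = mean(values), stdev(values)
    return [(v - m) / s for v in values]

def min_max(values):
    """Rescale a variable linearly onto the interval [0, 1]."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]

# Variables on very different scales become comparable after standardization.
print(min_max([2, 4, 6]))  # [0.0, 0.5, 1.0]
```

Applied to each variable separately, either technique prevents a variable with a large numeric range from dominating the distance computations in a subsequent cluster analysis.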
Chapter 5. Data visualization. Data visualization is vital in the final step of data-
mining applications. This chapter introduces various techniques of visualization with an
emphasis on visualization of clustered data. Some dimension reduction techniques, such as
multidimensional scaling (MDS) and self-organizing maps (SOMs), are discussed.
Chapter 6. Similarity and dissimilarity measures. In the literature of data clus-
tering, a similarity measure or distance (dissimilarity measure) is used to quantitatively
describe the similarity or dissimilarity of two data points or two clusters. Similarity and dis-
tance measures are basic elements of a clustering algorithm, without which no meaningful
cluster analysis is possible. Due to the important role of similarity and distance measures in
cluster analysis, we present a comprehensive discussion of different measures for various
types of data in this chapter. We also introduce measures between points and measures
between clusters.
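To make the distinction concrete, the sketch below (illustrative, not from the book) shows one point-to-point measure, the Euclidean distance, and two cluster-to-cluster measures built on it, the single-linkage (minimum) and complete-linkage (maximum) distances:

```python
from math import sqrt

def euclidean(x, y):
    """Euclidean distance between two points given as coordinate lists."""
    return sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def single_linkage(c1, c2, dist=euclidean):
    """Distance between two clusters: the smallest pairwise point distance."""
    return min(dist(x, y) for x in c1 for y in c2)

def complete_linkage(c1, c2, dist=euclidean):
    """Distance between two clusters: the largest pairwise point distance."""
    return max(dist(x, y) for x in c1 for y in c2)

print(euclidean([0, 0], [3, 4]))  # 5.0
```

Swapping in a different point-to-point measure (e.g., one suited to categorical data) changes the cluster-to-cluster measures automatically, which is why the choice of measure is treated as a basic building block of a clustering algorithm.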
Chapter 7. Hierarchical clustering techniques. Hierarchical clustering algorithms
and partitioning algorithms are two major clustering algorithms. Unlike partitioning algo-
rithms, which divide a data set into a single partition, hierarchical algorithms divide a data
set into a sequence of nested partitions. There are two major types of hierarchical algorithms: agglomerative and divisive. Agglomerative algorithms start with each object in its own cluster and repeatedly merge the closest clusters, while divisive algorithms start with all objects in one cluster and repeatedly split clusters into smaller ones. In this chapter, we present representations of hierarchical clustering and several popular hierarchical clustering algorithms.
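A minimal agglomerative sketch (illustrative only; the function name is hypothetical) that starts from singleton clusters and repeatedly merges the two closest clusters under single linkage:

```python
from math import sqrt

def euclidean(x, y):
    return sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def agglomerate(points, k):
    """Merge the two closest clusters (single linkage) until k clusters remain."""
    clusters = [[p] for p in points]          # every object starts in its own cluster
    while len(clusters) > k:
        # Find the pair of clusters with the smallest inter-cluster distance.
        i, j = min(
            ((i, j) for i in range(len(clusters))
                    for j in range(i + 1, len(clusters))),
            key=lambda ij: min(euclidean(x, y)
                               for x in clusters[ij[0]] for y in clusters[ij[1]]),
        )
        clusters[i].extend(clusters.pop(j))   # merge cluster j into cluster i
    return clusters

print(agglomerate([[0, 0], [0, 1], [10, 0], [10, 1]], 2))
```

Recording the sequence of merges, rather than stopping at a fixed k, yields the nested partitions (the dendrogram) that distinguish hierarchical from partitioning algorithms.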
Chapter 8. Fuzzy clustering algorithms. Clustering algorithms can be classified
into two categories: hard clustering algorithms and fuzzy clustering algorithms. Unlike
hard clustering algorithms, which require that each data point of the data set belong to one
and only one cluster, fuzzy clustering algorithms allow a data point to belong to two or
more clusters with different degrees of membership. A large body of work has been
published on fuzzy clustering. In this chapter, we review some basic concepts of fuzzy logic
and present three well-known fuzzy clustering algorithms: fuzzy k-means, fuzzy k-modes,
and c-means.
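As one concrete piece of this machinery, the sketch below (illustrative, assuming Euclidean distances and a fuzzifier m > 1, as in the standard fuzzy c-means update rule) computes a point's degrees of membership in each cluster from its distances to the cluster centers:

```python
from math import sqrt

def fuzzy_memberships(point, centers, m=2.0):
    """Degrees of membership of one point in each cluster.

    Standard fuzzy c-means rule: u_i = 1 / sum_k (d_i / d_k) ** (2 / (m - 1)),
    where d_i is the distance from the point to center i.  The memberships
    always sum to 1 across the clusters.
    """
    dists = [sqrt(sum((a - b) ** 2 for a, b in zip(point, c))) for c in centers]
    if 0.0 in dists:                       # point coincides with a center
        return [1.0 if d == 0.0 else 0.0 for d in dists]
    exp = 2.0 / (m - 1.0)
    return [1.0 / sum((di / dk) ** exp for dk in dists) for di in dists]

print(fuzzy_memberships([1, 0], [[0, 0], [3, 0]]))  # [0.8, 0.2]
```

A point near one center gets a membership close to 1 in that cluster; a point midway between centers gets roughly equal memberships, which is precisely what hard clustering forbids.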
Chapter 9. Center-based clustering algorithms. Compared to other types of clus-
tering algorithms, center-based clustering algorithms are more suitable for clustering large
data sets and high-dimensional data sets. Several well-known center-based clustering algo-
rithms (e.g., k-means, k-modes) are presented and discussed in this chapter.
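A minimal sketch of the k-means iteration (illustrative only, assuming initial centers are supplied rather than chosen by an initialization scheme), alternating the assignment and center-update steps:

```python
from math import sqrt

def kmeans(points, centers, iters=10):
    """Lloyd's k-means: alternate nearest-center assignment and center update."""
    for _ in range(iters):
        # Assignment step: attach each point to its nearest center.
        clusters = [[] for _ in centers]
        for p in points:
            nearest = min(range(len(centers)),
                          key=lambda i: sqrt(sum((a - b) ** 2
                                                 for a, b in zip(p, centers[i]))))
            clusters[nearest].append(p)
        # Update step: move each center to the mean of its cluster
        # (an empty cluster keeps its old center).
        centers = [[sum(c) / len(cl) for c in zip(*cl)] if cl else ctr
                   for cl, ctr in zip(clusters, centers)]
    return centers, clusters

centers, clusters = kmeans([[0, 0], [0, 2], [10, 0], [10, 2]], [[0, 0], [10, 0]])
print(centers)  # [[0.0, 1.0], [10.0, 1.0]]
```

Because each iteration touches every point only once per center, the cost per pass is linear in the number of points, which is why center-based algorithms scale well to large data sets.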
Chapter 10. Search-based clustering algorithms. A well-known problem with most
clustering algorithms is that they may fail to find the globally optimal clustering of a
data set, since they stop as soon as they reach a locally optimal partition. This
problem led to the invention of search-based clus-