xx Contents
14 Unsupervised Learning 485
14.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . 485
14.2 Association Rules . . . . . . . . . . . . . . . . . . . . . . 487
14.2.1 Market Basket Analysis . . . . . . . . . . . . . . 488
14.2.2 The Apriori Algorithm . . . . . . . . . . . . . . 489
14.2.3 Example: Market Basket Analysis . . . . . . . . 492
14.2.4 Unsupervised as Supervised Learning . . . . . . 495
14.2.5 Generalized Association Rules . . . . . . . . . . 497
14.2.6 Choice of Supervised Learning Method . . . . . 499
14.2.7 Example: Market Basket Analysis (Continued) . 499
14.3 Cluster Analysis . . . . . . . . . . . . . . . . . . . . . . . 501
14.3.1 Proximity Matrices . . . . . . . . . . . . . . . . 503
14.3.2 Dissimilarities Based on Attributes . . . . . . . 503
14.3.3 Object Dissimilarity . . . . . . . . . . . . . . . . 505
14.3.4 Clustering Algorithms . . . . . . . . . . . . . . . 507
14.3.5 Combinatorial Algorithms . . . . . . . . . . . . 507
14.3.6 K-means . . . . . . . . . . . . . . . . . . . . . . 509
14.3.7 Gaussian Mixtures as Soft K-means Clustering . 510
14.3.8 Example: Human Tumor Microarray Data . . . 512
14.3.9 Vector Quantization . . . . . . . . . . . . . . . . 514
14.3.10 K-medoids . . . . . . . . . . . . . . . . . . . . . 515
14.3.11 Practical Issues . . . . . . . . . . . . . . . . . . 518
14.3.12 Hierarchical Clustering . . . . . . . . . . . . . . 520
14.4 Self-Organizing Maps . . . . . . . . . . . . . . . . . . . . 528
14.5 Principal Components, Curves and Surfaces . . . . . . . . 534
14.5.1 Principal Components . . . . . . . . . . . . . . . 534
14.5.2 Principal Curves and Surfaces . . . . . . . . . . 541
14.5.3 Spectral Clustering . . . . . . . . . . . . . . . . 544
14.5.4 Kernel Principal Components . . . . . . . . . . . 547
14.5.5 Sparse Principal Components . . . . . . . . . . . 550
14.6 Non-negative Matrix Factorization . . . . . . . . . . . . . 553
14.6.1 Archetypal Analysis . . . . . . . . . . . . . . . . 554
14.7 Independent Component Analysis
and Exploratory Projection Pursuit . . . . . . . . . . . . 557
14.7.1 Latent Variables and Factor Analysis . . . . . . 558
14.7.2 Independent Component Analysis . . . . . . . . 560
14.7.3 Exploratory Projection Pursuit . . . . . . . . . . 565
14.7.4 A Direct Approach to ICA . . . . . . . . . . . . 565
14.8 Multidimensional Scaling . . . . . . . . . . . . . . . . . . 570
14.9 Nonlinear Dimension Reduction
and Local Multidimensional Scaling . . . . . . . . . . . . 572
14.10 The Google PageRank Algorithm . . . . . . . . . . . . . 576
Bibliographic Notes . . . . . . . . . . . . . . . . . . . . . . . . . 578
Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 579