xx Contents
14 Unsupervised Learning 485
14.1 Introduction ......................... 485
14.2 Association Rules ...................... 487
14.2.1 Market Basket Analysis .............. 488
14.2.2 The Apriori Algorithm .............. 489
14.2.3 Example: Market Basket Analysis ......... 492
14.2.4 Unsupervised as Supervised Learning ...... 495
14.2.5 Generalized Association Rules ........... 497
14.2.6 Choice of Supervised Learning Method ..... 499
14.2.7 Example: Market Basket Analysis (Continued) . 499
14.3 Cluster Analysis ....................... 501
14.3.1 Proximity Matrices ................ 503
14.3.2 Dissimilarities Based on Attributes ....... 503
14.3.3 Object Dissimilarity ................ 505
14.3.4 Clustering Algorithms ............... 507
14.3.5 Combinatorial Algorithms ............ 507
14.3.6 K-means ...................... 509
14.3.7 Gaussian Mixtures as Soft K-means Clustering . 510
14.3.8 Example: Human Tumor Microarray Data ... 512
14.3.9 Vector Quantization ................ 514
14.3.10 K-medoids ..................... 515
14.3.11 Practical Issues .................. 518
14.3.12 Hierarchical Clustering .............. 520
14.4 Self-Organizing Maps .................... 528
14.5 Principal Components, Curves and Surfaces ......... 534
14.5.1 Principal Components ............... 534
14.5.2 Principal Curves and Surfaces ........... 541
14.5.3 Spectral Clustering ................ 544
14.5.4 Kernel Principal Components ........... 547
14.5.5 Sparse Principal Components ........... 550
14.6 Non-negative Matrix Factorization ............. 553
14.6.1 Archetypal Analysis ................ 554
14.7 Independent Component Analysis and Exploratory Projection Pursuit ............ 557
14.7.1 Latent Variables and Factor Analysis ...... 558
14.7.2 Independent Component Analysis ......... 560
14.7.3 Exploratory Projection Pursuit ........... 565
14.7.4 A Direct Approach to ICA ............ 565
14.8 Multidimensional Scaling .................. 570
14.9 Nonlinear Dimension Reduction and Local Multidimensional Scaling ............ 572
14.10 The Google PageRank Algorithm ............. 576
Bibliographic Notes ......................... 578
Exercises ............................... 579