Data Clustering: Theory, Algorithms, and Applications

Guojun Gan (York University, Toronto, Ontario, Canada), Chaoqun Ma (Hunan University, Changsha, Hunan, People’s Republic of China), and Jianhong Wu (York University, Toronto, Ontario, Canada)

Data Clustering
SA20_GanMaWu fm 1.qxp 4/9/2007 9:57 AM Page i

ASA-SIAM Series on Statistics and Applied Probability
The ASA-SIAM Series on Statistics and Applied Probability is published jointly by the American Statistical Association and the Society for Industrial and Applied Mathematics. The series consists of a broad spectrum of books on topics in statistics and applied probability. The purpose of the series is to provide inexpensive, quality publications of interest to the intersecting membership of the two societies.
Editorial Board
Martin T. Wells
Cornell University, Editor-in-Chief
H. T. Banks
North Carolina State University
Douglas M. Hawkins
University of Minnesota
Susan Holmes
Stanford University
Lisa LaVange
University of North Carolina
David Madigan
Rutgers University
Mark van der Laan
University of California, Berkeley

Gan, G., Ma, C., and Wu, J., Data Clustering: Theory, Algorithms, and Applications
Hubert, L., Arabie, P., and Meulman, J., The Structural Representation of Proximity Matrices with MATLAB
Nelson, P. R., Wludyka, P. S., and Copeland, K. A. F., The Analysis of Means: A Graphical Method for Comparing Means, Rates, and Proportions
Burdick, R. K., Borror, C. M., and Montgomery, D. C., Design and Analysis of Gauge R&R Studies: Making Decisions with Confidence Intervals in Random and Mixed ANOVA Models
Albert, J., Bennett, J., and Cochran, J. J., eds., Anthology of Statistics in Sports
Smith, W. F., Experimental Design for Formulation
Baglivo, J. A., Mathematica Laboratories for Mathematical Statistics: Emphasizing Simulation and Computer Intensive Methods
Lee, H. K. H., Bayesian Nonparametrics via Neural Networks
O’Gorman, T. W., Applied Adaptive Statistical Methods: Tests of Significance and Confidence Intervals
Ross, T. J., Booker, J. M., and Parkinson, W. J., eds., Fuzzy Logic and Probability Applications: Bridging the Gap
Nelson, W. B., Recurrent Events Data Analysis for Product Repairs, Disease Recurrences, and Other Applications
Mason, R. L. and Young, J. C., Multivariate Statistical Process Control with Industrial Applications
Smith, P. L., A Primer for Sampling Solids, Liquids, and Gases: Based on the Seven Sampling Errors of Pierre Gy
Meyer, M. A. and Booker, J. M., Eliciting and Analyzing Expert Judgment: A Practical Guide
Latouche, G. and Ramaswami, V., Introduction to Matrix Analytic Methods in Stochastic Modeling
Peck, R., Haugh, L., and Goodman, A., Statistical Case Studies: A Collaboration Between Academe and Industry, Student Edition
Peck, R., Haugh, L., and Goodman, A., Statistical Case Studies: A Collaboration Between Academe and Industry
Barlow, R., Engineering Reliability
Czitrom, V. and Spagon, P. D., Statistical Case Studies for Industrial Process Improvement

Data Clustering: Theory, Algorithms, and Applications
Guojun Gan
York University
Toronto, Ontario, Canada
Chaoqun Ma
Hunan University
Changsha, Hunan, People’s Republic of China
Jianhong Wu
York University
Toronto, Ontario, Canada
Society for Industrial and Applied Mathematics
Philadelphia, Pennsylvania
American Statistical Association
Alexandria, Virginia

The correct bibliographic citation for this book is as follows: Gan, Guojun, Chaoqun Ma, and Jianhong Wu, Data Clustering: Theory, Algorithms, and Applications, ASA-SIAM Series on Statistics and Applied Probability, SIAM, Philadelphia, ASA, Alexandria, VA, 2007.
Copyright © 2007 by the American Statistical Association and the Society for Industrial and Applied Mathematics.
All rights reserved. Printed in the United States of America. No part of this book may be reproduced, stored, or transmitted in any manner without the written permission of the publisher. For information, write to the Society for Industrial and Applied Mathematics, 3600 University City Science Center, Philadelphia, PA 19104-2688.
Trademarked names may be used in this book without the inclusion of a trademark symbol. These names are intended in an editorial context only; no infringement of trademark is intended.
Library of Congress Cataloging-in-Publication Data
Gan, Guojun, 1979-
Data clustering : theory, algorithms, and applications / Guojun Gan, Chaoqun Ma, Jianhong Wu.
p. cm. – (ASA-SIAM series on statistics and applied probability ; 20)
Includes bibliographical references and index.
ISBN: 978-0-898716-23-8 (alk. paper)
1. Cluster analysis. 2. Cluster analysis—Data processing. I. Ma, Chaoqun, Ph.D. II. Wu, Jianhong. III. Title.
QA278.G355 2007
519.5’3—dc22
2007061713
SIAM is a registered trademark.

Contents
List of Figures
List of Tables
List of Algorithms
Preface
I Clustering, Data, and Similarity Measures
1 Data Clustering
1.1 Definition of Data Clustering
1.2 The Vocabulary of Clustering
1.2.1 Records and Attributes
1.2.2 Distances and Similarities
1.2.3 Clusters, Centers, and Modes
1.2.4 Hard Clustering and Fuzzy Clustering
1.2.5 Validity Indices
1.3 Clustering Processes
1.4 Dealing with Missing Values
1.5 Resources for Clustering
1.5.1 Surveys and Reviews on Clustering
1.5.2 Books on Clustering
1.5.3 Journals
1.5.4 Conference Proceedings
1.5.5 Data Sets
1.6 Summary
2 Data Types
2.1 Categorical Data
2.2 Binary Data
2.3 Transaction Data
2.4 Symbolic Data
2.5 Time Series
2.6 Summary