Data Mining: Concepts and Techniques 3rd Edition - English original edition - Jiawei Han

PDF, 14.35 MB, updated 2024-07-24

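"Data Mining: Concepts and Techniques, third edition (English original)" is a classic in the field of data mining. It was written by Jiawei Han and Micheline Kamber and belongs to the Morgan Kaufmann Series in Data Management Systems, with Jim Gray as series editor. Although the listing is titled as the third edition, the text excerpted here is the second edition, which the authors revised to cover more recent data mining theory and techniques. Data mining is the process of discovering valuable knowledge in large volumes of data; it draws on statistics, machine learning, database systems, and other fields. The book's main topics likely include:

1. Data preprocessing: data cleaning, data integration, data transformation, and related steps. These are prerequisites for data mining and help ensure data quality and accuracy.
2. Data mining methods: the book probably covers association rule learning, cluster analysis, classification, sequential pattern mining, and other techniques. Association rule learning finds frequent patterns among itemsets, as in market basket analysis; cluster analysis groups data into clusters of similar objects; classification builds predictive models such as decision trees and Bayesian networks; sequential pattern mining looks for frequently recurring ordered patterns in sequence data, such as time-stamped purchase histories.
3. Machine learning algorithms: the book may introduce supervised and unsupervised learning algorithms such as support vector machines, neural networks, and random forests, and discuss their strengths and weaknesses in practice.
4. The data mining process: standard process models such as CRISP-DM (Cross Industry Standard Process for Data Mining), whose phases are business understanding, data understanding, data preparation, modeling, evaluation, and deployment.
5. Knowledge representation and evaluation: how to present mined knowledge effectively, and how to evaluate the accuracy and usefulness of mining results.
6. Application examples: the book may include case studies from domains such as marketing, financial risk control, and medical diagnosis, to show how data mining is used in real settings.
7. Emerging trends: as a newer edition, the book may also touch on frontier topics such as big data mining, data mining in cloud computing environments, and deep learning.
8. Tools and platforms: it may introduce commonly used data mining tools and platforms, such as the R language, Python's scikit-learn library, and Apache Spark MLlib.

By reading this book, readers can gain a solid understanding of the fundamental concepts of data mining and acquire practical skills, so that they can choose appropriate methods and techniques for knowledge discovery when facing complex data. For professionals in data analysis, data science, machine learning, and related areas, it is an invaluable reference.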
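As a rough illustration of association rule learning for market basket analysis (item 2 of the description), here is a minimal Python sketch of an Apriori-style level-wise search. The choice of Apriori is an assumption made for illustration; the description does not say how the book presents the technique. The function names `apriori` and `rules`, the absolute support threshold, and the toy baskets are all hypothetical. Support is counted as the number of baskets containing an itemset, and the confidence of a rule X → Y is support(X ∪ Y) / support(X).

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Level-wise search for frequent itemsets (hypothetical helper).

    Returns {frozenset(itemset): support_count} for every itemset that
    occurs in at least `min_support` transactions (an absolute count).
    """
    transactions = [frozenset(t) for t in transactions]

    # Level 1: count single items and keep the frequent ones.
    counts = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    frequent = {s: c for s, c in counts.items() if c >= min_support}
    result = dict(frequent)

    k = 2
    while frequent:
        # Join step: merge pairs of frequent (k-1)-itemsets into k-itemsets.
        prev = list(frequent)
        candidates = set()
        for i in range(len(prev)):
            for j in range(i + 1, len(prev)):
                union = prev[i] | prev[j]
                # Prune step: every (k-1)-subset of a candidate must be frequent.
                if len(union) == k and all(
                    frozenset(sub) in frequent for sub in combinations(union, k - 1)
                ):
                    candidates.add(union)

        # One scan over the data counts the surviving candidates.
        counts = {c: 0 for c in candidates}
        for t in transactions:
            for c in candidates:
                if c <= t:
                    counts[c] += 1
        frequent = {s: c for s, c in counts.items() if c >= min_support}
        result.update(frequent)
        k += 1
    return result


def rules(frequent, min_confidence):
    """Derive rules X -> Y with confidence = support(X ∪ Y) / support(X)."""
    out = []
    for itemset, count in frequent.items():
        for r in range(1, len(itemset)):
            for lhs in combinations(itemset, r):
                lhs = frozenset(lhs)
                # Every subset of a frequent itemset is frequent, so lhs is a key.
                conf = count / frequent[lhs]
                if conf >= min_confidence:
                    out.append((lhs, itemset - lhs, conf))
    return out


# Hypothetical toy baskets, for illustration only.
baskets = [
    {"milk", "bread"},
    {"milk", "diapers", "beer"},
    {"bread", "diapers", "beer"},
    {"milk", "bread", "diapers", "beer"},
    {"milk", "bread", "diapers"},
]

freq = apriori(baskets, min_support=3)
for lhs, rhs, conf in rules(freq, min_confidence=0.8):
    print(sorted(lhs), "->", sorted(rhs), conf)  # prints ['beer'] -> ['diapers'] 1.0
```

With `min_support=3`, all four single items and four of the pairs are frequent, while {milk, bread, diapers} occurs in only two baskets and is dropped. At `min_confidence=0.8` only beer → diapers survives (3/3 = 1.0); every other pair rule has confidence 3/4 = 0.75. The prune step is what keeps the search tractable: a candidate with any infrequent subset cannot itself be frequent, so it is discarded before the counting scan.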

Contents
7.6.2 OPTICS: Ordering Points to Identify the Clustering Structure 420
7.6.3 DENCLUE: Clustering Based on Density Distribution Functions 422
7.7 Grid-Based Methods 424
7.7.1 STING: STatistical INformation Grid 425
7.7.2 WaveCluster: Clustering Using Wavelet Transformation 427
7.8 Model-Based Clustering Methods 429
7.8.1 Expectation-Maximization 429
7.8.2 Conceptual Clustering 431
7.8.3 Neural Network Approach 433
7.9 Clustering High-Dimensional Data 434
7.9.1 CLIQUE: A Dimension-Growth Subspace Clustering Method 436
7.9.2 PROCLUS: A Dimension-Reduction Subspace Clustering Method 439
7.9.3 Frequent Pattern–Based Clustering Methods 440
7.10 Constraint-Based Cluster Analysis 444
7.10.1 Clustering with Obstacle Objects 446
7.10.2 User-Constrained Cluster Analysis 448
7.10.3 Semi-Supervised Cluster Analysis 449
7.11 Outlier Analysis 451
7.11.1 Statistical Distribution-Based Outlier Detection 452
7.11.2 Distance-Based Outlier Detection 454
7.11.3 Density-Based Local Outlier Detection 455
7.11.4 Deviation-Based Outlier Detection 458
7.12 Summary 460
Exercises 461
Bibliographic Notes 464
Chapter 8 Mining Stream, Time-Series, and Sequence Data 467
8.1 Mining Data Streams 468
8.1.1 Methodologies for Stream Data Processing and Stream Data Systems 469
8.1.2 Stream OLAP and Stream Data Cubes 474
8.1.3 Frequent-Pattern Mining in Data Streams 479
8.1.4 Classification of Dynamic Data Streams 481
8.1.5 Clustering Evolving Data Streams 486
8.2 Mining Time-Series Data 489
8.2.1 Trend Analysis 490
8.2.2 Similarity Search in Time-Series Analysis 493

Foreword

We are deluged by data—scientific data, medical data, demographic data, financial data, and marketing data. People have no time to look at this data. Human attention has become the precious resource. So, we must find ways to automatically analyze the data, to automatically classify it, to automatically summarize it, to automatically discover and characterize trends in it, and to automatically flag anomalies. This is one of the most active and exciting areas of the database research community. Researchers in areas including statistics, visualization, artificial intelligence, and machine learning are contributing to this field. The breadth of the field makes it difficult to grasp the extraordinary progress over the last few decades.

Six years ago, Jiawei Han’s and Micheline Kamber’s seminal textbook organized and presented Data Mining. It heralded a golden age of innovation in the field. This revision of their book reflects that progress; more than half of the references and historical notes are to recent work. The field has matured with many new and improved algorithms, and has broadened to include many more datatypes: streams, sequences, graphs, time-series, geospatial, audio, images, and video. We are certainly not at the end of the golden age—indeed research and commercial interest in data mining continues to grow—but we are all fortunate to have this modern compendium.

The book gives quick introductions to database and data mining concepts with particular emphasis on data analysis. It then covers in a chapter-by-chapter tour the concepts and techniques that underlie classification, prediction, association, and clustering. These topics are presented with examples, a tour of the best algorithms for each problem class, and with pragmatic rules of thumb about when to apply each technique. The Socratic presentation style is both very readable and very informative. I certainly learned a lot from reading the first edition and got re-educated and updated in reading the second edition.

Jiawei Han and Micheline Kamber have been leading contributors to data mining research. This is the text they use with their students to bring them up to speed on the



