The Elements of Statistical Learning, Second Edition: A Data-Driven Machine Learning Classic
"《统计学习要素》第二版是机器学习领域的一本经典著作,由Hastie、Tibshirani和Friedman三位作者撰写。该书在第一版大受欢迎的基础上,随着研究领域的快速发展,作者们决定推出第二版以反映最新的研究成果。新版本共增加了四章,并对原有章节进行了更新,以保持内容的时效性。 新版本的主要变化包括: 1. 在引言部分,作者引用了威廉·爱德华兹·戴明的名言:“我们信任上帝,其他人则带来数据。”尽管这个名言在网络上传播广泛,但据Hayden教授所述,他并未原创此言,而关于戴明是否真的说过这句话的“数据”证据难以寻觅,这体现了统计学习领域研究中对数据可靠性的重视。 2. 新增的章节涵盖了全新的主题,这些内容反映了近年来统计学习理论和技术的扩展。例如,可能有章节探讨了深度学习、大数据分析、模型选择与交叉验证的新进展,以及在高维数据处理中的新颖方法。 3. 对于已有的章节,作者可能重新审视并整合了最新的研究成果,确保内容的精确性和实用性。比如,章节可能更新了特征选择和降维技术,介绍了更高效的算法和模型优化策略。 4. 为了保持读者的阅读流畅性,尽管第二版有所扩展,但作者尽量保持原有的结构框架不变,仅在必要处进行调整,以便让熟悉第一版的读者能快速定位和理解新内容。 5. 此次出版不仅是一次技术的更新,也可能是对教学方法的反思,强调了统计学习理论在实际应用中的重要性,以及如何将理论知识转化为解决现实问题的能力。 《统计学习要素》第二版是对第一版的有益补充,旨在帮助读者紧跟机器学习领域的前沿动态,提升理解和应用统计学习技术的水平。无论你是初学者还是专业人士,这本书都是深入理解复杂数据分析方法和模型构建的宝贵资源。"
14 Unsupervised Learning 485
14.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . 485
14.2 Association Rules . . . . . . . . . . . . . . . . . . . . . . 487
14.2.1 Market Basket Analysis . . . . . . . . . . . . . . 488
14.2.2 The Apriori Algorithm . . . . . . . . . . . . . . 489
14.2.3 Example: Market Basket Analysis . . . . . . . . 492
14.2.4 Unsupervised as Supervised Learning . . . . . . 495
14.2.5 Generalized Association Rules . . . . . . . . . . 497
14.2.6 Choice of Supervised Learning Method . . . . . 499
14.2.7 Example: Market Basket Analysis (Continued) . 499
14.3 Cluster Analysis . . . . . . . . . . . . . . . . . . . . . . . 501
14.3.1 Proximity Matrices . . . . . . . . . . . . . . . . 503
14.3.2 Dissimilarities Based on Attributes . . . . . . . 503
14.3.3 Object Dissimilarity . . . . . . . . . . . . . . . . 505
14.3.4 Clustering Algorithms . . . . . . . . . . . . . . . 507
14.3.5 Combinatorial Algorithms . . . . . . . . . . . . 507
14.3.6 K-means . . . . . . . . . . . . . . . . . . . . . . 509
14.3.7 Gaussian Mixtures as Soft K-means Clustering . 510
14.3.8 Example: Human Tumor Microarray Data . . . 512
14.3.9 Vector Quantization . . . . . . . . . . . . . . . . 514
14.3.10 K-medoids . . . . . . . . . . . . . . . . . . . . . 515
14.3.11 Practical Issues . . . . . . . . . . . . . . . . . . 518
14.3.12 Hierarchical Clustering . . . . . . . . . . . . . . 520
14.4 Self-Organizing Maps . . . . . . . . . . . . . . . . . . . . 528
14.5 Principal Components, Curves and Surfaces . . . . . . . . 534
14.5.1 Principal Components . . . . . . . . . . . . . . . 534
14.5.2 Principal Curves and Surfaces . . . . . . . . . . 541
14.5.3 Spectral Clustering . . . . . . . . . . . . . . . . 544
14.5.4 Kernel Principal Components . . . . . . . . . . . 547
14.5.5 Sparse Principal Components . . . . . . . . . . . 550
14.6 Non-negative Matrix Factorization . . . . . . . . . . . . . 553
14.6.1 Archetypal Analysis . . . . . . . . . . . . . . . . 554
14.7 Independent Component Analysis
and Exploratory Projection Pursuit . . . . . . . . . . . . 557
14.7.1 Latent Variables and Factor Analysis . . . . . . 558
14.7.2 Independent Component Analysis . . . . . . . . 560
14.7.3 Exploratory Projection Pursuit . . . . . . . . . . 565
14.7.4 A Direct Approach to ICA . . . . . . . . . . . . 565
14.8 Multidimensional Scaling . . . . . . . . . . . . . . . . . . 570
14.9 Nonlinear Dimension Reduction
and Local Multidimensional Scaling . . . . . . . . . . . . 572
14.10 The Google PageRank Algorithm . . . . . . . . . . . . . 576
Bibliographic Notes . . . . . . . . . . . . . . . . . . . . . . . . . 578
Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 579
15 Random Forests 587
15.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . 587
15.2 Definition of Random Forests . . . . . . . . . . . . . . . . 587
15.3 Details of Random Forests . . . . . . . . . . . . . . . . . 592
15.3.1 Out of Bag Samples . . . . . . . . . . . . . . . . 592
15.3.2 Variable Importance . . . . . . . . . . . . . . . . 593
15.3.3 Proximity Plots . . . . . . . . . . . . . . . . . . 595
15.3.4 Random Forests and Overfitting . . . . . . . . . 596
15.4 Analysis of Random Forests . . . . . . . . . . . . . . . . . 597
15.4.1 Variance and the De-Correlation Effect . . . . . 597
15.4.2 Bias . . . . . . . . . . . . . . . . . . . . . . . . . 600
15.4.3 Adaptive Nearest Neighbors . . . . . . . . . . . 601
Bibliographic Notes . . . . . . . . . . . . . . . . . . . . . . . . . 602
Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 603
16 Ensemble Learning 605
16.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . 605
16.2 Boosting and Regularization Paths . . . . . . . . . . . . . 607
16.2.1 Penalized Regression . . . . . . . . . . . . . . . 607
16.2.2 The “Bet on Sparsity” Principle . . . . . . . . . 610
16.2.3 Regularization Paths, Over-fitting and Margins . 613
16.3 Learning Ensembles . . . . . . . . . . . . . . . . . . . . . 616
16.3.1 Learning a Good Ensemble . . . . . . . . . . . . 617
16.3.2 Rule Ensembles . . . . . . . . . . . . . . . . . . 622
Bibliographic Notes . . . . . . . . . . . . . . . . . . . . . . . . . 623
Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 624
17 Undirected Graphical Models 625
17.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . 625
17.2 Markov Graphs and Their Properties . . . . . . . . . . . 627
17.3 Undirected Graphical Models for Continuous Variables . 630
17.3.1 Estimation of the Parameters
when the Graph Structure is Known . . . . . . . 631
17.3.2 Estimation of the Graph Structure . . . . . . . . 635
17.4 Undirected Graphical Models for Discrete Variables . . . 638
17.4.1 Estimation of the Parameters
when the Graph Structure is Known . . . . . . . 639
17.4.2 Hidden Nodes . . . . . . . . . . . . . . . . . . . 641
17.4.3 Estimation of the Graph Structure . . . . . . . . 642
17.4.4 Restricted Boltzmann Machines . . . . . . . . . 643
Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 645
18 High-Dimensional Problems: p ≫ N 649
18.1 When p is Much Bigger than N . . . . . . . . . . . . . . 649
18.2 Diagonal Linear Discriminant Analysis
and Nearest Shrunken Centroids . . . . . . . . . . . . . . 651
18.3 Linear Classifiers with Quadratic Regularization . . . . . 654
18.3.1 Regularized Discriminant Analysis . . . . . . . . 656
18.3.2 Logistic Regression
with Quadratic Regularization . . . . . . . . . . 657
18.3.3 The Support Vector Classifier . . . . . . . . . . 657
18.3.4 Feature Selection . . . . . . . . . . . . . . . . . . 658
18.3.5 Computational Shortcuts When p ≫ N . . . . . 659
18.4 Linear Classifiers with L1 Regularization . . . . . . . . . 661
18.4.1 Application of Lasso
to Protein Mass Spectroscopy . . . . . . . . . . 664
18.4.2 The Fused Lasso for Functional Data . . . . . . 666
18.5 Classification When Features are Unavailable . . . . . . . 668
18.5.1 Example: String Kernels
and Protein Classification . . . . . . . . . . . . . 668
18.5.2 Classification and Other Models Using
Inner-Product Kernels and Pairwise Distances . 670
18.5.3 Example: Abstracts Classification . . . . . . . . 672
18.6 High-Dimensional Regression:
Supervised Principal Components . . . . . . . . . . . . . 674
18.6.1 Connection to Latent-Variable Modeling . . . . 678
18.6.2 Relationship with Partial Least Squares . . . . . 680
18.6.3 Pre-Conditioning for Feature Selection . . . . . 681
18.7 Feature Assessment and the Multiple-Testing Problem . . 683
18.7.1 The False Discovery Rate . . . . . . . . . . . . . 687
18.7.2 Asymmetric Cutpoints and the SAM Procedure 690
18.7.3 A Bayesian Interpretation of the FDR . . . . . . 692
18.8 Bibliographic Notes . . . . . . . . . . . . . . . . . . . . . 693
Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 694
References 699
Author Index 729
Index 737
1 Introduction
Statistical learning plays a key role in many areas of science, finance and
industry. Here are some examples of learning problems:
• Predict whether a patient, hospitalized due to a heart attack, will
have a second heart attack. The prediction is to be based on demographic,
diet and clinical measurements for that patient.
• Predict the price of a stock in 6 months from now, on the basis of
company performance measures and economic data.
• Identify the numbers in a handwritten ZIP code, from a digitized
image.
• Estimate the amount of glucose in the blood of a diabetic person,
from the infrared absorption spectrum of that person’s blood.
• Identify the risk factors for prostate cancer, based on clinical and
demographic variables.
The science of learning plays a key role in the fields of statistics, data
mining and artificial intelligence, intersecting with areas of engineering and
other disciplines.
This book is about learning from data. In a typical scenario, we have
an outcome measurement, usually quantitative (such as a stock price) or
categorical (such as heart attack/no heart attack), that we wish to predict
based on a set of features (such as diet and clinical measurements). We
have a training set of data, in which we observe the outcome and feature
measurements for a set of objects (such as people). Using this data we build
a prediction model, or learner, which will enable us to predict the outcome
for new unseen objects. A good learner is one that accurately predicts such
an outcome.

TABLE 1.1. Average percentage of words or characters in an email message
equal to the indicated word or character. We have chosen the words and
characters showing the largest difference between spam and email.

         george   you  your    hp  free   hpl     !   our    re   edu  remove
  spam     0.00  2.26  1.38  0.02  0.52  0.01  0.51  0.51  0.13  0.01    0.28
  email    1.27  1.27  0.44  0.90  0.07  0.43  0.11  0.18  0.42  0.29    0.01
The examples above describe what is called the supervised learning problem.
It is called “supervised” because of the presence of the outcome variable
to guide the learning process. In the unsupervised learning problem,
we observe only the features and have no measurements of the outcome.
Our task is rather to describe how the data are organized or clustered. We
devote most of this book to supervised learning; the unsupervised problem
is less developed in the literature, and is the focus of Chapter 14.
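
Before turning to concrete examples, here is a minimal sketch of the supervised setup just described: a training set of feature measurements and observed outcomes, a learner fit to that set, and a prediction for a new, unseen object. The sketch is illustrative only; the scikit-learn classifier and the toy numbers are assumptions, not something taken from the book.

import numpy as np
from sklearn.linear_model import LogisticRegression

# Hypothetical training set: two clinical-style features per patient and a
# categorical outcome (1 = second heart attack, 0 = none). Values are made up.
X_train = np.array([[0.2, 1.1], [0.4, 0.9], [1.5, 0.3], [1.8, 0.2]])
y_train = np.array([0, 0, 1, 1])

learner = LogisticRegression()
learner.fit(X_train, y_train)      # learn from the observed outcomes and features

X_new = np.array([[1.6, 0.25]])    # a new, unseen object
print(learner.predict(X_new))      # predicted outcome for that object
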
Here are some examples of real learning problems that are discussed in
this book.
Example 1: Email Spam
The data for this example consists of information from 4601 email messages,
in a study to try to predict whether the email was junk email, or
“spam.” The objective was to design an automatic spam detector that
could filter out spam before clogging the users’ mailboxes. For all 4601
email messages, the true outcome (email type) email or spam is available,
along with the relative frequencies of 57 of the most commonly occurring
words and punctuation marks in the email message. This is a supervised
learning problem, with the outcome the class variable email/spam. It is also
called a classification problem.
Table 1.1 lists the words and characters showing the largest average
difference between spam and email.
Our learning method has to decide which features to use and how: for
example, we might use a rule such as
if (%george < 0.6) & (%you > 1.5) then spam
else email.
Another form of a rule might be:
if (0.2 · %you − 0.3 · %george) > 0 then spam
else email.
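
Either rule can be written directly in code. The following sketch is illustrative only: the function names and the dictionary of word percentages are assumptions, while the thresholds and weights are taken verbatim from the two rules above and the class averages from Table 1.1.

def rule_threshold(freq):
    # if (%george < 0.6) & (%you > 1.5) then spam, else email
    return "spam" if freq["george"] < 0.6 and freq["you"] > 1.5 else "email"

def rule_linear(freq):
    # if (0.2 * %you - 0.3 * %george) > 0 then spam, else email
    return "spam" if 0.2 * freq["you"] - 0.3 * freq["george"] > 0 else "email"

# Average percentages of "george" and "you" for each class, from Table 1.1.
spam_avg = {"george": 0.00, "you": 2.26}
email_avg = {"george": 1.27, "you": 1.27}

print(rule_threshold(spam_avg), rule_threshold(email_avg))  # spam email
print(rule_linear(spam_avg), rule_linear(email_avg))        # spam email

In practice the thresholds and weights are not guessed but estimated from the training data, which is what the learning methods developed in this book do.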