Bradley Efron's Computer Age Statistical Inference: Frontier Algorithms and Explorations in Data Science

PDF, 8.12 MB. Last updated 2024-07-18.
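Computer Age Statistical Inference, by the noted statistician Bradley Efron (inventor of the bootstrap) and Trevor Hastie, takes an in-depth look at the frontiers of modern statistics. The book is written in an accessible style. It guides readers through complex statistical theory and lays out the leading research of recent decades. It suits professionals interested in statistics, and non-specialists interested in data analysis and technology can also follow it. The authors avoid overly mathematical presentation and instead explain complex statistical algorithms and inferential reasoning in a narrative style.

The opening chapter, "Algorithms and Inference," introduces the basic principles of statistical analysis through a regression example and a hypothesis-testing example. Regression is a key tool for exploring relationships between variables. This material discusses least squares and linear models, and shows how they are used to extract useful information from data. The book also treats Bayesian inference, a probability-based framework that emphasizes the role of prior knowledge in inference. Bayesian ideas provide an important foundation for modern machine learning and artificial intelligence.

The book then develops frequentist, Bayesian, and Fisherian inference, including maximum likelihood estimation and Fisher information, together with parametric models and exponential families. Classical concepts such as hypothesis testing and confidence intervals are explained in detail. So are resampling methods such as the jackknife and the bootstrap. These tools are the basis for judging how reliable the data are and how accurate the conclusions drawn from them are.

The book gives particular attention to Monte Carlo methods. These are numerical techniques that solve complex problems by simulating random processes. They are commonly used to estimate integrals or probabilities that are hard to compute directly; the bootstrap and Markov chain Monte Carlo (MCMC) are both examples. Neural networks and deep learning, now an important part of data science, are also introduced.

Computer Age Statistical Inference is not limited to theory. It also covers practical algorithms such as regression trees, random forests, boosting, the lasso, and support-vector machines, which are core methods of modern machine learning. The authors use worked examples to show how these algorithms are applied to real problems. This helps readers understand how the methods work and where their strengths lie.

The writing is both rigorous and readable. The full title is Computer Age Statistical Inference: Algorithms, Evidence, and Data Science. The book is co-authored by Bradley Efron and Trevor Hastie and published by Cambridge University Press. For readers who want to learn about recent advances in statistical inference, it is an excellent introduction.

Note that the electronic version of the book can be purchased from the Cambridge University Press website and is also available through the usual channels. It is subject to copyright: it is for personal use only, and adaptation, sale, and redistribution are prohibited. This summary is based on the revised version dated February 24, 2017.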
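The summary mentions least squares as the basic tool for extracting a relationship from data. A minimal Python sketch follows; it is not taken from the book. It fits a straight line y ≈ b0 + b1·x using the closed-form estimates b1 = Sxy / Sxx and b0 = ȳ - b1·x̄. The function name least_squares_line and the toy data are hypothetical.

```python
def least_squares_line(xs, ys):
    """Fit y ~ b0 + b1 * x by ordinary least squares (closed-form solution).

    Hypothetical helper for illustration; returns the pair (b0, b1).
    """
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Sxx: spread of x around its mean; must be nonzero for a unique slope.
    sxx = sum((x - x_bar) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x values must not all be equal")
    # Sxy: co-variation of x and y around their means.
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b1 = sxy / sxx            # slope minimizing the sum of squared residuals
    b0 = y_bar - b1 * x_bar   # the fitted line passes through (x_bar, y_bar)
    return b0, b1


# Hypothetical toy data lying exactly on y = 1 + 2x.
b0, b1 = least_squares_line([0, 1, 2, 3], [1, 3, 5, 7])
print(b0, b1)  # 1.0 2.0
```

With noisy data the same formulas return the line that minimizes the sum of squared vertical distances. Multiple regression generalizes this to solving the normal equations.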
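Bayesian inference uses prior knowledge, and a small example makes that role concrete. The sketch below is a minimal illustration, not code from the book. It uses the standard beta-binomial conjugate update: a Beta(a, b) prior combined with k successes in n trials gives a Beta(a + k, b + n - k) posterior. The function names, the two priors, and the data (7 successes in 10 trials) are hypothetical choices.

```python
def beta_binomial_posterior(a, b, successes, trials):
    """Conjugate update: a Beta(a, b) prior plus binomial data gives a Beta posterior."""
    if a <= 0 or b <= 0:
        raise ValueError("prior parameters must be positive")
    if not 0 <= successes <= trials:
        raise ValueError("successes must lie in [0, trials]")
    return a + successes, b + (trials - successes)


def posterior_mean(a, b):
    """Mean of a Beta(a, b) distribution."""
    return a / (a + b)


# Hypothetical data: 7 successes in 10 trials, analysed under two priors.
for prior in [(1, 1), (10, 10)]:  # flat prior vs. prior concentrated near 0.5
    post = beta_binomial_posterior(*prior, successes=7, trials=10)
    print(prior, "->", post, "posterior mean:", round(posterior_mean(*post), 3))
# (1, 1) -> (8, 4) posterior mean: 0.667
# (10, 10) -> (17, 13) posterior mean: 0.567
```

The flat prior leaves the estimate close to the raw proportion 0.7, while the stronger prior pulls it toward 0.5. As the number of trials grows, the data dominate and the two answers converge.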
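The Monte Carlo idea described above estimates an integral by averaging a function over random draws. It can be sketched in a few lines of Python. This is plain (crude) Monte Carlo integration, not a method specific to the book. The function mc_integrate, the integrand e^(-x^2) on [0, 1], the seed, and the sample size are illustrative assumptions. The exact value (√π / 2)·erf(1) is computed only for comparison.

```python
import math
import random


def mc_integrate(f, a, b, n, seed=0):
    """Plain Monte Carlo estimate of the integral of f over [a, b].

    Hypothetical helper: returns (estimate, standard_error).
    """
    if n < 2:
        raise ValueError("need at least 2 samples to estimate the standard error")
    rng = random.Random(seed)  # seeded generator so the run is reproducible
    width = b - a
    # Evaluate f at n points drawn uniformly from [a, b].
    values = [f(a + width * rng.random()) for _ in range(n)]
    mean = sum(values) / n
    var = sum((v - mean) ** 2 for v in values) / (n - 1)  # sample variance
    # Integral = width * E[f(U)] for U uniform on [a, b].
    return width * mean, width * math.sqrt(var / n)


est, se = mc_integrate(lambda x: math.exp(-x * x), 0.0, 1.0, 100_000)
exact = math.sqrt(math.pi) / 2 * math.erf(1.0)  # closed form, for comparison
print(f"estimate={est:.4f} +/- {se:.4f}, exact={exact:.4f}")
```

The estimate is an average of independent draws, so its standard error shrinks in proportion to 1/√n; quadrupling n roughly halves the error. The same averaging principle underlies bootstrap resampling.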
Uploaded 2017-06-03.
Contents

1 Algorithms and Inference
  1.1 A Regression Example
  1.2 Hypothesis Testing
  1.3 Notes
2 Frequentist Inference
  2.1 Frequentism in Practice
  2.2 Frequentist Optimality
  2.3 Notes and Details
3 Bayesian Inference
  3.1 Two Examples
  3.2 Uninformative Prior Distributions
  3.3 Flaws in Frequentist Inference
  3.4 A Bayesian/Frequentist Comparison List
  3.5 Notes and Details
4 Fisherian Inference and Maximum Likelihood Estimation
  4.1 Likelihood and Maximum Likelihood
  4.2 Fisher Information and the MLE
  4.3 Conditional Inference
  4.4 Permutation and Randomization
  4.5 Notes and Details
5 Parametric Models and Exponential Families
  5.1 Univariate Families
  5.2 The Multivariate Normal Distribution
  5.3 Fisher's Information Bound for Multiparameter Families
  5.4 The Multinomial Distribution
  5.5 Exponential Families
  5.6 Notes and Details

Part II Early Computer-Age Methods
6 Empirical Bayes
  6.1 Robbins' Formula
  6.2 The Missing-Species Problem
  6.3 A Medical Example
  6.4 Indirect Evidence 1
  6.5 Notes and Details
7 James–Stein Estimation and Ridge Regression
  7.1 The James–Stein Estimator
  7.2 The Baseball Players
  7.3 Ridge Regression
  7.4 Indirect Evidence 2
  7.5 Notes and Details
8 Generalized Linear Models and Regression Trees
  8.1 Logistic Regression
  8.2 Generalized Linear Models
  8.3 Poisson Regression
  8.4 Regression Trees
  8.5 Notes and Details
9 Survival Analysis and the EM Algorithm
  9.1 Life Tables and Hazard Rates
  9.2 Censored Data and the Kaplan–Meier Estimate
  9.3 The Log-Rank Test
  9.4 The Proportional Hazards Model
  9.5 Missing Data and the EM Algorithm
  9.6 Notes and Details
10 The Jackknife and the Bootstrap
  10.1 The Jackknife Estimate of Standard Error
  10.2 The Nonparametric Bootstrap
  10.3 Resampling Plans
  10.4 The Parametric Bootstrap
  10.5 Influence Functions and Robust Estimation
  10.6 Notes and Details
11 Bootstrap Confidence Intervals
  11.1 Neyman's Construction for One-Parameter Problems
  11.2 The Percentile Method
  11.3 Bias-Corrected Confidence Intervals
  11.4 Second-Order Accuracy
  11.5 Bootstrap-t Intervals
  11.6 Objective Bayes Intervals and the Confidence Distribution
  11.7 Notes and Details
12 Cross-Validation and Cp Estimates of Prediction Error
  12.1 Prediction Rules
  12.2 Cross-Validation
  12.3 Covariance Penalties
  12.4 Training, Validation, and Ephemeral Predictors
  12.5 Notes and Details
13 Objective Bayes Inference and MCMC
  13.1 Objective Prior Distributions
  13.2 Conjugate Prior Distributions
  13.3 Model Selection and the Bayesian Information Criterion
  13.4 Gibbs Sampling and MCMC
  13.5 Example: Modeling Population Admixture
  13.6 Notes and Details
14 Postwar Statistical Inference and Methodology

Part III Twenty-First-Century Topics
15 Large-Scale Hypothesis Testing and FDRs
  15.1 Large-Scale Testing
  15.2 False-Discovery Rates
  15.3 Empirical Bayes Large-Scale Testing
  15.4 Local False-Discovery Rates
  15.5 Choice of the Null Distribution
  15.6 Relevance
  15.7 Notes and Details
16 Sparse Modeling and the Lasso
  16.1 Forward Stepwise Regression
  16.2 The Lasso
  16.3 Fitting Lasso Models
  16.4 Least-Angle Regression
  16.5 Fitting Generalized Lasso Models
  16.6 Post-Selection Inference for the Lasso
  16.7 Connections and Extensions
  16.8 Notes and Details
17 Random Forests and Boosting
  17.1 Random Forests
  17.2 Boosting with Squared-Error Loss
  17.3 Gradient Boosting
  17.4 Adaboost: the Original Boosting Algorithm
  17.5 Connections and Extensions
  17.6 Notes and Details
18 Neural Networks and Deep Learning
  18.1 Neural Networks and the Handwritten Digit Problem
  18.2 Fitting a Neural Network
  18.3 Autoencoders
  18.4 Deep Learning
  18.5 Learning a Deep Network
  18.6 Notes and Details
19 Support-Vector Machines and Kernel Methods
  19.1 Optimal Separating Hyperplane
  19.2 Soft-Margin Classifier
  19.3 SVM Criterion as Loss Plus Penalty
  19.4 Computations and the Kernel Trick
  19.5 Function Fitting Using Kernels
  19.6 Example: String Kernels for Protein Classification
  19.7 SVMs: Concluding Remarks
  19.8 Kernel Smoothing and Local Regression
  19.9 Notes and Details
20 Inference After Model Selection
  20.1 Simultaneous Confidence Intervals
  20.2 Accuracy After Model Selection
  20.3 Selection Bias
  20.4 Combined Bayes–Frequentist Estimation
  20.5 Notes and Details
21 Empirical Bayes Estimation Strategies
  21.1 Bayes Deconvolution
  21.2 g-Modeling and Estimation
  21.3 Likelihood, Regularization, and Accuracy
  21.4 Two Examples
  21.5 Generalized Linear Mixed Models
  21.6 Deconvolution and f-Modeling
  21.7 Notes and Details
Epilogue
References
Author Index
Subject Index