[Book] Introduction to Causal Inference from a Machine Learning Perspective, 132-page PDF.

Several main themes run throughout the book, most of them comparisons between two different categories. As you read, it is important to understand which of the two categories each part of the book falls into and which it does not.

Statistical vs. causal. Even with unlimited data, we sometimes cannot compute certain causal quantities. In contrast, much of statistics is about resolving uncertainty in finite samples; given unlimited data, there is no uncertainty left. However, association, a statistical concept, is not causation: there is still work to do in causal inference even once we have unlimited data. This is the main distinction that motivates causal inference, a distinction the book draws in its first chapter and continues to draw throughout.

Identification vs. estimation. Identification of causal effects is unique to causal inference; it is a problem that remains to be solved even with unlimited data. Causal inference also shares estimation with traditional statistics and machine learning. The book mostly covers identification of causal effects (Chapters 2, 4, and 6) before turning to estimating them (Chapter 7). The exceptions are Sections 2.5 and 4.6.2, which work through complete examples with estimation to show what the whole process looks like.

Interventional vs. observational. If we can intervene or experiment, identifying a causal effect is relatively easy: we simply take the action whose causal effect we want to measure and measure the outcome after acting. Observational data are more complicated, because confounding is almost always present in them.

Assumptions. A major focus is on what assumptions are used to obtain each result. Every assumption gets its own box to make it easy to notice. Clearly stated assumptions should make it easy to see where a given causal analysis or causal model can be criticized; the hope is that stating assumptions clearly will lead to clearer discussions about causality.
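To make the contrast between association and causation concrete, here is a minimal simulation sketch (not taken from the book; the variable names and data-generating process are illustrative assumptions). A single confounder drives both treatment and outcome, so the naive observed association overstates the true causal effect of 1.0, and no amount of extra data fixes this; adjusting for the confounder, the kind of identification step developed in Chapters 2 and 4, recovers the effect.

```python
import numpy as np

# Illustrative sketch (not from the book): a binary confounder Z affects both
# the treatment T and the outcome Y. The true causal effect of T on Y is 1.0.
rng = np.random.default_rng(0)
n = 1_000_000  # "effectively unlimited" data: the naive bias does not shrink with n

z = rng.binomial(1, 0.5, n)                  # confounder
t = rng.binomial(1, 0.2 + 0.6 * z)           # treatment is more likely when Z = 1
y = 1.0 * t + 3.0 * z + rng.normal(0, 1, n)  # outcome depends on T and on Z

# Naive (purely statistical) association: E[Y | T=1] - E[Y | T=0]
naive = y[t == 1].mean() - y[t == 0].mean()

# Adjusting for the confounder: E_Z[ E[Y | T=1, Z] - E[Y | T=0, Z] ]
adjusted = sum(
    (y[(t == 1) & (z == v)].mean() - y[(t == 0) & (z == v)].mean()) * (z == v).mean()
    for v in (0, 1)
)

print(f"naive association: {naive:.2f}")    # roughly 2.8, far from the true 1.0
print(f"adjusted estimate: {adjusted:.2f}") # close to the true effect 1.0
```

The point of the sketch is the "statistical vs. causal" theme: making n larger only makes the biased naive number more precise, while getting the right number requires the causal (identification) step of adjusting for Z.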
Course Lecture Notes
Introduction to Causal Inference
from a Machine Learning Perspective
Brady Neal
December 17, 2020
Preface
Prerequisites
There is one main prerequisite: basic probability. This course assumes you've taken an
introduction to probability course or have had equivalent experience.
Topics from statistics and machine learning will pop up in the course from time to
time, so some familiarity with those will be helpful but is not necessary. For example, if
cross-validation is a new concept to you, you can learn it relatively quickly at the point in
the book that it pops up. And we give a primer on some statistics terminology that we’ll
use in Section 2.4.
Active Reading Exercises
Research shows that one of the best techniques to remember
material is to actively try to recall information that you recently learned. You will see
“active reading exercises” throughout the book to help you do this. They’ll be marked by
the Active reading exercise: heading.
Many Figures in This Book
As you will see, there is a ridiculous number of figures in
this book. This is on purpose, to help give you as much visual intuition as possible.
We will sometimes copy the same figures, equations, etc. that you might have seen in
preceding chapters so that we can make sure the figures are always right next to the text
that references them.
Sending Me Feedback
This is a book draft, so I greatly appreciate any feedback you’re
willing to send my way. If you’re unsure whether I’ll be receptive to it or not, don’t be.
Please send any feedback to me at bradyneal11@gmail.com with “[Causal Book]” at the
beginning of your email subject. Feedback can be at the word level, sentence level, section
level, chapter level, etc. Here’s a non-exhaustive list of useful kinds of feedback:
- Typoz.
- Some part is confusing.
- You notice your mind starts to wander, or you don’t feel motivated to read some part.
- Some part seems like it can be cut.
- You feel strongly that some part absolutely should not be cut.
- Some parts are not connected well. Moving from one part to the next, you notice that there isn’t a natural flow.
- A new active reading exercise you thought of.
Bibliographic Notes
Although we do our best to cite relevant results, we don’t want to
disrupt the flow of the material by digging into exactly where each concept came from.
There will be complete sections of bibliographic notes in the final version of this book,
but they won’t come until after the course has finished.
Contents
Preface ii
Contents iii
1 Motivation: Why You Might Care 1
1.1 Simpson’s Paradox . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
1.2 Applications of Causal Inference . . . . . . . . . . . . . . . . . . . . . . . . 2
1.3 Correlation Does Not Imply Causation . . . . . . . . . . . . . . . . . . . . 3
1.3.1 Nicolas Cage and Pool Drownings . . . . . . . . . . . . . . . . . . . 3
1.3.2 Why is Association Not Causation? . . . . . . . . . . . . . . . . . . 4
1.4 Main Themes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5
2 Potential Outcomes 6
2.1 Potential Outcomes and Individual Treatment Effects . . . . . . . . . . . . 6
2.2 The Fundamental Problem of Causal Inference . . . . . . . . . . . . . . . . 7
2.3 Getting Around the Fundamental Problem . . . . . . . . . . . . . . . . . . 8
2.3.1 Average Treatment Effects and Missing Data Interpretation . . . . 8
2.3.2 Ignorability and Exchangeability . . . . . . . . . . . . . . . . . . . 9
2.3.3 Conditional Exchangeability and Unconfoundedness . . . . . . . . 10
2.3.4 Positivity/Overlap and Extrapolation . . . . . . . . . . . . . . . . . 12
2.3.5 No interference, Consistency, and SUTVA . . . . . . . . . . . . . . 13
2.3.6 Tying It All Together . . . . . . . . . . . . . . . . . . . . . . . . . . 14
2.4 Fancy Statistics Terminology Defancified . . . . . . . . . . . . . . . . . . . 15
2.5 A Complete Example with Estimation . . . . . . . . . . . . . . . . . . . . . 16
3 The Flow of Association and Causation in Graphs 19
3.1 Graph Terminology . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19
3.2 Bayesian Networks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20
3.3 Causal Graphs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 22
3.4 Two-Node Graphs and Graphical Building Blocks . . . . . . . . . . . . . . 23
3.5 Chains and Forks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 24
3.6 Colliders and their Descendants . . . . . . . . . . . . . . . . . . . . . . . . 26
3.7 d-separation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 28
3.8 Flow of Association and Causation . . . . . . . . . . . . . . . . . . . . . . 30
4 Causal Models 32
4.1 The do-operator and Interventional Distributions . . . . . . . . . . . . . . 32
4.2 The Main Assumption: Modularity . . . . . . . . . . . . . . . . . . . . . . 34
4.3 Truncated Factorization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 35
4.3.1 Example Application and Revisiting “Association is Not Causation” 36
4.4 The Backdoor Adjustment . . . . . . . . . . . . . . . . . . . . . . . . . . . 37
4.4.1 Relation to Potential Outcomes . . . . . . . . . . . . . . . . . . . . . 39
4.5 Structural Causal Models (SCMs) . . . . . . . . . . . . . . . . . . . . . . . 40
4.5.1 Structural Equations . . . . . . . . . . . . . . . . . . . . . . . . . . 40
4.5.2 Interventions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 42
4.5.3 Collider Bias and Why to Not Condition on Descendants of Treatment . . . 43
4.6 Example Applications of the Backdoor Adjustment . . . . . . . . . . . . . 44
4.6.1 Association vs. Causation in a Toy Example . . . . . . . . . . . . . 44
4.6.2 A Complete Example with Estimation . . . . . . . . . . . . . . . . 45
4.7 Assumptions Revisited . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 47
5 Randomized Experiments 49
5.1 Comparability and Covariate Balance . . . . . . . . . . . . . . . . . . . . . 49
5.2 Exchangeability . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 50
5.3 No Backdoor Paths . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 51
6 Nonparametric Identification 52
6.1 Frontdoor Adjustment . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 52
6.2 do-calculus . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 55
6.2.1 Application: Frontdoor Adjustment . . . . . . . . . . . . . . . . . . 57
6.3 Determining Identifiability from the Graph . . . . . . . . . . . . . . . . . . 58
7 Estimation 62
7.1 Preliminaries . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 62
7.2 Conditional Outcome Modeling (COM) . . . . . . . . . . . . . . . . . . . . 63
7.3 Grouped Conditional Outcome Modeling (GCOM) . . . . . . . . . . . . . 64
7.4 Increasing Data Efficiency . . . . . . . . . . . . . . . . . . . . . . . . . . . . 65
7.4.1 TARNet . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 65
7.4.2 X-Learner . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 66
7.5 Propensity Scores . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 67
7.6 Inverse Probability Weighting (IPW) . . . . . . . . . . . . . . . . . . . . . . 68
7.7 Doubly Robust Methods . . . . . . . . . . . . . . . . . . . . . . . . . . . . 70
7.8 Other Methods . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 70
7.9 Concluding Remarks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 71
7.9.1 Confidence Intervals . . . . . . . . . . . . . . . . . . . . . . . . . . 71
7.9.2 Comparison to Randomized Experiments . . . . . . . . . . . . . . 72
8 Unobserved Confounding: Bounds and Sensitivity Analysis 73
8.1 Bounds . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 73
8.1.1 No-Assumptions Bound . . . . . . . . . . . . . . . . . . . . . . . . 74
8.1.2 Monotone Treatment Response . . . . . . . . . . . . . . . . . . . . 76
8.1.3 Monotone Treatment Selection . . . . . . . . . . . . . . . . . . . . . 78
8.1.4 Optimal Treatment Selection . . . . . . . . . . . . . . . . . . . . . . 79
8.2 Sensitivity Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 82
8.2.1 Sensitivity Basics in Linear Setting . . . . . . . . . . . . . . . . . . . 82
8.2.2 More General Settings . . . . . . . . . . . . . . . . . . . . . . . . . 85
9 Instrumental Variables 86
9.1 What is an Instrument? . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 86
9.2 No Nonparametric Identification of the ATE . . . . . . . . . . . . . . . . . 87
9.3 Warm-Up: Binary Linear Setting . . . . . . . . . . . . . . . . . . . . . . . . 87
9.4 Continuous Linear Setting . . . . . . . . . . . . . . . . . . . . . . . . . . . 88
9.5 Nonparametric Identification of Local ATE . . . . . . . . . . . . . . . . . . 90
9.5.1 New Potential Notation with Instruments . . . . . . . . . . . . . . 90
9.5.2 Principal Stratification . . . . . . . . . . . . . . . . . . . . . . . . . 90
9.5.3 Local ATE . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 91
9.6 More General Settings for ATE Identification . . . . . . . . . . . . . . . . . 94
10 Difference in Differences 95
10.1 Preliminaries . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 95
10.2 Introducing Time . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 96
10.3 Identification . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 96
10.3.1 Assumptions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 96
10.3.2 Main Result and Proof . . . . . . . . . . . . . . . . . . . . . . . . . 97
10.4 Major Problems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 98
11 Causal Discovery from Observational Data 100
11.1 Independence-Based Causal Discovery . . . . . . . . . . . . . . . . . . . . 100
11.1.1 Assumptions and Theorem . . . . . . . . . . . . . . . . . . . . . . . 100
11.1.2 The PC Algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . . . 102
11.1.3 Can We Get Any Better Identification? . . . . . . . . . . . . . . . . 104
11.2 Semi-Parametric Causal Discovery . . . . . . . . . . . . . . . . . . . . . . . 104
11.2.1 No Identifiability Without Parametric Assumptions . . . . . . . . . 105
11.2.2 Linear Non-Gaussian Noise . . . . . . . . . . . . . . . . . . . . . . 105
11.2.3 Nonlinear Models . . . . . . . . . . . . . . . . . . . . . . . . . . . . 108
11.3 Further Resources . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 109
12 Causal Discovery from Interventional Data 110
12.1 Structural Interventions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 110
12.1.1 Single-Node Interventions . . . . . . . . . . . . . . . . . . . . . . . 110
12.1.2 Multi-Node Interventions . . . . . . . . . . . . . . . . . . . . . . . 110
12.2 Parametric Interventions . . . . . . . . . . . . . . . . . . . . . . . . . . . . 110
12.2.1 Coming Soon . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 110
12.3 Interventional Markov Equivalence . . . . . . . . . . . . . . . . . . . . . . 110
12.3.1 Coming Soon . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 110
12.4 Miscellaneous Other Settings . . . . . . . . . . . . . . . . . . . . . . . . . . 110
12.4.1 Coming Soon . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 110
13 Transfer Learning and Transportability 111
13.1 Causal Insights for Transfer Learning . . . . . . . . . . . . . . . . . . . . . 111
13.1.1 Coming Soon . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 111
13.2 Transportability of Causal Effects Across Populations . . . . . . . . . . . . 111
13.2.1 Coming Soon . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 111
14 Counterfactuals and Mediation 112
14.1 Counterfactuals Basics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 112
14.1.1 Coming Soon . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 112
14.2 Important Application: Mediation . . . . . . . . . . . . . . . . . . . . . . . 112
14.2.1 Coming Soon . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 112
Appendix 113
A Proofs 114
A.1 Proof of Equation 6.1 from Section 6.1 . . . . . . . . . . . . . . . . . . . . . 114
A.2 Proof of Propensity Score Theorem (7.1) . . . . . . . . . . . . . . . . . . . . 114
A.3 Proof of IPW Estimand (7.18) . . . . . . . . . . . . . . . . . . . . . . . . . . 115
Bibliography 117
Alphabetical Index 123