[Book] Introduction to Causal Inference from a Machine Learning Perspective, 132-page PDF.

Several main themes run throughout the book, most of them comparisons between two different categories. As you read, it is important to understand which of the two categories each part of the book falls into and which it does not.

Statistical vs. causal. Even with unlimited data, we sometimes cannot compute certain causal quantities. In contrast, much of statistics is about resolving uncertainty in finite samples; given unlimited data, there is no uncertainty left. However, association, a statistical concept, is not causation: there is still work to do in causal inference even once we have unlimited data. This is the main distinction that motivates causal inference, a distinction the book draws in its first chapter and continues to draw throughout.

Identification vs. estimation. Identification of causal effects is unique to causal inference; it is a problem that remains to be solved even with unlimited data. Causal inference also shares estimation with traditional statistics and machine learning. The book mostly covers identification of causal effects (Chapters 2, 4, and 6) before turning to estimating them (Chapter 7). The exceptions are Sections 2.5 and 4.6.2, which work through complete examples with estimation to show what the whole process looks like.

Interventional vs. observational. If we can intervene or experiment, identifying a causal effect is relatively easy: we simply take the action whose causal effect we want to measure and measure the outcome after acting. Observational data are more complicated, because confounding is almost always present in them.

Assumptions. A major focus is on what assumptions are used to obtain each result. Every assumption gets its own box to make it easy to notice. Clearly stated assumptions should make it easy to see where a given causal analysis or causal model can be criticized; the hope is that stating assumptions clearly will lead to clearer discussions about causality.
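To make the contrast between association and causation concrete, here is a minimal simulation sketch (not taken from the book; the variable names and data-generating process are illustrative assumptions). A single confounder drives both treatment and outcome, so the naive observed association overstates the true causal effect of 1.0, and no amount of extra data fixes this; adjusting for the confounder, the kind of identification step developed in Chapters 2 and 4, recovers the effect.

```python
import numpy as np

# Illustrative sketch (not from the book): a binary confounder Z affects both
# the treatment T and the outcome Y. The true causal effect of T on Y is 1.0.
rng = np.random.default_rng(0)
n = 1_000_000  # "effectively unlimited" data: the naive bias does not shrink with n

z = rng.binomial(1, 0.5, n)                  # confounder
t = rng.binomial(1, 0.2 + 0.6 * z)           # treatment is more likely when Z = 1
y = 1.0 * t + 3.0 * z + rng.normal(0, 1, n)  # outcome depends on T and on Z

# Naive (purely statistical) association: E[Y | T=1] - E[Y | T=0]
naive = y[t == 1].mean() - y[t == 0].mean()

# Adjusting for the confounder: E_Z[ E[Y | T=1, Z] - E[Y | T=0, Z] ]
adjusted = sum(
    (y[(t == 1) & (z == v)].mean() - y[(t == 0) & (z == v)].mean()) * (z == v).mean()
    for v in (0, 1)
)

print(f"naive association: {naive:.2f}")    # roughly 2.8, far from the true 1.0
print(f"adjusted estimate: {adjusted:.2f}") # close to the true effect 1.0
```

The point of the sketch is the "statistical vs. causal" theme: making n larger only makes the biased naive number more precise, while getting the right number requires the causal (identification) step of adjusting for Z.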
Course Lecture Notes
Introduction to Causal Inference
from a Machine Learning Perspective
Brady Neal
December 17, 2020
Preface
Prerequisites
There is one main prerequisite: basic probability. This course assumes you've taken an
introduction to probability course or have had equivalent experience.
Topics from statistics and machine learning will pop up in the course from time to
time, so some familiarity with those will be helpful but is not necessary. For example, if
cross-validation is a new concept to you, you can learn it relatively quickly at the point in
the book that it pops up. And we give a primer on some statistics terminology that we’ll
use in Section 2.4.
Active Reading Exercises
Research shows that one of the best techniques to remember
material is to actively try to recall information that you recently learned. You will see
“active reading exercises” throughout the book to help you do this. They’ll be marked by
the Active reading exercise: heading.
Many Figures in This Book
As you will see, there is a ridiculous number of figures in
this book. This is on purpose, to help give you as much visual intuition as possible.
We will sometimes copy the same figures, equations, etc. that you might have seen in
preceding chapters so that we can make sure the figures are always right next to the text
that references them.
Sending Me Feedback
This is a book draft, so I greatly appreciate any feedback you’re
willing to send my way. If you’re unsure whether I’ll be receptive to it or not, don’t be.
Please send any feedback to me at bradyneal11@gmail.com with “[Causal Book]” at the
beginning of your email subject. Feedback can be at the word level, sentence level, section
level, chapter level, etc. Here’s a non-exhaustive list of useful kinds of feedback:
- Typoz.
- Some part is confusing.
- You notice your mind starts to wander, or you don’t feel motivated to read some part.
- Some part seems like it can be cut.
- You feel strongly that some part absolutely should not be cut.
- Some parts are not connected well. Moving from one part to the next, you notice that there isn’t a natural flow.
- A new active reading exercise you thought of.
Bibliographic Notes
Although we do our best to cite relevant results, we don’t want to
disrupt the flow of the material by digging into exactly where each concept came from.
There will be complete sections of bibliographic notes in the final version of this book,
but they won’t come until after the course has finished.
Contents
Preface ii
Contents iii
1 Motivation: Why You Might Care 1
1.1 Simpson’s Paradox . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
1.2 Applications of Causal Inference . . . . . . . . . . . . . . . . . . . . . . . . 2
1.3 Correlation Does Not Imply Causation . . . . . . . . . . . . . . . . . . . . 3
1.3.1 Nicolas Cage and Pool Drownings . . . . . . . . . . . . . . . . . . . 3
1.3.2 Why is Association Not Causation? . . . . . . . . . . . . . . . . . . 4
1.4 Main Themes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5
2 Potential Outcomes 6
2.1 Potential Outcomes and Individual Treatment Effects . . . . . . . . . . . . 6
2.2 The Fundamental Problem of Causal Inference . . . . . . . . . . . . . . . . 7
2.3 Getting Around the Fundamental Problem . . . . . . . . . . . . . . . . . . 8
2.3.1 Average Treatment Effects and Missing Data Interpretation . . . . 8
2.3.2 Ignorability and Exchangeability . . . . . . . . . . . . . . . . . . . 9
2.3.3 Conditional Exchangeability and Unconfoundedness . . . . . . . . 10
2.3.4 Positivity/Overlap and Extrapolation . . . . . . . . . . . . . . . . . 12
2.3.5 No interference, Consistency, and SUTVA . . . . . . . . . . . . . . 13
2.3.6 Tying It All Together . . . . . . . . . . . . . . . . . . . . . . . . . . 14
2.4 Fancy Statistics Terminology Defancified . . . . . . . . . . . . . . . . . . . 15
2.5 A Complete Example with Estimation . . . . . . . . . . . . . . . . . . . . . 16
3 The Flow of Association and Causation in Graphs 19
3.1 Graph Terminology . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19
3.2 Bayesian Networks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20
3.3 Causal Graphs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 22
3.4 Two-Node Graphs and Graphical Building Blocks . . . . . . . . . . . . . . 23
3.5 Chains and Forks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 24
3.6 Colliders and their Descendants . . . . . . . . . . . . . . . . . . . . . . . . 26
3.7 d-separation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 28
3.8 Flow of Association and Causation . . . . . . . . . . . . . . . . . . . . . . 30
4 Causal Models 32
4.1 The do-operator and Interventional Distributions . . . . . . . . . . . . . . 32
4.2 The Main Assumption: Modularity . . . . . . . . . . . . . . . . . . . . . . 34
4.3 Truncated Factorization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 35
4.3.1 Example Application and Revisiting “Association is Not Causation” 36
4.4 The Backdoor Adjustment . . . . . . . . . . . . . . . . . . . . . . . . . . . 37
4.4.1 Relation to Potential Outcomes . . . . . . . . . . . . . . . . . . . . . 39
4.5 Structural Causal Models (SCMs) . . . . . . . . . . . . . . . . . . . . . . . 40
4.5.1 Structural Equations . . . . . . . . . . . . . . . . . . . . . . . . . . 40
4.5.2 Interventions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 42
4.5.3 Collider Bias and Why to Not Condition on Descendants of Treatment . . . 43
4.6 Example Applications of the Backdoor Adjustment . . . . . . . . . . . . . 44
4.6.1 Association vs. Causation in a Toy Example . . . . . . . . . . . . . 44
4.6.2 A Complete Example with Estimation . . . . . . . . . . . . . . . . 45
4.7 Assumptions Revisited . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 47
5 Randomized Experiments 49
5.1 Comparability and Covariate Balance . . . . . . . . . . . . . . . . . . . . . 49
5.2 Exchangeability . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 50
5.3 No Backdoor Paths . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 51
6 Nonparametric Identification 52
6.1 Frontdoor Adjustment . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 52
6.2 do-calculus . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 55
6.2.1 Application: Frontdoor Adjustment . . . . . . . . . . . . . . . . . . 57
6.3 Determining Identifiability from the Graph . . . . . . . . . . . . . . . . . . 58
7 Estimation 62
7.1 Preliminaries . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 62
7.2 Conditional Outcome Modeling (COM) . . . . . . . . . . . . . . . . . . . . 63
7.3 Grouped Conditional Outcome Modeling (GCOM) . . . . . . . . . . . . . 64
7.4 Increasing Data Efficiency . . . . . . . . . . . . . . . . . . . . . . . . . . . . 65
7.4.1 TARNet . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 65
7.4.2 X-Learner . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 66
7.5 Propensity Scores . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 67
7.6 Inverse Probability Weighting (IPW) . . . . . . . . . . . . . . . . . . . . . . 68
7.7 Doubly Robust Methods . . . . . . . . . . . . . . . . . . . . . . . . . . . . 70
7.8 Other Methods . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 70
7.9 Concluding Remarks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 71
7.9.1 Confidence Intervals . . . . . . . . . . . . . . . . . . . . . . . . . . 71
7.9.2 Comparison to Randomized Experiments . . . . . . . . . . . . . . 72
8 Unobserved Confounding: Bounds and Sensitivity Analysis 73
8.1 Bounds . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 73
8.1.1 No-Assumptions Bound . . . . . . . . . . . . . . . . . . . . . . . . 74
8.1.2 Monotone Treatment Response . . . . . . . . . . . . . . . . . . . . 76
8.1.3 Monotone Treatment Selection . . . . . . . . . . . . . . . . . . . . . 78
8.1.4 Optimal Treatment Selection . . . . . . . . . . . . . . . . . . . . . . 79
8.2 Sensitivity Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 82
8.2.1 Sensitivity Basics in Linear Setting . . . . . . . . . . . . . . . . . . . 82
8.2.2 More General Settings . . . . . . . . . . . . . . . . . . . . . . . . . 85
9 Instrumental Variables 86
9.1 What is an Instrument? . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 86
9.2 No Nonparametric Identification of the ATE . . . . . . . . . . . . . . . . . 87
9.3 Warm-Up: Binary Linear Setting . . . . . . . . . . . . . . . . . . . . . . . . 87
9.4 Continuous Linear Setting . . . . . . . . . . . . . . . . . . . . . . . . . . . 88
9.5 Nonparametric Identification of Local ATE . . . . . . . . . . . . . . . . . . 90
9.5.1 New Potential Notation with Instruments . . . . . . . . . . . . . . 90
9.5.2 Principal Stratification . . . . . . . . . . . . . . . . . . . . . . . . . 90
9.5.3 Local ATE . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 91
9.6 More General Settings for ATE Identification . . . . . . . . . . . . . . . . . 94
10 Difference in Differences 95
10.1 Preliminaries . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 95
10.2 Introducing Time . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 96
10.3 Identification . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 96
10.3.1 Assumptions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 96
10.3.2 Main Result and Proof . . . . . . . . . . . . . . . . . . . . . . . . . 97
10.4 Major Problems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 98
11 Causal Discovery from Observational Data 100
11.1 Independence-Based Causal Discovery . . . . . . . . . . . . . . . . . . . . 100
11.1.1 Assumptions and Theorem . . . . . . . . . . . . . . . . . . . . . . . 100
11.1.2 The PC Algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . . . 102
11.1.3 Can We Get Any Better Identification? . . . . . . . . . . . . . . . . 104
11.2 Semi-Parametric Causal Discovery . . . . . . . . . . . . . . . . . . . . . . . 104
11.2.1 No Identifiability Without Parametric Assumptions . . . . . . . . . 105
11.2.2 Linear Non-Gaussian Noise . . . . . . . . . . . . . . . . . . . . . . 105
11.2.3 Nonlinear Models . . . . . . . . . . . . . . . . . . . . . . . . . . . . 108
11.3 Further Resources . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 109
12 Causal Discovery from Interventional Data 110
12.1 Structural Interventions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 110
12.1.1 Single-Node Interventions . . . . . . . . . . . . . . . . . . . . . . . 110
12.1.2 Multi-Node Interventions . . . . . . . . . . . . . . . . . . . . . . . 110
12.2 Parametric Interventions . . . . . . . . . . . . . . . . . . . . . . . . . . . . 110
12.2.1 Coming Soon . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 110
12.3 Interventional Markov Equivalence . . . . . . . . . . . . . . . . . . . . . . 110
12.3.1 Coming Soon . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 110
12.4 Miscellaneous Other Settings . . . . . . . . . . . . . . . . . . . . . . . . . . 110
12.4.1 Coming Soon . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 110
13 Transfer Learning and Transportability 111
13.1 Causal Insights for Transfer Learning . . . . . . . . . . . . . . . . . . . . . 111
13.1.1 Coming Soon . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 111
13.2 Transportability of Causal Effects Across Populations . . . . . . . . . . . . 111
13.2.1 Coming Soon . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 111
14 Counterfactuals and Mediation 112
14.1 Counterfactuals Basics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 112
14.1.1 Coming Soon . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 112
14.2 Important Application: Mediation . . . . . . . . . . . . . . . . . . . . . . . 112
14.2.1 Coming Soon . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 112
Appendix 113
A Proofs 114
A.1 Proof of Equation 6.1 from Section 6.1 . . . . . . . . . . . . . . . . . . . . . 114
A.2 Proof of Propensity Score Theorem (7.1) . . . . . . . . . . . . . . . . . . . . 114
A.3 Proof of IPW Estimand (7.18) . . . . . . . . . . . . . . . . . . . . . . . . . . 115
Bibliography 117
Alphabetical Index 123