
Deep Learning
Ian Goodfellow
Yoshua Bengio
Aaron Courville

Contents
Website viii
Acknowledgments ix
Notation xii
1 Introduction 1
1.1 Who Should Read This Book? ........ 8
1.2 Historical Trends in Deep Learning ........ 12
I Applied Math and Machine Learning Basics 27
2 Linear Algebra 29
2.1 Scalars, Vectors, Matrices and Tensors ........ 29
2.2 Multiplying Matrices and Vectors ........ 32
2.3 Identity and Inverse Matrices ........ 34
2.4 Linear Dependence and Span ........ 35
2.5 Norms ........ 37
2.6 Special Kinds of Matrices and Vectors ........ 38
2.7 Eigendecomposition ........ 40
2.8 Singular Value Decomposition ........ 42
2.9 The Moore-Penrose Pseudoinverse ........ 43
2.10 The Trace Operator ........ 44
2.11 The Determinant ........ 45
2.12 Example: Principal Components Analysis ........ 45
3 Probability and Information Theory 51
3.1 Why Probability? ........ 52
3.2 Random Variables ........ 54
3.3 Probability Distributions ........ 54
3.4 Marginal Probability ........ 56
3.5 Conditional Probability ........ 57
3.6 The Chain Rule of Conditional Probabilities ........ 57
3.7 Independence and Conditional Independence ........ 58
3.8 Expectation, Variance and Covariance ........ 58
3.9 Common Probability Distributions ........ 60
3.10 Useful Properties of Common Functions ........ 65
3.11 Bayes’ Rule ........ 68
3.12 Technical Details of Continuous Variables ........ 69
3.13 Information Theory ........ 71
3.14 Structured Probabilistic Models ........ 73
4 Numerical Computation 78
4.1 Overflow and Underflow ........ 78
4.2 Poor Conditioning ........ 80
4.3 Gradient-Based Optimization ........ 80
4.4 Constrained Optimization ........ 91
4.5 Example: Linear Least Squares ........ 94
5 Machine Learning Basics 96
5.1 Learning Algorithms ........ 97
5.2 Capacity, Overfitting and Underfitting ........ 108
5.3 Hyperparameters and Validation Sets ........ 118
5.4 Estimators, Bias and Variance ........ 120
5.5 Maximum Likelihood Estimation ........ 129
5.6 Bayesian Statistics ........ 133
5.7 Supervised Learning Algorithms ........ 137
5.8 Unsupervised Learning Algorithms ........ 142
5.9 Stochastic Gradient Descent ........ 149
5.10 Building a Machine Learning Algorithm ........ 151
5.11 Challenges Motivating Deep Learning ........ 152
II Deep Networks: Modern Practices 162
6 Deep Feedforward Networks 164
6.1 Example: Learning XOR ........ 167
6.2 Gradient-Based Learning ........ 172
6.3 Hidden Units ........ 187
6.4 Architecture Design ........ 193
6.5 Back-Propagation and Other Differentiation Algorithms ........ 200
6.6 Historical Notes ........ 220
7 Regularization for Deep Learning 224
7.1 Parameter Norm Penalties ........ 226
7.2 Norm Penalties as Constrained Optimization ........ 233
7.3 Regularization and Under-Constrained Problems ........ 235
7.4 Dataset Augmentation ........ 236
7.5 Noise Robustness ........ 238
7.6 Semi-Supervised Learning ........ 240
7.7 Multitask Learning ........ 241
7.8 Early Stopping ........ 241
7.9 Parameter Tying and Parameter Sharing ........ 249
7.10 Sparse Representations ........ 251
7.11 Bagging and Other Ensemble Methods ........ 253
7.12 Dropout ........ 255
7.13 Adversarial Training ........ 265
7.14 Tangent Distance, Tangent Prop and Manifold Tangent Classifier ........ 267
8 Optimization for Training Deep Models 271
8.1 How Learning Differs from Pure Optimization ........ 272
8.2 Challenges in Neural Network Optimization ........ 279
8.3 Basic Algorithms ........ 290
8.4 Parameter Initialization Strategies ........ 296
8.5 Algorithms with Adaptive Learning Rates ........ 302
8.6 Approximate Second-Order Methods ........ 307
8.7 Optimization Strategies and Meta-Algorithms ........ 313
9 Convolutional Networks 326
9.1 The Convolution Operation ........ 327
9.2 Motivation ........ 329
9.3 Pooling ........ 335
9.4 Convolution and Pooling as an Infinitely Strong Prior ........ 339
9.5 Variants of the Basic Convolution Function ........ 342
9.6 Structured Outputs ........ 352
9.7 Data Types ........ 354
9.8 Efficient Convolution Algorithms ........ 356
9.9 Random or Unsupervised Features ........ 356
9.10 The Neuroscientific Basis for Convolutional Networks ........ 358
9.11 Convolutional Networks and the History of Deep Learning ........ 365
10 Sequence Modeling: Recurrent and Recursive Nets 367
10.1 Unfolding Computational Graphs ........ 369
10.2 Recurrent Neural Networks ........ 372
10.3 Bidirectional RNNs ........ 388
10.4 Encoder-Decoder Sequence-to-Sequence Architectures ........ 390
10.5 Deep Recurrent Networks ........ 392
10.6 Recursive Neural Networks ........ 394
10.7 The Challenge of Long-Term Dependencies ........ 396
10.8 Echo State Networks ........ 399
10.9 Leaky Units and Other Strategies for Multiple Time Scales ........ 402
10.10 The Long Short-Term Memory and Other Gated RNNs ........ 404
10.11 Optimization for Long-Term Dependencies ........ 408
10.12 Explicit Memory ........ 412
11 Practical Methodology 416
11.1 Performance Metrics ........ 417
11.2 Default Baseline Models ........ 420
11.3 Determining Whether to Gather More Data ........ 421
11.4 Selecting Hyperparameters ........ 422
11.5 Debugging Strategies ........ 431
11.6 Example: Multi-Digit Number Recognition ........ 435
12 Applications 438
12.1 Large-Scale Deep Learning ........ 438
12.2 Computer Vision ........ 447
12.3 Speech Recognition ........ 453
12.4 Natural Language Processing ........ 456
12.5 Other Applications ........ 473