Introduction to Deep Learning: From Fundamentals to Practical Applications

PDF, 25.06 MB. Updated 2024-07-15.

"Dive into Deep Learning - 深度学习详解" 本书是深度学习领域的经典教程，由Aston Zhang、Zachary C. Lipton、Mu Li和Alexander J. Smola等人共同编著，旨在深入浅出地介绍深度学习的基础知识和应用。最新版本为Release 0.15.1，更新日期为2020年11月6日。在"Introduction"章节中，作者首先通过一个吸引人的例子（1.1 AMotivatingExample）引入深度学习的概念，接着阐述了深度学习的关键组成部分（1.2 KeyComponents），包括神经网络、反向传播等。此外，他们还讨论了机器学习问题的类型（1.3 KindsofMachineLearningProblems），比如监督学习、无监督学习等，并探讨了深度学习的起源（1.4 Roots）及其发展之路（1.5 TheRoadtoDeepLearning）。同时，书中列举了深度学习在各个领域的成功案例（1.6 SuccessStories），并概括了深度学习的特性（1.7 Characteristics）。在"Preliminaries"部分，作者从基础数据操作开始（2.1 DataManipulation），包括如何开始使用数据、进行基本操作、广播机制、索引与切片以及内存优化（2.1.1-2.1.6）。数据预处理也是关键，包括加载数据集（2.2.1 ReadingtheDataset）、处理缺失值（2.2.2 HandlingMissingData）以及转换为张量格式（2.2.3 ConversiontotheTensorFormat）。在数学基础部分，书中介绍了线性代数（2.3 LinearAlgebra），涵盖标量、向量、矩阵、张量的基本概念（2.3.1-2.3.4），以及它们在运算中的基本性质（2.3.5），如求和（2.3.6）、点积（2.3.7）、矩阵-向量乘法（2.3.8）和矩阵乘法（2.3.9）。此外，还有范数的讨论（2.3.10）和更多线性代数的内容（2.3.11）。最后，书中还涉及微积分（2.4 Calculus），这是理解梯度下降和反向传播等深度学习核心算法的基础。这本书深入浅出地涵盖了深度学习所需的基本数学和编程知识，对于初学者和进阶者都是极好的学习资源。通过阅读此书，读者可以逐步掌握深度学习的理论框架，为实际应用打下坚实基础。

18.7 Maximum Likelihood
18.7.1 The Maximum Likelihood Principle
18.7.2 Numerical Optimization and the Negative Log-Likelihood
18.7.3 Maximum Likelihood for Continuous Variables

18.8 Distributions
18.8.1 Bernoulli
18.8.2 Discrete Uniform
18.8.3 Continuous Uniform
18.8.4 Binomial
18.8.5 Poisson
18.8.6 Gaussian
18.8.7 Exponential Family

18.9 Naive Bayes
18.9.1 Optical Character Recognition
18.9.2 The Probabilistic Model for Classification
18.9.3 The Naive Bayes Classifier
18.9.4 Training

18.10 Statistics
18.10.1 Evaluating and Comparing Estimators
18.10.2 Conducting Hypothesis Tests
18.10.3 Constructing Confidence Intervals

18.11 Information Theory
18.11.1 Information
18.11.2 Entropy
18.11.3 Mutual Information
18.11.4 Kullback–Leibler Divergence
18.11.5 Cross Entropy

19 Appendix: Tools for Deep Learning

19.1 Using Jupyter
19.1.1 Editing and Running the Code Locally
19.1.2 Advanced Options

19.2 Using Amazon SageMaker
19.2.1 Registering and Logging In
19.2.2 Creating a SageMaker Instance
19.2.3 Running and Stopping an Instance
19.2.4 Updating Notebooks

19.3 Using AWS EC2 Instances
19.3.1 Creating and Running an EC2 Instance
19.3.2 Installing CUDA
19.3.3 Installing MXNet and Downloading the D2L Notebooks
19.3.4 Running Jupyter
19.3.5 Closing Unused Instances

19.4 Using Google Colab

19.5 Selecting Servers and GPUs
19.5.1 Selecting Servers
19.5.2 Selecting GPUs

19.6 Contributing to This Book
19.6.1 Minor Text Changes
19.6.2 Propose a Major Change
19.6.3 Adding a New Section or a New Framework Implementation

Preface

Just a few years ago, there were no legions of deep learning scientists developing intelligent products and services at major companies and startups. When the youngest among us (the authors) entered the field, machine learning did not command headlines in daily newspapers. Our parents had no idea what machine learning was, let alone why we might prefer it to a career in medicine or law. Machine learning was a forward-looking academic discipline with a narrow set of real-world applications. And those applications, e.g., speech recognition and computer vision, required so much domain knowledge that they were often regarded as separate areas entirely for which machine learning was one small component. Neural networks then, the antecedents of the deep learning models that we focus on in this book, were regarded as outmoded tools.

In just the past five years, deep learning has taken the world by surprise, driving rapid progress in fields as diverse as computer vision, natural language processing, automatic speech recognition, reinforcement learning, and statistical modeling. With these advances in hand, we can now build cars that drive themselves with more autonomy than ever before (and less autonomy than some companies might have you believe), smart reply systems that automatically draft the most mundane emails, helping people dig out from oppressively large inboxes, and software agents that dominate the world's best humans at board games like Go, a feat once thought to be decades away. Already, these tools exert ever-wider impacts on industry and society, changing the way movies are made and diseases are diagnosed, and playing a growing role in basic sciences, from astrophysics to biology.

About This Book

This book represents our attempt to make deep learning approachable, teaching you the concepts, the context, and the code.

One Medium Combining Code, Math, and HTML

For any computing technology to reach its full impact, it must be well-understood, well-documented, and supported by mature, well-maintained tools. The key ideas should be clearly distilled, minimizing the onboarding time needed to bring new practitioners up to date. Mature libraries should automate common tasks, and exemplar code should make it easy for practitioners to modify, apply, and extend common applications to suit their needs. Take dynamic web applications as an example. Despite a large number of companies, like Amazon, developing successful database-driven web applications in the 1990s, the potential of this technology to aid creative entrepreneurs has been realized to a far greater degree in the past ten years, owing in part to the development of powerful, well-documented frameworks.

Testing the potential of deep learning presents unique challenges because any single application brings together various disciplines. Applying deep learning requires simultaneously understanding (i) the motivations for casting a problem in a particular way; (ii) the mathematics of a given modeling approach; (iii) the optimization algorithms for fitting the models to data; and (iv) the engineering required to train models efficiently, navigating the pitfalls of numerical computing and getting the most out of available hardware. Teaching the critical thinking skills required to formulate problems, the mathematics to solve them, and the software tools to implement those solutions all in one place presents formidable challenges. Our goal in this book is to present a unified resource to bring would-be practitioners up to speed.

At the time we started this book project, there were no resources that simultaneously (i) were up to date; (ii) covered the full breadth of modern machine learning with substantial technical depth; and (iii) interleaved exposition of the quality one expects from an engaging textbook with the clean runnable code that one expects to find in hands-on tutorials. We found plenty of code examples for how to use a given deep learning framework (e.g., how to do basic numerical computing with matrices in TensorFlow) or for implementing particular techniques (e.g., code snippets for LeNet, AlexNet, ResNets, etc.) scattered across various blog posts and GitHub repositories. However, these examples typically focused on how to implement a given approach, but left out the discussion of why certain algorithmic decisions are made. While some interactive resources have popped up sporadically to address a particular topic, e.g., the engaging blog posts published on the website Distill, or personal blogs, they only covered selected topics in deep learning, and often lacked associated code. On the other hand, while several textbooks have emerged, most notably (Goodfellow et al., 2016), which offers a comprehensive survey of the concepts behind deep learning, these resources do not marry the descriptions to realizations of the concepts in code, sometimes leaving readers clueless as to how to implement them. Moreover, too many resources are hidden behind the paywalls of commercial course providers.

We set out to create a resource that could (i) be freely available for everyone; (ii) offer sufficient technical depth to provide a starting point on the path to actually becoming an applied machine learning scientist; (iii) include runnable code, showing readers how to solve problems in practice; (iv) allow for rapid updates, both by us and also by the community at large; and (v) be complemented by a forum for interactive discussion of technical details and to answer questions.

These goals were often in conflict. Equations, theorems, and citations are best managed and laid out in LaTeX. Code is best described in Python. And webpages are native in HTML and JavaScript. Furthermore, we want the content to be accessible as executable code, as a physical book, as a downloadable PDF, and on the Internet as a website. At present there exist no tools and no workflow perfectly suited to these demands, so we had to assemble our own. We describe our approach in detail in Section 19.6. We settled on GitHub to share the source and to allow for edits, Jupyter notebooks for mixing code, equations and text, Sphinx as a rendering engine to generate multiple outputs, and Discourse for the forum. While our system is not yet perfect, these choices provide a good compromise among the competing concerns. We believe that this might be the first book published using such an integrated workflow.

Distill: http://distill.pub

Forum: http://discuss.d2l.ai

