entrepreneurs has been realized to a far greater degree in the past ten years, owing in part to the development of powerful, well-documented frameworks.
Realizing the potential of deep learning presents unique challenges because any single application
brings together various disciplines. Applying deep learning requires simultaneously understand-
ing (i) the motivations for casting a problem in a particular way; (ii) the mathematical form of a
given model; (iii) the optimization algorithms for fitting the models to data; (iv) the basic statistical principles and intuitions that help us to extract generalizable insights from data; and (v) the engineering required to train models efficiently, navigating the pitfalls of numerical computing and getting the most out of available hardware. Teaching the critical thinking skills required to formulate problems, the mathematics to solve them, and the software tools to implement those solutions all in one place presents formidable challenges. Our goal in this book is to present a unified resource to bring would-be practitioners up to speed.
When we started this book project, there were no resources that simultaneously (i) were up to
date; (ii) covered the full breadth of modern machine learning with substantial technical depth;
and (iii) interleaved exposition of the quality one expects from an engaging textbook with the
clean runnable code that one expects to find in hands-on tutorials. We found plenty of code exam-
ples for how to use a given deep learning framework (e.g., how to do basic numerical computing
with matrices in TensorFlow) or for implementing particular techniques (e.g., code snippets for
LeNet, AlexNet, ResNets, etc.) scattered across various blog posts and GitHub repositories. However, these examples typically focused on how to implement a given approach, but left out the
discussion of why certain algorithmic decisions are made. While some interactive resources have
popped up sporadically to address a particular topic, e.g., the engaging blog posts published on the website Distill¹, or personal blogs, they only covered selected topics in deep learning, and often
lacked associated code. On the other hand, while several deep learning textbooks have emerged—
e.g., (Goodfellow et al., 2016), which offers a comprehensive survey of the concepts behind deep
learning—these resources do not marry the descriptions to realizations of the concepts in code,
sometimes leaving readers clueless as to how to implement them. Moreover, too many resources
are hidden behind the paywalls of commercial course providers.
We set out to create a resource that could (i) be freely available for everyone; (ii) offer sufficient
technical depth to provide a starting point on the path to actually becoming an applied machine
learning scientist; (iii) include runnable code, showing readers how to solve problems in practice;
(iv) allow for rapid updates, both by us and by the community at large; and (v) be complemented by a forum² for interactive discussion of technical details and to answer questions.
These goals were often in conflict. Equations, theorems, and citations are best managed and laid
out in LaTeX. Code is best described in Python. And webpages are native in HTML and JavaScript.
Furthermore, we wanted the content to be accessible as executable code, as a physical book, as a downloadable PDF, and on the Internet as a website. At present there exist no tools and no workflow perfectly suited to these demands, so we had to assemble our own. We describe our approach
in detail in Section 19.6. We settled on GitHub to share the source and to facilitate community con-
tributions, Jupyter notebooks for mixing code, equations and text, Sphinx as a rendering engine
to generate multiple outputs, and Discourse for the forum. While our system is not yet perfect,
these choices provide a good compromise among the competing concerns. We believe that this
might be the first book published using such an integrated workflow.
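To make this toolchain concrete, the following is a minimal sketch of a Sphinx configuration that builds a site from Jupyter notebooks mixing code, equations, and text. It is not the configuration used for this book; it assumes the third-party nbsphinx extension and the standard Sphinx builders.

# conf.py: minimal sketch of a Sphinx project that renders Jupyter notebooks.
# Not the configuration used for this book; assumes `pip install sphinx nbsphinx`.

project = "example-book"
author = "Community contributors"

extensions = [
    "nbsphinx",            # executes and renders .ipynb sources
    "sphinx.ext.mathjax",  # renders LaTeX equations in the HTML output
]

# Skip build artifacts and notebook checkpoints when collecting sources.
exclude_patterns = ["_build", "**.ipynb_checkpoints"]

# Theme for the website build; the PDF comes from the separate LaTeX builder.
html_theme = "alabaster"

With such a setup, running sphinx-build -b html . _build/html would generate the website, while the LaTeX builder followed by a LaTeX run would produce a PDF; community contributions then arrive as ordinary pull requests against the notebook sources on GitHub.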
¹ http://distill.pub
² http://discuss.d2l.ai