to computer vision, was only 74.3%. Then in 2012, a team led by Alex Krizhevsky and
advised by Geoffrey Hinton was able to achieve a top-5 accuracy of 83.6%—a significant
breakthrough. The competition has been dominated by deep convolutional neural
networks every year since. By 2015, we had reached an accuracy of 96.4%, and the
classification task on ImageNet was considered to be a completely solved problem.
Since 2012, deep convolutional neural networks ("convnets") have become the go-to
algorithm for all computer vision tasks and, more generally, for all perceptual tasks. At major
computer vision conferences in 2015 and 2016, it was nearly impossible to find
presentations that did not involve convnets in some form. At the same time, deep learning
has also found applications in many other types of problems, such as natural language
processing. It has completely replaced SVMs and decision trees in a wide range of
applications. For instance, for several years, the European Organization for Nuclear
Research, CERN, used decision tree-based methods for analysis of particle data from the
ATLAS detector at the Large Hadron Collider (LHC), but they eventually switched to
Keras-based deep neural networks due to their higher performance and ease of training
on large datasets.
1.2.6 What makes deep learning different
The primary reason deep learning took off so quickly is that it offered better
performance on many problems. But that’s not the only reason. Deep learning also
makes problem-solving much easier, because it completely automates what used to be
the most crucial step in a machine learning workflow: "feature engineering".
Previous machine learning techniques ("shallow" learning) only involved
transforming the input data into one or two successive representation spaces, usually via
very simple transformations such as high-dimensional non-linear projections (SVMs) or
decision trees. But the refined representations required by complex problems generally
cannot be attained by such techniques. As such, humans had to go to great lengths to make
the initial input data more amenable to processing by these methods, i.e., they had to
manually engineer good layers of representations for their data. This is what is called
"feature engineering". Deep learning, on the other hand, completely automates this step:
with deep learning, you learn all features in one pass rather than having to engineer them
yourself. This has greatly simplified machine learning workflows, often replacing very
sophisticated multi-stage pipelines with a single, simple, end-to-end deep learning model.
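To make this concrete, here is a minimal sketch of what such an end-to-end model could look like in Keras. It is only an illustration, not a listing from this book's examples: the layer sizes, the 28 × 28 grayscale input shape, and the 10-class output are arbitrary assumptions chosen for brevity.

from keras import models, layers

# A single end-to-end model: raw pixels in, class probabilities out.
# No hand-engineered feature-extraction stage anywhere in the pipeline.
model = models.Sequential()
model.add(layers.Conv2D(32, (3, 3), activation='relu', input_shape=(28, 28, 1)))
model.add(layers.MaxPooling2D((2, 2)))
model.add(layers.Conv2D(64, (3, 3), activation='relu'))
model.add(layers.MaxPooling2D((2, 2)))
model.add(layers.Flatten())
model.add(layers.Dense(64, activation='relu'))
model.add(layers.Dense(10, activation='softmax'))

model.compile(optimizer='rmsprop',
              loss='categorical_crossentropy',
              metrics=['accuracy'])

The convolutional layers learn their own visual features directly from the raw pixels, and the dense classifier on top is trained together with them; there is no separate feature-engineering step for a human to design.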
You may ask, if the crux of the issue is to have multiple successive layers of
representation, could shallow methods be applied repeatedly to emulate the effects of
deep learning? In practice, there are fast-diminishing returns to successive application of
shallow learning methods, because the optimal first representation layer in a 3-layer
model is not the optimal first layer in a 1-layer or 2-layer model. What is transformative
about deep learning is that it allows a model to learn all layers of representation jointly, at
the same time, rather than in succession ("greedily", as it is called). With joint feature
learning, whenever the model adjusts one of its internal features, all other features that
depend on it will automatically adapt to the change, without requiring human intervention.
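As a rough sketch of what "jointly" means in practice (the data, layer sizes, and training settings below are made up purely for illustration): when a stacked Keras model is compiled with a single loss and then fit, gradients from that one feedback signal flow back through every layer, so the weights of all layers are updated together at each step rather than being trained one layer at a time.

import numpy as np
from keras import models, layers

# Toy data, purely illustrative: 1,000 samples with 20 features, binary labels.
x = np.random.random((1000, 20))
y = np.random.randint(2, size=(1000, 1))

model = models.Sequential()
model.add(layers.Dense(16, activation='relu', input_shape=(20,)))
model.add(layers.Dense(16, activation='relu'))
model.add(layers.Dense(1, activation='sigmoid'))

model.compile(optimizer='rmsprop',
              loss='binary_crossentropy',
              metrics=['accuracy'])

# A single call with a single loss: every layer's weights are adjusted
# jointly by backpropagation at each update, not greedily layer by layer.
model.fit(x, y, epochs=5, batch_size=32)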