We now have all the details in place to discuss the core concept of generalization. The key question to ask is this: which is the better model? The one with degree = 2, the one with degree = 8, or the one with degree = 1?
Let us start by making a few observations about the three models. The model with degree = 1 performs poorly on both the seen and the unseen data as compared to the other two models. The model with degree = 8 performs better on the seen data as compared to the model with degree = 2. The model with degree = 2 performs better than the model with degree = 8 on the unseen data. Table 2-1 summarizes these observations for easy interpretation.
Table 2-1. Comparing the performance of the three models

Degree         1        2        8
Seen Data      Worst    Good     Best
Unseen Data    Worst    Best     Good
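The short listing below reproduces this experiment in code. It is a minimal sketch rather than the chapter's actual listing: the generating coefficients, the noise level, and the size of the held-out split are assumptions made for the example.

import numpy as np

# A minimal sketch of the experiment above: data generated from a noisy
# second-order polynomial, approximated by polynomials of degree 1, 2, and 8.
# The coefficients (2, -3, 1) and the noise level 0.2 are illustrative assumptions.
rng = np.random.RandomState(42)

def generate_data(n):
    x = rng.uniform(-1, 1, n)
    y = 2 * x**2 - 3 * x + 1 + rng.normal(0, 0.2, n)   # degree-2 signal plus noise
    return x, y

x_seen, y_seen = generate_data(80)        # "seen" (training) data
x_unseen, y_unseen = generate_data(80)    # "unseen" (held-out) data

for degree in (1, 2, 8):
    coeffs = np.polyfit(x_seen, y_seen, degree)                         # least-squares fit
    seen_mse = np.mean((np.polyval(coeffs, x_seen) - y_seen) ** 2)
    unseen_mse = np.mean((np.polyval(coeffs, x_unseen) - y_unseen) ** 2)
    print(f"degree={degree}: seen MSE={seen_mse:.4f}, unseen MSE={unseen_mse:.4f}")

Running it, the degree = 1 fit has the largest error on both splits, the degree = 8 fit has the smallest error on the seen data, and the degree = 2 fit typically wins on the unseen data, mirroring Table 2-1.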
Let us now understand the important concept of model capacity, which in this example corresponds to the degree of the polynomial. The data we generated used a second-order polynomial (degree = 2) with some noise. We then tried to approximate the data using three models of degree 1, 2, and 8, respectively. The higher the degree, the more expressive the model; that is, the more variation it can accommodate. This ability to accommodate variation corresponds to the notion of capacity: we say that the model with degree = 8 has a higher capacity than the model with degree = 2, which in turn has a higher capacity than the model with degree = 1.
Isn't having higher capacity always a good thing? It turns out it is not, once we consider that all real-world datasets contain some noise, and a higher-capacity model will end up fitting the noise in addition to the signal in the data. This is why we observe that the model with degree = 2 does better on the unseen data than the model with degree = 8. In this example we knew how the data was generated (a second-order polynomial, degree = 2, with some noise), so this observation is quite trivial. In the real world, however, we do not know the underlying mechanism by which the data is generated. This leads us to the fundamental challenge in machine learning: does the model truly generalize? And the only true test of that is the performance on unseen data.
In a sense, the concept of capacity is the inverse of the simplicity or parsimony of a model: a model with high capacity can approximate more complex data. Concretely, capacity corresponds to how many free variables/coefficients the model has. In our example, the model with degree = 1 does not have sufficient capacity to approximate the data; this is commonly referred to as underfitting. Correspondingly, the model with degree = 8 has excess capacity and it overfits the data.
As a thought experiment, consider what would happen if we had a model with degree equal to 80. Given that we had 80 data points as training data, such a polynomial has at least as many coefficients as there are training points and can pass through every point, perfectly approximating the data. This is the ultimate pathological case, wherein there is no learning at all: the model can simply memorize the data. This is referred to as rote learning, the logical extreme of overfitting. This is why the capacity of the model needs to be tuned with respect to the amount of training data we have. If the dataset is small, we are better off training models with lower capacity.
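The following is a small-scale sketch of this thought experiment. To keep the fit numerically stable it uses 15 points and a degree-14 polynomial rather than 80 and 80; the data-generating coefficients and noise level are the same illustrative assumptions as before.

import numpy as np

# Rote learning: with as many coefficients as training points, the polynomial
# can pass (essentially) through every training point, yet this memorization
# says nothing about unseen data.
rng = np.random.RandomState(0)
x_seen = np.linspace(-1, 1, 15)
y_seen = 2 * x_seen**2 - 3 * x_seen + 1 + rng.normal(0, 0.2, 15)
x_unseen = rng.uniform(-1, 1, 15)
y_unseen = 2 * x_unseen**2 - 3 * x_unseen + 1 + rng.normal(0, 0.2, 15)

coeffs = np.polyfit(x_seen, y_seen, deg=14)   # capacity matches the number of points
print("seen MSE:  ", np.mean((np.polyval(coeffs, x_seen) - y_seen) ** 2))     # essentially zero
print("unseen MSE:", np.mean((np.polyval(coeffs, x_unseen) - y_unseen) ** 2)) # typically far larger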
Regularization
Building on the ideas of model capacity, generalization, overfitting, and underfitting, let us now cover the idea of regularization. The key idea here is to penalize the complexity of the model. A regularized version of least squares takes the form $y = X\beta$, where $\beta$ is a vector chosen such that $\|X\beta - y\|_2^2 + \lambda\|\beta\|_2^2$ is minimized, and $\lambda$ is a user-defined parameter that controls the complexity. By introducing the term $\lambda\|\beta\|_2^2$, we are penalizing complex models; a minimal sketch of this penalized fit follows.
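The sketch below solves this penalized least-squares problem directly, using the closed-form ridge solution $\beta = (X^{\top}X + \lambda I)^{-1} X^{\top} y$. The degree-8 feature matrix built from the noisy quadratic data is an illustrative assumption, not the chapter's exact setup.

import numpy as np

# Regularized (ridge) least squares: minimize ||X*beta - y||^2 + lambda*||beta||^2.
# Larger lambda shrinks the coefficient vector, i.e., it penalizes complex models.
rng = np.random.RandomState(1)
x = rng.uniform(-1, 1, 80)
y = 2 * x**2 - 3 * x + 1 + rng.normal(0, 0.2, 80)
X = np.vander(x, N=9, increasing=True)        # columns: x^0 .. x^8

def ridge_fit(X, y, lam):
    n_features = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(n_features), X.T @ y)

for lam in (0.0, 0.1, 10.0):
    beta = ridge_fit(X, y, lam)
    print(f"lambda={lam:5.1f}  ||beta||^2 = {np.sum(beta**2):10.4f}")

As $\lambda$ grows, the printed squared norm of $\beta$ shrinks, which is exactly the sense in which the penalty term discourages complex models.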
To see why this is the case, consider fitting a least-squares model using a polynomial of degree 10, where the vector $\beta$ has 8 entries equal to zero and 2 non-zero entries. Against this, consider the case where