CHAPTER 2 ■ MACHINE LEARNING FUNDAMENTALS
unseen data, that is, $(x_i, y_i) \notin S$ and $(x_i, y_i) \in U$. We measure performance over this task as the error over unseen data,

$$E(f, D, U) = \frac{\big|\{(x_i, y_i) \in U : f(x_i) \neq y_i\}\big|}{|U|}.$$
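To make this definition concrete, here is a minimal Python sketch (not from the text; the classifier `f` and the data are invented purely for illustration) that computes the error as the fraction of misclassified examples in U:

```python
# Illustrative sketch: computing E(f, D, U) as the fraction of
# examples in the unseen data U that f misclassifies.

def error(f, U):
    """Fraction of (x, y) pairs in U where the prediction f(x) != y."""
    return sum(1 for x, y in U if f(x) != y) / len(U)

# A toy classifier: predict +1 if the first feature is positive, else -1.
f = lambda x: 1 if x[0] > 0 else -1

# Toy unseen data: the third example is misclassified by f.
U = [((0.5, 1.0), 1), ((-0.3, 2.0), -1), ((0.1, -1.0), -1)]
print(error(f, U))  # 1 of 3 examples misclassified -> 0.333...
```

Note that the error depends only on f's predictions over U; the seen data S enters the picture only through how f was generated.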
We now have a precise definition of the task, which is to categorize data into one of two categories
(y = ±1) based on some seen data S by generating f. We measure performance (and improvement in
performance) using the error E (f,D,U) over unseen data U. The size of the seen data |S| is the conceptual
equivalent of experience. In this context, we want to develop algorithms that generate such functions
f (commonly referred to as models). In general, the field of machine learning studies the development of
algorithms that produce models making predictions over unseen data, for this and other formal tasks (we
will introduce multiple such tasks later in the chapter). Note that x is commonly referred to as the input
(or input variable) and y is referred to as the output (or output variable).
As with any other discipline in computer science, the computational characteristics of such algorithms
are an important facet, but in addition we would also like to have a model f that achieves a low error
E(f, D, U) with as small a |S| as possible.
Let us now relate this abstract but precise definition to a real-world problem so that our abstractions
are grounded. Let us say an e-commerce web site wants to customize the landing page for registered users
to show them the products the users might be interested in buying. The web site has historical data on users
and would like to implement this as a feature so as to increase sales. Let us now see how this real-world
problem maps onto the abstract problem of binary classification we described earlier.
The first thing that one might notice is that given a particular user and a particular product, one wants
to predict whether the user will buy the product. Since this is the value to be predicted, it maps onto y = ±1,
where we will let the value of y = +1 denote the prediction that the user will buy the product and we will
denote y = –1 as the prediction that the user does not buy the product. Note that there is no particular
reason for picking these values; we could have swapped them (letting y = +1 denote the does-not-buy case
and y = –1 the buy case) and there would be no difference. We simply use y = ±1 to denote the two classes of
interest when categorizing data. Next, let us assume that we can somehow represent the attributes of the product
and the user’s buying and browsing history as $x \in \mathbb{R}^n$. This step is referred to as feature engineering in
machine learning and we will cover it later in the chapter. For now, it suffices to say that we are able to
generate such a mapping. Thus, we have historical data of what the users browsed and bought, attributes of
a product, and whether the user bought the product or not, mapped onto $\{(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)\}$. Now,
based on this data, we would like to generate a function or a model $f : x \to y$, which we can use to determine
which products a particular user will buy, and use this to populate the landing page. We can measure how
well the model is doing on unseen data by populating the landing page for users, seeing whether they buy
the products or not, and evaluating the error E(f, D, U).
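As a rough sketch of this seen/unseen workflow (with invented feature vectors, and a simple nearest-neighbor rule standing in for the model f; a real system would use engineered features and a learned model), one could write:

```python
# Hypothetical sketch: seen data S generates a model f, which is then
# evaluated on unseen data U. Feature vectors here are made up.
import math

S = [((1.0, 0.0), 1), ((0.9, 0.2), 1),    # seen: bought
     ((0.0, 1.0), -1), ((0.1, 0.9), -1)]  # seen: did not buy
U = [((0.8, 0.1), 1), ((0.2, 1.1), -1)]   # unseen

def f(x):
    """Predict the label of the nearest seen example (1-nearest neighbor)."""
    nearest = min(S, key=lambda s: math.dist(x, s[0]))
    return nearest[1]

E = sum(1 for x, y in U if f(x) != y) / len(U)
print(E)  # both unseen examples classified correctly -> 0.0
```

The point of the sketch is the separation of roles: S is the experience the model is generated from, and U is what the error E(f, D, U) is measured on.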
Regression
Let us introduce another task, namely regression. Here, we have data of the form $D = \{(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)\}$, where $x \in \mathbb{R}^n$ and $y \in \mathbb{R}$, and our task is to generate a computational procedure that implements the function $f : x \to y$
. Note that instead of the prediction being a binary class label y = ±1, as in binary classification, we have a
real-valued prediction. We measure performance over this task as the root mean squared error (RMSE) over
unseen data,