Supercomputing, the heart of Deep Learning
Surely, at this point, some readers have already asked themselves the question: why has a supercomputing researcher like me started to investigate Deep Learning?
In fact, many years ago I became interested in how supercomputing could contribute to improving Machine Learning methods. Then, in 2006, I started co-directing PhD theses with a great friend, Ricard Gavaldà [8], a professor in the Computer Science department of the UPC and an expert in Machine Learning and Data Mining.
But it was not until September 2013, when I already had a relatively solid base of knowledge about Machine Learning, that I began to focus my interest on Deep Learning. Thanks to Jordi Nin, a researcher from our Computer Architecture Department at UPC, I discovered the article Building High-level Features Using Large Scale Unsupervised Learning [9], written by Google researchers. In this article, presented at the International Conference on Machine Learning of the previous year (ICML'12), the authors explained how they had trained a Deep Learning model on a cluster of 1,000 machines with 16,000 cores. I was very happy to see how supercomputing made it possible to accelerate this type of application, as I wrote in my blog [10] a few months later, justifying the reasons that led the group to add this research focus to our research roadmap.
Thanks to Moore's Law [11], in 2012, when these Google researchers wrote their article, we had supercomputers that allowed us to solve problems that would have been intractable a few years earlier for lack of computing capacity. For example, the computer I had access to in 1982, on which I executed my first program with punched cards, was a Fujitsu capable of executing a little more than one million operations per second. Thirty years later, in 2012, the MareNostrum supercomputer that we had at the time at the Barcelona Supercomputing Center-National Supercomputing Center (BSC) was "only" about 1,000,000,000 times faster than the computer on which I started.
With the upgrade of that year, the MareNostrum supercomputer offered a theoretical peak performance of 1.1 Petaflops (1,100,000,000,000,000 floating-point operations per second [12]). It achieved this with 3,056 servers with a total of 48,896 cores and 115,000 Gigabytes of total main memory, housed in 36 racks. At that time the MareNostrum supercomputer was considered one of the fastest in the world: it was placed in thirty-sixth position in the TOP500 list [13], which is updated every six months and ranks the 500 most powerful supercomputers in the world.
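As a quick back-of-the-envelope check of the billion-fold comparison above (a rough estimate, taking roughly one million operations per second for the 1982 Fujitsu and the 1.1 Petaflops peak just mentioned):

\[
\frac{1.1 \times 10^{15}\ \text{FLOPS}}{\sim 10^{6}\ \text{operations/second}} \approx 1.1 \times 10^{9},
\]

that is, roughly a factor of 1,000,000,000 in three decades.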
Attached you can find a photograph of the MareNostrum computer racks, which were housed in the Torres Girona chapel on the UPC campus in Barcelona [14].
The first GPU in the ImageNet competition
It was during that period that I began to become aware of the applicability of supercomputing to this new area of research. When I started looking for research articles on the subject, I discovered the existence of the ImageNet competition and the results obtained by the University of Toronto team in the 2012 edition [15]. The ImageNet competition (Large Scale Visual Recognition Challenge [16]) had been held since 2010, and by that time it had become a benchmark in the computer vision community for large-scale object recognition. In 2012, Alex Krizhevsky, Ilya Sutskever and Geoffrey E. Hinton used GPU (graphics processing unit) hardware accelerators for the first time in this competition [17]; these accelerators were already being used at that time in supercomputing centers like ours in Barcelona to increase the execution speed of applications requiring a great many calculations.
For example, at that time BSC already had a supercomputer called MinoTauro, with 128 Bull B505 nodes, each equipped with 2 Intel processors and 2 NVIDIA Tesla M2090 GPUs. Launched in September 2011 with a peak performance of 186 Teraflops, it was, as a curiosity, considered at the time the most energy-efficient supercomputer in Europe according to the Green500 list [18].
Until 2012, the increase in computing capacity that we obtained from computers each year came from improvements in the CPU. Since then, however, the increase in computing capacity for Deep Learning has been due not only to CPUs but also to new massively parallel systems based on GPU accelerators, which are many times more efficient than traditional CPUs for this kind of workload.
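To make this massive parallelism a little more concrete, here is a minimal, purely illustrative sketch (my own toy example, not code from any system mentioned in this chapter) of the kind of matrix calculation that GPUs accelerate, written in CUDA, the programming platform introduced in the next paragraph: each GPU thread computes a single element of the result matrix, so many thousands of elements are computed simultaneously.

```cuda
// Naive dense matrix multiplication C = A * B on the GPU.
// Each thread computes one element of C; a real application would
// normally rely on a tuned library such as cuBLAS instead.
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

__global__ void matmul(const float *A, const float *B, float *C, int N) {
    int row = blockIdx.y * blockDim.y + threadIdx.y;   // row of C handled by this thread
    int col = blockIdx.x * blockDim.x + threadIdx.x;   // column of C handled by this thread
    if (row < N && col < N) {
        float acc = 0.0f;
        for (int k = 0; k < N; ++k)
            acc += A[row * N + k] * B[k * N + col];
        C[row * N + col] = acc;
    }
}

int main() {
    const int N = 512;                        // matrices are N x N
    const size_t bytes = N * N * sizeof(float);

    // Host matrices, filled with constants so the result is easy to check.
    float *hA = (float *)malloc(bytes), *hB = (float *)malloc(bytes), *hC = (float *)malloc(bytes);
    for (int i = 0; i < N * N; ++i) { hA[i] = 1.0f; hB[i] = 2.0f; }

    // Device memory and host-to-device copies.
    float *dA, *dB, *dC;
    cudaMalloc(&dA, bytes); cudaMalloc(&dB, bytes); cudaMalloc(&dC, bytes);
    cudaMemcpy(dA, hA, bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(dB, hB, bytes, cudaMemcpyHostToDevice);

    // One thread per output element, grouped into 16x16 blocks.
    dim3 block(16, 16);
    dim3 grid((N + block.x - 1) / block.x, (N + block.y - 1) / block.y);
    matmul<<<grid, block>>>(dA, dB, dC, N);
    cudaDeviceSynchronize();

    // Copy the result back and spot-check one value (every element should be 2 * N = 1024).
    cudaMemcpy(hC, dC, bytes, cudaMemcpyDeviceToHost);
    printf("C[0][0] = %.1f\n", hC[0]);

    cudaFree(dA); cudaFree(dB); cudaFree(dC);
    free(hA); free(hB); free(hC);
    return 0;
}
```

The same pattern, a simple arithmetic operation applied independently across a large grid of data, is what made graphics hardware so attractive for the numerical workloads described next.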
GPUs were originally developed to accelerate 3D games, which require the repeated use of mathematical processes that include many matrix calculations. Initially, companies such as NVIDIA and AMD developed these fast and massively parallel chips for graphics cards dedicated to video games. However, it soon became clear that hardware designed to speed up 3D games was also very well suited to accelerating calculations on numerical matrices; this hardware therefore ended up benefiting the scientific community, and in 2007 NVIDIA launched the CUDA [19] programming