Variables
In most computations a graph is executed multiple times. Most tensors do not survive past a single execution of the graph. However, a Variable is a special kind of operation that returns a handle to a persistent mutable tensor that survives across executions of a graph. Handles to these persistent mutable tensors can be passed to a handful of special operations, such as Assign and AssignAdd (equivalent to +=) that mutate the referenced tensor. For machine learning applications of TensorFlow, the parameters of the model are typically stored in tensors held in variables, and are updated as part of the Run of the training graph for the model.
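As a concrete illustration, a minimal sketch using the Python client (assuming a TF 1.x-style API; the variable name and shapes are illustrative, not taken from the text):

import tensorflow as tf

# A variable holds a persistent, mutable tensor that survives across Run calls.
w = tf.Variable(tf.zeros([10]), name="w")

# Special operations that mutate the tensor through the variable's handle.
increment = tf.assign_add(w, tf.ones([10]))  # roughly w += 1
reset = tf.assign(w, tf.zeros([10]))

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    sess.run(increment)   # w becomes all ones
    sess.run(increment)   # w persists between Run calls: now all twos
    print(sess.run(w))

In a typical training graph the model parameters are variables like w above, and the optimizer's update operations play the role of assign_add.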
3 Implementation
The main components in a TensorFlow system are the
client, which uses the Session interface to communicate
with the master, and one or more worker processes, with
each worker process responsible for arbitrating access to
one or more computational devices (such as CPU cores
or GPU cards) and for executing graph nodes on those
devices as instructed by the master. We have both lo-
cal and distributed implementations of the TensorFlow
interface. The local implementation is used when the
client, the master, and the worker all run on a single ma-
chine in the context of a single operating system process
(possibly with multiple devices, if for example, the ma-
chine has many GPU cards installed). The distributed
implementation shares most of the code with the local
implementation, but extends it with support for an en-
vironment where the client, the master, and the workers
can all be in different processes on different machines.
In our distributed environment, these different tasks are
containers in jobs managed by a cluster scheduling sys-
tem [51]. These two different modes are illustrated in
Figure 3. Most of the rest of this section discusses is-
sues that are common to both implementations, while
Section 3.3 discusses some issues that are particular to
the distributed implementation.
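In the Python client, the choice between the two modes is reflected in the target passed when creating a Session. A small sketch (the grpc address below is a hypothetical placeholder, not an address from the text):

import tensorflow as tf

c = tf.constant(42.0)

# Local implementation: client, master, and worker share a single process.
with tf.Session() as sess:
    print(sess.run(c))

# Distributed implementation: the client connects to a master in another
# process, identified by a target string (hypothetical host and port).
with tf.Session("grpc://example.org:2222") as sess:
    print(sess.run(c))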
Devices
Devices are the computational heart of TensorFlow. Each worker is responsible for one or more devices, and each device has a device type and a name. Device names are composed of pieces that identify the device's type, the device's index within the worker, and, in our distributed setting, an identification of the job and task of the worker (or localhost for the case where the devices are local to the process). Example device names are "/job:localhost/device:cpu:0" or "/job:worker/task:17/device:gpu:3". We have implementations of our Device interface for CPUs and GPUs, and new device implementations for other device types can be provided via a registration mechanism. Each device object is responsible for managing allocation and deallocation of device memory, and for arranging for the execution of any kernels that are requested by higher levels in the TensorFlow implementation.
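For illustration, the Python client can pin graph nodes to named devices with the tf.device context manager (a sketch using shorthand forms of the names above; the job and task shown are hypothetical and only constrain placement of the enclosed nodes):

import tensorflow as tf

# Pin nodes to explicitly named devices.
with tf.device("/cpu:0"):
    a = tf.constant([[1.0, 2.0], [3.0, 4.0]])

with tf.device("/job:worker/task:17/gpu:3"):
    b = tf.matmul(a, a)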
Tensors
A tensor in our implementation is a typed, multi-dimensional array. We support a variety of tensor element types, including signed and unsigned integers ranging in size from 8 bits to 64 bits, IEEE float and double types, a complex number type, and a string type (an arbitrary byte array). Backing store of the appropriate size is managed by an allocator that is specific to the device on which the tensor resides. Tensor backing store buffers are reference counted and are deallocated when no references remain.
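A few of these element types, as they appear from the Python client (a sketch; dtype names follow the TF 1.x Python API):

import tensorflow as tf

i8  = tf.constant([1, 2, 3], dtype=tf.int8)        # 8-bit signed integer
u8  = tf.constant([1, 2, 3], dtype=tf.uint8)       # 8-bit unsigned integer
f64 = tf.constant([1.0, 2.0], dtype=tf.float64)    # IEEE double precision
c64 = tf.constant([1 + 2j], dtype=tf.complex64)    # complex number type
s   = tf.constant([b"arbitrary bytes"])            # string type (byte array)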
3.1 Single-Device Execution
Let's first consider the simplest execution scenario: a single worker process with a single device. The nodes of the graph are executed in an order that respects the dependencies between nodes. In particular, we keep track of a count per node of the number of dependencies of that node that have not yet been executed. Once this count drops to zero, the node is eligible for execution and is added to a ready queue. The ready queue is processed in some unspecified order, delegating execution of the kernel for a node to the device object. When a node has finished executing, the counts of all nodes that depend on the completed node are decremented.
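This scheme amounts to a standard dependency-count scheduler. A schematic Python version (purely illustrative, not the actual implementation; graph.num_inputs, graph.consumers, and device.run_kernel are hypothetical helpers):

from collections import deque

def execute(graph, device):
    # pending[n] = number of dependencies of n that have not yet executed
    pending = {n: graph.num_inputs(n) for n in graph.nodes}
    ready = deque(n for n in graph.nodes if pending[n] == 0)

    while ready:
        node = ready.popleft()           # queue order is unspecified
        device.run_kernel(node)          # delegate the node's kernel to the device
        for consumer in graph.consumers(node):
            pending[consumer] -= 1       # one fewer unexecuted dependency
            if pending[consumer] == 0:
                ready.append(consumer)   # now eligible for execution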
3.2 Multi-Device Execution
Once a system has multiple devices, there are two main complications: deciding on which device to place the computation for each node in the graph, and then managing the required communication of data across the device boundaries implied by these placement decisions. This subsection discusses these two issues.
3.2.1 Node Placement
Given a computation graph, one of the main responsibilities of the TensorFlow implementation is to map the computation onto the set of available devices. A simplified version of this algorithm is presented here. See Section 4.3 for extensions supported by this algorithm. One input to the placement algorithm is a cost model, which contains estimates of the sizes (in bytes) of the input and output tensors for each graph node, along with estimates of the computation time required for each node when presented with its input tensors.