
Distributed TensorFlow with MPI

Abhinav Vishnu, Charles Siegel, and Jeff Daily
Pacific Northwest National Laboratory, Richland, WA 99352
ABSTRACT
Machine Learning and Data Mining (MLDM) algorithms are becoming increasingly important in analyzing the large volumes of data generated by simulations, experiments, and mobile devices. With increasing data volume, distributed memory systems (such as tightly connected supercomputers or cloud computing systems) are becoming important in designing in-memory and massively parallel MLDM algorithms. Yet, the majority of open source MLDM software is limited to sequential execution, with only a few packages supporting multi-core/many-core execution.
In this paper, we extend the recently released Google TensorFlow for execution on large scale clusters using the Message Passing Interface (MPI). Our approach requires minimal changes to the TensorFlow runtime, making the proposed implementation generic and readily usable by the growing TensorFlow user base. We evaluate our implementation using an InfiniBand cluster and several well known datasets. Our evaluation demonstrates the efficiency of the proposed implementation.
1. INTRODUCTION
Today, simulations, experiments, and mobile devices are generating increasingly large volumes of data [1, 2]. Machine Learning and Data Mining (MLDM) algorithms, which can build models, classifiers, and anomaly detectors, are being designed and applied in several domains including high energy physics, computational biology, and cyber security [3, 4, 5].
MLDM algorithms are generally classified as supervised (the input dataset is labeled with ground truth) or unsupervised (learning from an unlabeled dataset). Base supervised/unsupervised algorithms can be combined using ensemble methods to remove noise and possibly learn better models/classifiers. Several software packages that support supervised, unsupervised, and ensemble algorithms have been released publicly; a few well known packages are Weka [6], Scikit [7], libsvm [8], and Matlab. However, these packages only support sequential execution, so they are generally used with modest size datasets.
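As a brief illustration of the ensemble methods such packages expose (our own sketch on synthetic data, not an example from this paper), a random forest in scikit-learn combines many decision trees to reduce the noise of any single base learner:

from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic labeled dataset (supervised setting: ground truth y is given).
X, y = make_classification(n_samples=1000, n_features=20, random_state=0)

# An ensemble of 100 decision trees; predictions are aggregated by voting.
clf = RandomForestClassifier(n_estimators=100, random_state=0)
clf.fit(X, y)
print(clf.score(X, y))  # training accuracy of the ensemble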
At the same time, Deep Learning algorithms, a class of MLDM algorithms, are becoming increasingly popular. Deep Learning algorithms emulate brain activity by using several layers of neurons (interconnected with synapses) and learn the weights of the synapses using gradient descent methods. There are several classes of Deep Learning algorithms: Deep Neural Networks (DNNs, typically used on tabular datasets), Convolutional Neural Networks (CNNs, typically used on images), and Recurrent Neural Networks (RNNs, typically used on time-dependent datasets). Several researchers and practitioners have applied Deep Learning algorithms to their problems and reported better results in comparison to their previously published models. Naturally, open source efforts such as Theano, CuDNN, and Caffe [9] have gained traction and wide acceptance among researchers and practitioners alike.
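For reference, the gradient descent methods mentioned above apply the standard update to the weight vector $w$, with learning rate $\eta$ and loss function $L$:

$$w_{t+1} = w_t - \eta \, \nabla_w L(w_t)$$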
Recently, Google released TensorFlow, a toolkit for developing MLDM algorithms. It uses a dataflow model in which computations are specified as operations on tensors (user-defined multi-dimensional arrays). It also supports automatic differentiation, which simplifies the design and implementation of gradient descent methods. TensorFlow readily supports DNNs, CNNs, and RNNs on multi-core/many-core systems (GPUs), and supports algorithmic advancements such as AdaGrad and neuron dropout for regularization. However, TensorFlow's restriction to a single compute node is a significant limitation, especially with the increasing size of datasets.
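To make the dataflow model and automatic differentiation concrete, consider a minimal sketch in the graph-style TensorFlow Python API of that era; the linear model is our own illustration, not taken from the paper:

import tensorflow as tf

# Build a dataflow graph for a tiny linear model y = xW + b.
x = tf.placeholder(tf.float32, shape=[None, 2])
y_true = tf.placeholder(tf.float32, shape=[None, 1])
W = tf.Variable(tf.zeros([2, 1]))
b = tf.Variable(tf.zeros([1]))
loss = tf.reduce_mean(tf.square(tf.matmul(x, W) + b - y_true))

# Automatic differentiation: gradients of `loss` with respect to W and b
# are derived from the graph itself; no manual derivative code is needed.
train_step = tf.train.GradientDescentOptimizer(0.1).minimize(loss)

with tf.Session() as sess:
    sess.run(tf.initialize_all_variables())
    sess.run(train_step, feed_dict={x: [[1.0, 2.0]], y_true: [[3.0]]})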
In this paper, we propose a design that alleviates these limitations of TensorFlow. Specifically, we extend TensorFlow for scalable execution on very large scale systems. We consider several programming models, especially MapReduce based programming models (Hadoop and Spark), and conclude that neither is geared towards realizing the peak potential of the system, while TensorFlow is geared towards exploiting the architecture effectively using a C++ backend and state of the art linear algebra packages. We use the Message Passing Interface (MPI) [10] as the communication interface for parallelizing TensorFlow on distributed memory systems. We describe the changes required to realize the implementation on distributed memory systems, and conclude that these changes are minimal and require no changes to the TensorFlow runtime. Our evaluation of the proposed extensions with several well known datasets such as MNIST, CIFAR-10, Adult, and Higgs reveals the performance efficiency of the proposed implementation.
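For intuition about the role MPI plays here, the sketch below shows the canonical data-parallel pattern of averaging per-rank gradients with a single allreduce, written with the mpi4py bindings; it is our illustration of the general technique, with a stand-in gradient vector, and not the implementation described later in this paper:

from mpi4py import MPI
import numpy as np

comm = MPI.COMM_WORLD
size = comm.Get_size()

# Stand-in for the gradients each rank would compute on its own shard
# of the training data (e.g., from a local TensorFlow session).
local_grad = np.random.rand(4)

# Sum gradients across all ranks in one collective, then average:
# the classic synchronization step of data-parallel gradient descent.
global_grad = np.empty_like(local_grad)
comm.Allreduce(local_grad, global_grad, op=MPI.SUM)
global_grad /= size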
2. BACKGROUND
In this section, we provide a brief background on Google TensorFlow (referred to simply as TensorFlow in the rest of the paper) and the Message Passing Interface (MPI) [10, 11].
2.1 TensorFlow
Google’s TensorFlow, released in November 2015, is a
platform for building and developing models in machine
learning, particularly neural networks. It is capable of han-