At run time each channel is used to transport a finite se-
quence of structured items. This channel abstraction has
several concrete implementations that use shared memory,
TCP pipes, or files temporarily persisted in a file system.
As far as the program in each vertex is concerned, channels
produce and consume heap objects that inherit from a base
type. This means that a vertex program reads and writes its
data in the same way regardless of whether a channel seri-
alizes its data to buffers on a disk or TCP stream, or passes
object pointers directly via shared memory. The Dryad sys-
tem does not include any native data model for serializa-
tion and the concrete type of an item is left entirely up to
applications, which can supply their own serialization and
deserialization routines. This decision allows us to support
applications that operate directly on existing data includ-
ing exported SQL tables and textual log files. In practice
most applications use one of a small set of library item types
that we supply such as newline-terminated text strings and
tuples of base types.
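To make this contract concrete, the following C++ sketch shows
one way a channel item interface and a simple line-record item
might look; the names (ChannelItem, Serialize, Deserialize,
LineRecord) are illustrative assumptions rather than the actual
Dryad interfaces.

  // Illustrative sketch only: these names are hypothetical and are
  // not the real Dryad interfaces.
  #include <cstddef>
  #include <string>
  #include <vector>

  // Base type from which channel items derive. A shared-memory channel
  // can pass pointers to such objects directly; a file- or TCP-based
  // channel instead calls the application-supplied serialization routines.
  class ChannelItem {
  public:
    virtual ~ChannelItem() {}
    // Append a wire-format encoding of this item to *buffer.
    virtual void Serialize(std::vector<char>* buffer) const = 0;
    // Rebuild the item from len bytes at data; returns false on a parse error.
    virtual bool Deserialize(const char* data, std::size_t len) = 0;
  };

  // Example library item: a newline-terminated text line.
  class LineRecord : public ChannelItem {
  public:
    std::string text;
    void Serialize(std::vector<char>* buffer) const override {
      buffer->insert(buffer->end(), text.begin(), text.end());
      buffer->push_back('\n');
    }
    bool Deserialize(const char* data, std::size_t len) override {
      if (len == 0 || data[len - 1] != '\n') return false;
      text.assign(data, len - 1);
      return true;
    }
  };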
A schematic of the Dryad system organization is shown
in Figure 1. A Dryad job is coordinated by a process called
the “job manager” (denoted JM in the figure) that runs
either within the cluster or on a user’s workstation with
network access to the cluster. The job manager contains
the application-specific code to construct the job’s commu-
nication graph along with library code to schedule the work
across the available resources. All data is sent directly be-
tween vertices and thus the job manager is only responsible
for control decisions and is not a bottleneck for any data
transfers.
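As a rough illustration of this division of labor, the following
C++ sketch shows the kind of control-plane state a job manager
might hold for the communication graph; the types and fields here
are hypothetical and do not reflect Dryad's real scheduling code.

  // Hypothetical control-plane state for a job manager; not Dryad's real code.
  #include <string>
  #include <vector>

  enum ChannelKind { FileChannel, TcpPipe, SharedMemoryFifo };

  struct VertexRecord {
    std::string program;     // vertex binary to execute
    int assignedComputer;    // -1 until the scheduler places it on a machine
    bool completed;
  };

  struct EdgeRecord {
    int source, destination; // indices into JobGraph::vertices
    ChannelKind channel;     // how the data plane moves items for this edge
  };

  // The job manager keeps only this bookkeeping; the items themselves flow
  // directly between vertices over their channels and never through the JM.
  struct JobGraph {
    std::vector<VertexRecord> vertices;
    std::vector<EdgeRecord> edges;
  };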
Figure 1: The Dryad system organization. The job manager (JM)
consults the name server (NS) to discover the list of available com-
puters. It maintains the job graph and schedules running vertices (V)
as computers become available using the daemon (D) as a proxy.
Vertices exchange data through files, TCP pipes, or shared-memory
channels. The shaded bar indicates the vertices in the job that are
currently running.
The cluster has a name server (NS) that can be used to
enumerate all the available computers. The name server
also exposes the position of each computer within the net-
work topology so that scheduling decisions can take account
of locality. There is a simple daemon (D) running on each
computer in the cluster that is responsible for creating pro-
cesses on behalf of the job manager. The first time a vertex
(V) is executed on a computer its binary is sent from the job
manager to the daemon and subsequently it is executed from
a cache. The daemon acts as a proxy so that the job man-
ager can communicate with the remote vertices and monitor
the state of the computation and how much data has been
read and written on its channels. It is straightforward to run
a name server and a set of daemons on a user workstation
to simulate a cluster and thus run an entire job locally while
debugging.
A simple task scheduler is used to queue batch jobs. We
use a distributed storage system, not described here, that
shares with the Google File System [21] the property that
large files can be broken into small pieces that are replicated
and distributed across the local disks of the cluster comput-
ers. Dryad also supports the use of NTFS for accessing files
directly on local computers, which can be convenient for
small clusters with low management overhead.
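The following sketch illustrates, in hypothetical terms, the
metadata such a partitioned file might carry and that a scheduler
can consult when placing vertices near their data; none of these
names come from the actual storage system.

  // Hypothetical description of one piece of a large partitioned file:
  // its size and the computers holding replicas of it. A scheduler can
  // prefer to run a vertex on (or near) one of the replica holders.
  #include <string>
  #include <vector>

  struct FilePiece {
    std::string name;                   // made-up piece identifier
    long long sizeBytes;
    std::vector<std::string> replicas;  // computers storing a copy
  };

  struct PartitionedFile {
    std::string name;
    std::vector<FilePiece> pieces;
  };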
2.1 An example SQL query
In this section, we describe a concrete example of a Dryad
application that will be further developed throughout the re-
mainder of the paper. The task we have chosen is representa-
tive of a new class of eScience applications, where scientific
investigation is performed by processing large amounts of
data available in digital form [24]. The database that we
use is derived from the Sloan Digital Sky Survey (SDSS),
available online at http://skyserver.sdss.org.
We chose the most time-consuming query (Q18) from a
published study based on this database [23]. The task is to
identify a “gravitational lens” effect: it finds all the objects
in the database that have neighboring objects within 30 arc
seconds such that at least one of the neighbors has a color
similar to the primary object’s color. The query can be
expressed in SQL as:
select distinct p.objID
from photoObjAll p
  join neighbors n             -- call this join “X”
  on p.objID = n.objID
  and n.objID < n.neighborObjID
  and p.mode = 1
  join photoObjAll l           -- call this join “Y”
  on l.objID = n.neighborObjID
  and l.mode = 1
  and abs((p.u-p.g)-(l.u-l.g))<0.05
  and abs((p.g-p.r)-(l.g-l.r))<0.05
  and abs((p.r-p.i)-(l.r-l.i))<0.05
  and abs((p.i-p.z)-(l.i-l.z))<0.05
There are two tables involved. The first, photoObjAll,
has 354,254,163 records, one for each identified astronomical
object, keyed by a unique identifier objID. These records
also include the object’s color, as a magnitude (logarithmic
brightness) in five bands: u, g, r, i and z. The second table,
neighbors, has 2,803,165,372 records, one for each object
located within 30 arc seconds of another object. The mode
predicates in the query select only “primary” objects. The
< predicate eliminates duplication caused by the neighbors
relationship being symmetric. The outputs of joins “X” and
“Y” are 932,820,679 and 83,798 records respectively, and the
final hash (which implements the distinct clause) emits
83,050 records.
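For reference, the color-similarity condition enforced by join
“Y” requires each pair of adjacent-band color differences to
agree to within 0.05 magnitudes; a minimal C++ sketch of that
predicate, using a hypothetical record struct, is:

  #include <cmath>

  // Hypothetical record holding just the columns the query reads.
  struct PhotoObj {
    long long objID;
    int mode;
    double u, g, r, i, z;  // magnitudes in the five bands
  };

  // True when every adjacent-band color difference matches to within 0.05,
  // mirroring the abs((p.u-p.g)-(l.u-l.g))<0.05 ... terms of join “Y”.
  bool SimilarColor(const PhotoObj& p, const PhotoObj& l) {
    return std::fabs((p.u - p.g) - (l.u - l.g)) < 0.05 &&
           std::fabs((p.g - p.r) - (l.g - l.r)) < 0.05 &&
           std::fabs((p.r - p.i) - (l.r - l.i)) < 0.05 &&
           std::fabs((p.i - p.z) - (l.i - l.z)) < 0.05;
  }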
The query uses only a few columns from the tables (the
complete photoObjAll table contains 2 KBytes per record).
When executed by SQLServer the query uses an index on
photoObjAll keyed by objID with additional columns for
mode, u, g, r, i and z, and an index on neighbors keyed by
objID with an additional neighborObjID column. SQL-
Server reads just these indexes, leaving the remainder of the
tables’ data resting quietly on disk. (In our experimental
setup we in fact omitted unused columns from the table, to
avoid transporting the entire multi-terabyte database across