This document, "Distributed Algorithms," contains the lecture notes for a fall-term graduate course taught by Nancy Lynch; the notes were compiled by Nancy Lynch and teaching assistant Boaz Patt-Shamir and consist mainly of detailed classroom lectures, intended as a foundation for a deep understanding of distributed algorithms. Beyond the regular lectures, the notes include homework assignments and an appendix covering three additional lectures given after the end of the term, on important topics that did not fit in the scheduled class time. The authors note that this appended material is comparatively rough, welcome feedback on accuracy and possible improvements, and thank the students of the course, whose participation and feedback were key to refining the material.
Prove rigorously that the algorithms solve the problems.
Analyze the complexity of the algorithms.
Prove corresponding impossibility results.
Note the emphasis on rigor; this is important in this area, because of the subtle complications that arise. A rigorous approach seems necessary to be sure that the problems are meaningful, the algorithms are correct, the impossibility results are true and meaningful, and the interfaces are sufficiently well-defined to allow system building.
However, because of the many complications, rigor is hard to achieve. In fact, the development of good formal methods for describing and reasoning about distributed algorithms is the subject of a good deal of recent research. Specifically, there has been much serious work in defining appropriate formal mathematical models, both for describing the algorithms and for describing the problems they are supposed to solve; a considerable amount of work has also been devoted to proof methods. One difficulty in carrying out a rigorous approach is that, unlike in many other areas of theoretical computer science, there is no single accepted formal model to cover all the settings and problems that arise in this area. This phenomenon is unavoidable, because there are so many very different settings to be studied (consider the difference between shared memory and message-passing), and each of them has its own suitable formal models.
So, rigor is a goal to be striven for, rather than one that we will achieve entirely in this course. Due to time limitations, and (sometimes) the difficulty of making formal presentations intuitively understandable, the presentation in class will be a mixture of rigorous and intuitive.
1.1.3 Overview of the Course
There are many different orders in which this material could be presented. In this course, we divide it up first according to timing assumptions, since that seems to be the most important model distinction. The timing models to be considered are the following.
synchronous: This is the simplest model. We assume that components take steps simultaneously, i.e., the execution proceeds in synchronous rounds.

asynchronous: Here we assume that the separate components take steps in arbitrary order.

partially synchronous (timing-based): This is an "in-between" model: there are some restrictions on the relative timing of events, but execution is not completely lock-step.
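As a concrete illustration of the first of these models, the round structure can be sketched as a simple simulation loop (the class and function names below are illustrative, not from the notes): in each round, every component first generates its messages, then all messages are delivered at once, and every component applies its state-transition function.

```python
# A minimal sketch of the synchronous model. In every round, each process
# first generates messages for its neighbors; then all messages are
# delivered simultaneously and each process updates its state.
# Names (Process, run_rounds) are illustrative, not from the notes.

class Process:
    def __init__(self, uid):
        self.uid = uid
        self.state = uid

    def msgs(self, neighbors):
        # Message-generation function: send current state to every neighbor.
        return {n: self.state for n in neighbors}

    def transition(self, received):
        # State-transition function: here, keep the maximum value seen so far.
        self.state = max([self.state] + received)

def run_rounds(procs, adj, rounds):
    """Execute the given number of synchronous rounds over adjacency map adj."""
    for _ in range(rounds):
        # Phase 1: all processes generate their messages simultaneously.
        outbox = {p: procs[p].msgs(adj[p]) for p in procs}
        # Phase 2: all messages are delivered, then all states are updated.
        for p in procs:
            inbox = [outbox[q][p] for q in adj[p]]
            procs[p].transition(inbox)

# Example: 4 processes on a line; the transition propagates the maximum,
# so after diameter-many rounds every process holds the global maximum.
adj = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
procs = {p: Process(uid) for p, uid in zip(adj, [7, 3, 9, 5])}
run_rounds(procs, adj, 3)
print([procs[p].state for p in adj])  # [9, 9, 9, 9]
```

The two-phase structure (all sends, then all receives) is exactly what "lock-step" means here; the asynchronous model drops this guarantee.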
The next division is by the IPC mechanism: shared memory vs. message-passing. Then we subdivide by the problem studied. And finally, each model and problem can be considered with various failure assumptions.
We now go over the bibliographical list and the tentative schedule (Handouts 2 and 3). The bibliographical list doesn't completely correspond to the order of topics to be covered; the difference is that the material on models will be spread throughout the course as needed.
General references. These include the previous course notes, and some related books. There is not really much in the way of textbooks on this material.
Introduction. The chapter on distributed computing in the handbook on Theoretical Computer Science is a sketchy overview of some of the modeling and algorithmic ideas.
Models and Proof Methods. We shall not study this as an isolated topic in the course; rather, it is distributed through the various units. The basic models used are automata-theoretic, starting with a basic state-machine model with little structure. These state machines need not necessarily be finite-state. Invariant assertions are often proved about automaton states, by induction. Sometimes we use one automaton to represent the problem being solved, and another to represent the solution; then a correctness proof boils down to establishing a correspondence that preserves the desired external behavior. In this case, proving the correspondence is often done using a mapping or simulation method. Specially tailored state machine models have been designed for some special purposes, e.g., for shared memory models (where the structure consists of processes and shared variables). Another model is the I/O automaton model for reactive systems, i.e., systems that interact with an external environment in an ongoing fashion. This model can model systems based on shared variables, but is more appropriate for message-passing systems. One of the key features of this model is that it has good compositionality properties, e.g., that the correctness of a compound automaton can be proved using the correctness of its components.

Temporal logic is an example of a special set of methods (language, logic) mainly designed for proving liveness properties (e.g., something eventually happens). Timed models are mainly newer research work. Typically, these are specially-tailored models for talking about timing-based systems, e.g., those whose components have access to system clocks, can use timeouts, etc. Algebraic methods are an important research subarea (but we will not have time for this). The algebraic methods describe concurrent processes and systems using algebraic expressions, then use equations involving these expressions to prove equivalences and implementation relationships among the processes.
Synchronous Message-Passing. As noted, the material is organized first by timing model. The simplest model (i.e., the one with the least uncertainty) is the synchronous model, in which all the processes take steps in synchronous rounds. The shared-memory version of this model is the PRAM. Since it is studied in other courses, we shall skip this subject, and start with synchronous networks.
We spend the first two weeks or so of the course on problems in synchronous networks that are typical of the distributed setting. In the network setting we have the processors at the nodes of a graph G, communicating with their neighbors via messages on the edges. We start with a simple toy example, involving ring computation. The problem is to elect a unique leader in a simple network of processors, which are assumed to be identical except for unique identifiers (UIDs). The uncertainty is that the size of the network, and the set of UIDs of the processors, are unknown (although it is known that the UIDs are indeed unique). The main application for this problem is a token ring, where there is a single token circulating, and sometimes it is necessary to regenerate a lost token. We shall see some details of the modeling, and some typical complexity measures will be studied. For example, we shall show upper and lower bounds for the time and the amount of communication (i.e., number of messages) required. We shall also study some other problems in this simple setting.
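The ring election problem can be illustrated with the classic LeLann/Chang-Roberts (LCR) style of algorithm, in which each UID travels clockwise and only the largest one survives a full trip. The simulation below is a sketch under the synchronous model, not the notes' own pseudocode:

```python
# A sketch of LCR-style leader election on a unidirectional ring.
# Each process sends its UID clockwise; on receipt, a process forwards a
# UID only if it is larger than its own, discards it if smaller, and
# declares itself leader when its own UID comes back around.

def lcr_leader_election(uids):
    n = len(uids)
    pending = list(uids)   # message in transit from each ring position
    leader = None
    messages = 0
    while leader is None:
        # deliver every pending message one hop clockwise
        received = [pending[(i - 1) % n] for i in range(n)]
        messages += sum(1 for m in received if m is not None)
        pending = [None] * n
        for i, m in enumerate(received):
            if m is None:
                continue
            if m > uids[i]:
                pending[i] = m      # forward larger UIDs
            elif m == uids[i]:
                leader = uids[i]    # own UID returned: elected
            # smaller UIDs are silently discarded
    return leader, messages

leader, msgs = lcr_leader_election([3, 7, 2, 9, 5])
print(leader)  # 9, the maximum UID
```

The elected leader is always the process with the maximum UID, and this scheme uses O(n^2) messages in the worst case, which is one motivation for studying the communication bounds mentioned above.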
Next, we'll go through a brief survey of some protocols in more general networks. We shall see some protocols used in unknown synchronous networks of processes to solve some basic problems like finding shortest paths, defining a minimum spanning tree, computing a maximal independent set, etc.
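Of these, unweighted shortest paths from a root reduces to breadth-first search, which is easy to sketch in the synchronous setting: in round d, exactly the nodes at distance d-1 send a "reached" message, so each node learns its distance the first time it hears one. The code below simulates this centrally and is only illustrative:

```python
# A sketch of synchronous BFS for unweighted shortest paths from a root.
# Each synchronous round corresponds to one expansion of the frontier.

def synchronous_bfs(adj, root):
    dist = {root: 0}
    frontier = {root}
    rnd = 0
    while frontier:
        rnd += 1
        nxt = set()
        # In this round, every frontier node sends to all its neighbors.
        for u in frontier:
            for v in adj[u]:
                if v not in dist:   # v is reached for the first time
                    dist[v] = rnd   # so its distance is the round number
                    nxt.add(v)
        frontier = nxt
    return dist

adj = {0: [1, 2], 1: [0, 3], 2: [0, 3], 3: [1, 2, 4], 4: [3]}
dist = synchronous_bfs(adj, 0)
print(dist[4])  # 3: node 4 is three hops from the root
```

The number of rounds equals the eccentricity of the root, a first example of the time measures studied for synchronous networks.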
Then we turn to the problem of reaching consensus. This refers to the problem of reaching agreement on some abstract fact, where there are initial differences of opinion. The uncertainty here stems not only from different initial opinions, but also from processor failures. We consider failures of different types: stopping, where a processor suddenly stops executing its local protocol; omission, where messages may be lost en route; and Byzantine, where a faulty processor is completely unrestricted. This has been an active research area in the past few years, and there are many interesting results, so we shall spend a couple of lectures on this problem. We shall see some interesting bounds on the number of tolerable faults, time, and communication.
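For the stopping-failure case, the flavor of these algorithms can be sketched with a FloodSet-style protocol: every process repeatedly broadcasts the set of values it has seen, and after f+1 synchronous rounds all non-faulty processes hold the same set and can decide deterministically (here, on the minimum). The crash schedule in the demo below is an illustrative assumption, not from the notes:

```python
# A sketch of FloodSet-style consensus tolerating up to f stopping
# (crash) failures. A crashing process may reach only some recipients
# in its final round; f + 1 rounds guarantee one failure-free round.

def flood_set_consensus(inputs, f, crash_round=None, crashed=None, reached=()):
    n = len(inputs)
    W = [{v} for v in inputs]            # W[i]: values process i has seen
    alive = set(range(n))
    for rnd in range(1, f + 2):          # exactly f + 1 synchronous rounds
        sent = {i: set(W[i]) for i in alive}
        for i in list(alive):
            targets = set(range(n))
            if rnd == crash_round and i == crashed:
                targets = set(reached)   # crashes after a partial broadcast
                alive.discard(i)
            for j in targets:
                W[j] |= sent[i]          # receiver unions in the sender's set
    # decision rule: the smallest value in the final set
    return {i: min(W[i]) for i in sorted(alive)}

# Process 2 crashes in round 1 after reaching only process 0; with f = 1
# there are 2 rounds, so process 0 relays the value and all survivors agree.
decisions = flood_set_consensus([5, 8, 1, 6], f=1,
                                crash_round=1, crashed=2, reached=[0])
print(decisions)  # {0: 1, 1: 1, 3: 1}
```

The f+1 round count is tight: the lower bounds on rounds mentioned above show that f rounds do not suffice.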
Asynchronous Shared Memory. After "warming up" with synchronous algorithms (in which there is only a little uncertainty), we move into the more characteristic (and possibly more interesting) part of the course, on asynchronous algorithms. Here, processors are no longer assumed to take steps in lock-step synchrony, but rather can interleave their steps in arbitrary order, with no bound on individual process speeds. Typically, the interactions with the external world (i.e., input/output) are ongoing, rather than just initial input and final output. The results in this setting have quite a different flavor from those for synchronous networks.
The first problem we deal with is mutual exclusion. This is one of the fundamental (and historically first) problems in this area, and consequently, much work has been dedicated to exploring it. Essentially, the problem involves arbitrating access to a single, indivisible, exclusive-use resource. The uncertainty here is about who is going to request access and when. We'll sketch some of the important algorithms, starting with the original algorithm of Dijkstra. Many important concepts for this field will be illustrated in this context, including progress, fairness, fault-tolerance, and time analysis for asynchronous algorithms. We shall see upper bounds on the amount of shared memory, corresponding lower bounds, and impossibility results. We shall also discuss generalizations of mutual exclusion to more general resource allocation problems. For example, we will consider the Dining Philosophers problem, a prototypical resource allocation problem.
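The course starts from Dijkstra's original algorithm; as a small taste of what a shared-memory mutual exclusion protocol looks like, here is a sketch of Peterson's two-process algorithm instead (a different, later algorithm, chosen because it is short). The sketch leans on CPython's global interpreter lock to make the individual list reads and writes atomic, standing in for the atomic registers the algorithm assumes:

```python
# A sketch of Peterson's two-process mutual exclusion algorithm,
# protecting a non-atomic read-modify-write on a shared counter.

import sys
import threading

sys.setswitchinterval(1e-4)  # faster thread switches so busy-waits resolve quickly

flag = [False, False]   # flag[i]: process i wants the critical section
turn = [0]              # tie-breaker: the process whose turn it is to wait
counter = [0]           # shared data the protocol protects

def process(i, iterations):
    other = 1 - i
    for _ in range(iterations):
        # trying protocol
        flag[i] = True
        turn[0] = other                        # politely defer to the other
        while flag[other] and turn[0] == other:
            pass                               # busy-wait
        # critical section: a non-atomic increment
        counter[0] = counter[0] + 1
        # exit protocol
        flag[i] = False

threads = [threading.Thread(target=process, args=(i, 1000)) for i in (0, 1)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(counter[0])  # 2000: no increment was lost
```

Without the trying and exit protocols, concurrent increments can interleave and lose updates; with them, every increment is serialized, illustrating the mutual exclusion property.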
Next, we shall study the concept of atomic registers: so far, we have been assuming indivisible access to shared memory. But how can one implement this on simpler architectures? We shall look at several algorithms that solve this problem in terms of weaker primitives. An interesting new property that appears here is wait-freeness, which means that any operation on the register must complete regardless of the failure of other concurrent operations.
An atomic snapshot is a convenient primitive for shared read-write memory. Roughly speaking, the objective is to take an instantaneous snapshot of all the memory locations at once. An atomic snapshot is a useful primitive to have for building more powerful systems. We shall see how to implement it.
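One standard ingredient in such implementations is the "double collect": read every location twice, and if the two collects agree, the first collect is a consistent instantaneous view. The sketch below shows only this retry-based core; the wait-free constructions in the literature add a helping mechanism (embedded scans) that is omitted here:

```python
# A sketch of the double-collect core of atomic snapshot implementations.
# Each cell pairs a value with a sequence number, so any concurrent
# update is guaranteed to make the two collects differ.

class SnapshotObject:
    def __init__(self, n):
        self.cells = [(None, 0)] * n   # (value, sequence number) per location

    def update(self, i, value):
        _, seq = self.cells[i]
        self.cells[i] = (value, seq + 1)

    def scan(self):
        while True:
            first = list(self.cells)    # first collect
            second = list(self.cells)   # second collect
            if first == second:         # no writer interfered: consistent view
                return [v for v, _ in first]
            # otherwise retry (not wait-free without helping)

snap = SnapshotObject(3)
snap.update(0, "a")
snap.update(2, "c")
print(snap.scan())  # ['a', None, 'c']
```

The sequence numbers matter: without them, a location overwritten with its old value between the two collects could make an inconsistent view look clean.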
A concurrent timestamp system is another nice primitive. This is a system that issues, upon request, timestamps that can be used by programs to establish a consistent order among their operations. The twist here is how to implement such systems with bounded memory. It turns out that such bounded timestamp systems can be built in terms of atomic snapshots. Also, a concurrent timestamp system can be used to build more powerful forms of shared memory, such as multi-writer multi-reader memory.
We shall also reconsider the consensus problem in the asynchronous shared memory model, and prove the interesting fact that it is impossible to solve in this setting.
A possible new topic this time is shared memory for multiprocessors. Much recent research is aimed at identifying different types of memory abstractions that are used, or might be useful, for real systems. Systems architects who develop such types of memory frequently do not give precise statements of what their memory guarantees (especially in the presence of concurrent accesses and failures). So recently, theoreticians have started trying to do this.
Asynchronous Message-Passing Systems. This section deals with algorithms that operate in asynchronous networks. Again, the system is modeled as a graph with processors at the nodes, and communication links represented by the edges, but now the system does not operate in rounds. In particular, messages can arrive at arbitrary times, and the processors can take steps at arbitrary speeds. One might say that we now have "looser coupling" of the components of the system: we have more independence and uncertainty.
Computing in static graphs. The simplest type of setting here is computation in a fixed (unknown) graph in which the inputs arrive at the beginning, and there is a single output to be produced. Some examples are leader election in a ring, and minimum spanning tree computation.
Network synchronization. At this point, we could plunge into a study of the many special-purpose algorithms designed expressly for asynchronous distributed networks. But instead, we shall first try to impose some structure on such algorithms by considering "algorithm transformations" that can be used to run algorithms designed for a simpler computation model on a complex asynchronous network.
The first example here arises in the very important paper by Lamport, where he shows a simple method of assigning consistent logical times to events in a distributed network. This can be used to allow an asynchronous network to simulate one in which the nodes have access to perfectly synchronized real-time clocks. The second example is Awerbuch's synchronizer, which allows an asynchronous network to simulate the lock-step synchronous networks discussed in the first part of the course (at least, those without failures), and to do so efficiently. We shall contrast this simulation result with an interesting lower bound that seems to say that any such simulation must be inefficient (the apparent contradiction turns out to depend on the kind of problem being solved). Third, we shall see that an asynchronous network can simulate a centralized (non-distributed) state machine. And fourth, an asynchronous network can be used to simulate asynchronous shared memory. Any of these simulations can be used to run algorithms from simpler models in the general asynchronous network model.
Next, we shall look at some specific problems, such as resource allocation. We shall see how to solve mutual exclusion, dining philosophers, etc. in networks.
Detection of stable properties refers to a class of problems with a similar flavor and a common solution. Suppose that there is a separate algorithm running, and we want to design another algorithm to "monitor" the first. To monitor here might mean, for instance, to detect when it terminates or deadlocks, or to take a "consistent snapshot" of its state.
We shall also revisit the consensus problem in the context of networks. The problem is easy without faults, but with faults, it is mostly impossible (even for very simple types of