[Google] Introduction to Distributed System Design.pdf

Introduction to Distributed System Design
Table of Contents
Audience and Pre-Requisites
The Basics
So How Is It Done?
Remote Procedure Calls
Distributed Design Principles
Exercises
References
Audience and Pre-Requisites
This tutorial covers the basics of distributed systems design. The pre-requisites are significant programming experience
with a language such as C++ or Java, and a basic understanding of networking and of data structures and algorithms.
The Basics
What is a distributed system? It's one of those things that's hard to define without first defining many other things. Here is
a "cascading" definition of a distributed system:
A program
is the code you write.
A process
is what you get when you run it.
A message
is used to communicate between processes.
A packet
is a fragment of a message that might travel on a wire.
A protocol
is a formal description of message formats and the rules that two processes must follow in order to exchange those
messages.
A network
is the infrastructure that links computers, workstations, terminals, servers, etc. It consists of routers which are
connected by communication links.
A component
can be a process or any piece of hardware required to run a process, support communications between processes,
store data, etc.
A distributed system
is an application that executes a collection of protocols to coordinate the actions of multiple processes on a
network, such that all components cooperate to perform a single task or a small set of related tasks.
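To make the message, packet, and protocol definitions concrete, here is a minimal Python sketch. The header layout (message id, sequence number, packet count), the 8-byte payload limit, and the names `fragment` and `reassemble` are all assumptions made for illustration; they do not describe any real protocol.

```python
import random
import struct

# Hypothetical toy protocol: every packet starts with a fixed header of
# (message id, sequence number, total packet count), then a payload fragment.
HEADER = struct.Struct("!HHH")
MAX_PAYLOAD = 8  # bytes per packet; deliberately tiny for illustration


def fragment(message_id: int, message: bytes) -> list[bytes]:
    """Split one message into packets that could each travel on a wire."""
    chunks = [message[i:i + MAX_PAYLOAD]
              for i in range(0, len(message), MAX_PAYLOAD)] or [b""]
    return [HEADER.pack(message_id, seq, len(chunks)) + chunk
            for seq, chunk in enumerate(chunks)]


def reassemble(packets: list[bytes]) -> bytes:
    """Rebuild the message from its packets, in whatever order they arrived."""
    parts = {}
    total = None
    for packet in packets:
        _msg_id, seq, total = HEADER.unpack(packet[:HEADER.size])
        parts[seq] = packet[HEADER.size:]
    if total is None or len(parts) != total:
        raise ValueError("message is incomplete")
    return b"".join(parts[seq] for seq in range(total))


message = b"hello, distributed world"
packets = fragment(7, message)
random.shuffle(packets)  # packets may arrive out of order
restored = reassemble(packets)
```

Both processes must agree on the header format and on the rule that a message is complete only when every sequence number has arrived. That shared agreement is the protocol; the bytes produced by `fragment` are the packets.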
Why build a distributed system? There are lots of advantages including the ability to connect remote users with remote
resources in an open and scalable way. When we say open, we mean each component is continually open to interaction
with other components. When we say scalable, we mean the system can easily be altered to accommodate changes in
the number of users, resources and computing entities.
Thus, given the combined capabilities of its distributed components, a distributed system can be much larger and more
powerful than combinations of stand-alone systems. But it's not easy: for a distributed system to be useful, it must be
reliable. This is a difficult goal to achieve because of the complexity of the interactions between simultaneously running
components.
To be truly reliable, a distributed system must have the following characteristics:
• Fault-Tolerant: It can recover from component failures without performing incorrect actions.
• Highly Available: It can restore operations, permitting it to resume providing services even when some components
have failed.
• Recoverable: Failed components can restart themselves and rejoin the system, after the cause of failure has been
repaired.
• Consistent: The system can coordinate actions by multiple components, often in the presence of concurrency and
failure. This underlies the ability of a distributed system to act like a non-distributed system.
• Scalable: It can operate correctly even as some aspect of the system is scaled to a larger size. For example, we
might increase the size of the network on which the system is running. This increases the frequency of network
outages and could degrade a "non-scalable" system. Similarly, we might increase the number of users or servers, or
overall load on the system. In a scalable system, this should not have a significant effect.
• Predictable Performance: It can provide the desired responsiveness in a timely manner.
• Secure: The system authenticates access to data and services [1].
These are high standards, which are challenging to achieve. Probably the most difficult challenge is that a distributed
system must be able to continue operating correctly even when components fail. This issue is discussed in the following
excerpt of an interview with Ken Arnold. Ken is a research scientist at Sun, is one of the original architects of Jini, and
was a member of the architectural team that designed CORBA.
Failure is the defining difference between distributed and local programming, so you have to design distributed systems
with the expectation of failure. Imagine asking people, "If the probability of something happening is one in 10^13, how often
would it happen?" Common sense would be to answer, "Never." That is an infinitely large number in human terms. But if
you ask a physicist, she would say, "All the time. In a cubic foot of air, those things happen all the time."
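As an aside to the interview, the physicist's answer comes down to arithmetic: a tiny per-event probability multiplied by a very large number of events. The sketch below works this through in Python. The event rate (one million events per second on each of one thousand machines) is purely an assumption chosen for illustration and does not come from the interview.

```python
# Assumed numbers for illustration only; they do not come from the interview.
p_failure = 1e-13              # one in 10^13, per event
events_per_second = 1e6 * 1e3  # 10^6 events/s on each of 10^3 machines

# Expected number of "one in 10^13" events per day at that rate.
expected_per_day = p_failure * events_per_second * 86_400

# Average time between two such events, in seconds.
seconds_between = 1 / (p_failure * events_per_second)
```

Under these assumed rates the "never" event is expected about 8.6 times per day, or roughly once every 2.8 hours.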
When you design distributed systems, you have to say, "Failure happens all the time." So when you design, you design
for failure. It is your number one concern. What does designing for failure mean? One classic problem is partial failure. If I
send a message to you and then a network failure occurs, there are two possible outcomes. One is that the message got
to you, and then the network broke, and I just didn't get the response. The other is the message never got to you because
the network broke before it arrived.
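The two outcomes described above can be simulated in a few lines. The following Python sketch is an illustration added to the excerpt, not part of the interview. `Server`, `send`, `observe`, and the `network_breaks` parameter are hypothetical names, and the "network" is an ordinary function call that raises `Timeout` at a chosen point.

```python
class Timeout(Exception):
    """The sender gave up waiting for a response."""


class Server:
    def __init__(self):
        self.received = []  # every request that actually arrived

    def handle(self, request):
        self.received.append(request)
        return "ack"


def send(server, request, network_breaks):
    """Deliver a request over a hypothetical network that may break.

    network_breaks is "before_delivery", "after_delivery", or None.
    """
    if network_breaks == "before_delivery":
        raise Timeout()  # the message never got to the server
    response = server.handle(request)
    if network_breaks == "after_delivery":
        raise Timeout()  # the message arrived, but the response was lost
    return response


def observe(network_breaks):
    """Return (what the sender saw, how many requests the server got)."""
    server = Server()
    try:
        send(server, "debit 10", network_breaks)
        seen_by_sender = "response"
    except Timeout:
        seen_by_sender = "timeout"
    return seen_by_sender, len(server.received)


lost_request = observe("before_delivery")   # ("timeout", 0)
lost_response = observe("after_delivery")   # ("timeout", 1)
```

In both runs the sender sees only a timeout, yet the server's state differs: in one case it never saw the request, in the other it has already processed it. The timeout alone does not tell the sender which outcome occurred.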
So if I never receive a response, how do I know which of those two results happened? I cannot determine that without
eventually finding you. The network has to be repaired or you have to come up, because maybe what happened was not