DNS resolution, the request is routed to an Edge Load
Balancer (ELB) [16]. ELBs are geo-distributed so that TCP sessions can be established closer to the user, avoiding excessive latency during the TCP handshake and SSL termination. ELBs also provide a point of indirec-
tion for better load balancing, acting as a proxy between
the user and data center.
Once a request is routed to a particular data center, a
Software Load Balancer routes it to one of many possi-
ble Web servers, each of which runs the HipHop Virtual
Machine runtime [35]. Request execution on the Web
server triggers many RPCs to caching layers that include
Memcache [20] and TAO [7]. Requests also occasionally
access databases.
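To make this fan-out concrete, the following sketch (Python, with hypothetical stubs standing in for the real Memcache, TAO, and database client libraries) shows how a single request handler might issue parallel RPCs to the caching layers and fall back to the database on a miss:

import asyncio

# Hypothetical stubs; the real services expose their own client libraries.
async def memcache_get(key):
    await asyncio.sleep(0.001)   # simulated cache RPC
    return None                  # simulate a miss

async def tao_get(obj_id):
    await asyncio.sleep(0.002)   # simulated TAO RPC
    return {"id": obj_id}

async def db_query(sql):
    await asyncio.sleep(0.010)   # databases are touched only occasionally
    return [("row",)]

async def handle_request(user_id):
    # One request fans out to many caching-layer RPCs in parallel.
    profile, friends = await asyncio.gather(
        tao_get(user_id),
        memcache_get("friends:%d" % user_id),
    )
    if friends is None:          # cache miss: fall back to the database
        friends = await db_query("SELECT friend_id FROM friends WHERE id=%d" % user_id)
    return {"profile": profile, "friends": friends}

print(asyncio.run(handle_request(42)))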
RPC responses pass through the load-balancing lay-
ers on their way back to the client. On the client, the
exact order and manner of rendering a Web page are
dependent on the implementation details of the user’s
browser. However, in general, there will be a Cascad-
ing Style Sheet (CSS) download stage and a Document
Object Model rendering stage, followed by a JavaScript
execution stage.
As with all modern Internet services, to achieve la-
tency objectives, the handling of an individual request
exhibits a high degree of concurrency. Tens to hun-
dreds of individual components execute in parallel over
a distributed set of computers, including both server and
client machines. Such concurrency makes performance
analysis and debugging complex. Fortunately, standard
techniques such as critical path analysis and slack analy-
sis can tame this complexity. However, all such analyses
need a model of the causal dependencies in the system
being analyzed. Our work fills this need.
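To illustrate why such a model is needed, the sketch below (an invented example in Python, not Facebook's implementation) computes the critical path over a small causal dependency DAG; slack analysis falls out of the same finish times:

import functools

# Hypothetical causal model: each segment has a duration, and an edge
# records that a segment cannot start until its predecessors finish.
durations = {"dns": 5, "elb": 2, "web": 30, "memcache": 8, "tao": 12, "render": 20}
deps = {
    "elb": ["dns"],
    "web": ["elb"],
    "memcache": ["web"],
    "tao": ["web"],
    "render": ["memcache", "tao"],   # rendering waits on both RPCs
}

@functools.lru_cache(maxsize=None)
def finish_time(seg):
    # A segment finishes after its latest dependency finishes plus its own work.
    start = max((finish_time(d) for d in deps.get(seg, [])), default=0)
    return start + durations[seg]

# Walk back from the last segment to finish, always through the
# latest-finishing dependency; that chain is the critical path.
end = max(durations, key=finish_time)
path = [end]
while deps.get(path[-1]):
    path.append(max(deps[path[-1]], key=finish_time))
print(list(reversed(path)), finish_time(end))

In this example the TAO call, not the Memcache call, lies on the critical path, so the Memcache segment has four units of slack: it could run that much longer without delaying the request.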
3 ÜberTrace: End-to-end Request Tracing
As discussed in the prior section, request execution
at Facebook involves many software components. Prior
to our work, almost all of these components had logging
mechanisms used for debugging and optimizing the indi-
vidual components. In fact, our results show that individ-
ual components are almost always well-optimized when
considered in isolation.
Yet, there existed no complete and detailed instru-
mentation for monitoring the end-to-end performance of
Facebook requests. Such end-to-end monitoring is vital
because individual components can be well-optimized in
isolation yet still miss opportunities to improve perfor-
mance when components interact. Indeed, the opportuni-
ties for performance improvement we identify all involve
the interaction of multiple components.
Thus, the first step in our work was to unify the indi-
vidual logging systems at Facebook into a single end-to-
end performance tracing tool, dubbed ÜberTrace. Our
basic approach is to define a minimal schema for the in-
formation contained in a log message, and then map ex-
isting log messages to that schema.
ÜberTrace requires that log messages contain at least:
1. A unique request identifier.
2. The executing computer (e.g., the client or a particular server).
3. A timestamp that uses the local clock of the executing computer.
4. An event name (e.g., “start of DOM rendering”).
5. A task name, where a task is defined to be a distributed thread of control.
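A minimal encoding of this schema might look like the following sketch (field names are illustrative, not ÜberTrace's actual representation):

from dataclasses import dataclass

@dataclass(frozen=True)
class LogEvent:
    request_id: str   # unique identifier shared by all events of one request
    host: str         # executing computer (the client or a particular server)
    timestamp: float  # local clock of the executing computer, not global time
    event: str        # event name, e.g., "start of DOM rendering"
    task: str         # a task is a distributed thread of control

# Existing log messages from each component are mapped onto this schema.
e = LogEvent("req-7f3a", "web042", 1405.221, "start of DOM rendering", "page_render")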
ÜberTrace requires that each <event, task> tuple is
unique, which implies that there are no cycles that would
cause a tuple to appear multiple times. Although this
assumption is not valid for all execution environments, it
holds at Facebook given how requests are processed. We
believe that it is also a reasonable assumption for similar
Internet service pipelines.
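Given the schema above, enforcing this uniqueness assumption reduces to a simple per-request check (a sketch reusing the hypothetical LogEvent):

def check_unique(events):
    # Each <event, task> tuple may appear at most once per request; a
    # repeat would indicate a cycle, which request processing rules out.
    seen = set()
    for e in events:
        key = (e.request_id, e.event, e.task)
        if key in seen:
            raise ValueError("duplicate <event, task> tuple: %r" % (key,))
        seen.add(key)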
Since all log timestamps are relative to local clocks, ÜberTrace translates them to estimated global clock values by compensating for clock skew. ÜberTrace looks
for the common RPC pattern of communication in which
the thread of control in an individual task passes from
one computer (called the client to simplify this explana-
tion) to another, executes on the second computer (called
the server), and returns to the client.
ÜberTrace calculates the server execution time as the difference between the latest and earliest server timestamps (according to the server’s local clock) nested within the client RPC. It then calculates the client-observed execution time as the difference between the client timestamps that immediately precede and succeed the RPC. The difference between the client and
server intervals is the estimated network round-trip time
(RTT) between the client and server. By assuming that
request and response delays are symmetric, ÜberTrace
calculates clock skew such that, after clock-skew adjust-
ment, the first server timestamp in the pattern is exactly
1/2 RTT after the previous client timestamp for the task.
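The arithmetic of a single estimate can be written out as follows (a sketch with hypothetical names; all four inputs are local-clock timestamps):

def estimate_skew(client_send, server_first, server_last, client_recv):
    # client_send/client_recv: client events immediately preceding and
    # succeeding the RPC; server_first/server_last: earliest and latest
    # server events nested within it.
    server_exec = server_last - server_first      # server's view of its work
    client_observed = client_recv - client_send   # client's view of the RPC
    rtt = client_observed - server_exec           # estimated network round trip
    # Assuming symmetric request/response delays, the server's first event
    # should fall exactly RTT/2 after client_send on a global clock.
    skew = server_first - (client_send + rtt / 2.0)
    return skew, rtt

skew, rtt = estimate_skew(100.0, 250.0, 280.0, 160.0)
print(skew, rtt)   # 135.0, 30.0  (subtract skew to align server clocks)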
The above methodology is subject to normal variation
in network performance. In addition, the imprecision
of using existing log messages rather than instrument-
ing communication points can add uncertainty. For in-
stance, the first logged server message could occur only
after substantial server execution has already completed,
leading to an under-estimation of server processing time
and an over-estimation of RTT.
ÜberTrace compensates
by calculating multiple estimates. Since there are many
request and response messages during the processing of
a higher-level request, it makes separate RTT and clock