In order to allow for greater concurrency, we might consider spawning lightweight threads to execute each of the events in the list and then aggregate the results of the various computations. The execution of such an abstraction is presented in Figure 1b. Here, the thread T0 executing chooseAll is suspended until all the results are gathered. New lightweight threads T1 to Tn are created to execute the corresponding events. Each lightweight thread places the result of the event it executes in its corresponding slot in the result list. After all of the threads have completed execution, T0 is resumed with the result of chooseAll.
Even though semantically equivalent to the synchronous implementation, this solution will actually perform worse than the synchronous solution in many scenarios due to the overheads of scheduling, synchronization, and thread creation, even if the threads are implemented as lightweight entities or one/many-shot continuations. The most obvious case where performance degradation occurs is when each of the events encodes a very short computation. In such cases, it takes longer to allocate and schedule the thread than it takes to complete the event. However, there is more at play here: even when some events are long running, the asynchronous solution incurs additional overheads. We can loosely categorize these overheads into three groups:
• Synchronization costs: The creation of a burst of lightweight threads within a short period of time increases contention for shared resources such as channels and scheduler queues.
• Scheduling costs: Besides typical scheduling overheads, the lightweight threads that are created internally by the asynchronous primitive might not be scheduled prior to threads explicitly created by the programmer. In such a scenario, the completion of a primitive that implicitly creates threads for asynchrony is delayed. In the presence of tight interaction between threads, such as synchronous communication, if one of the threads is slow, progress is affected in all of the transitively dependent threads.
• Garbage collection costs: Creating a large number of threads in a short period of time increases allocation and might subsequently trigger a collection. This problem becomes worse in a parallel setting, where multiple mutators might be allocating in parallel.
In this paper, we explore a novel threading mechanism, called parasitic threads, that reduces costs associated with lightweight threading structures. Parasitic threads allow the expression of arbitrary asynchronous computation, while mitigating typical threading costs. Parasitic threads are especially useful to model asynchronous computations that are usually short-lived, but can also be arbitrarily long. Consider once again our chooseAll primitive. It takes an arbitrary list of events, so at any given invocation of chooseAll we do not know a priori whether a particular event is short- or long-lived.
An implementation of chooseAll using parasitic threads is described in Section 2.3.2. Such an implementation, abstractly, delays creating threads unless a parasite performs a blocking action. In practice, this alleviates scheduling and GC costs. Even when parasites block and resume, they are reified into an entity that does not impose synchronization and GC overheads. If the computation they encapsulate is long running, they can be inflated to a lightweight thread at any time during execution.
Importantly, in the implementation of chooseAll with parasitic threads, if all of the events in the list are available before the execution of chooseAll, none of the parasites created for executing individual events will block. In addition to an implementation of chooseAll, we show how parasitic threads can be leveraged at the library level to implement a collection of specialized asynchronous primitives. We believe parasitic threads are a useful runtime technique to accelerate the performance of functional runtime systems.
This paper makes the following contributions:
• The design and implementation of parasitic threads, a novel threading mechanism that allows for the expression of a logical thread of control using raw stack frames. Parasitic threads can easily be inflated into lightweight threads if necessary.
• A formal semantics governing the behavior of parasitic threads.
• A case study leveraging the expressivity of parasitic threads to implement a collection of asynchronous primitives, and illustrating the performance benefits of parasitic threads through a collection of micro-benchmarks.
• A detailed performance analysis of the runtime costs of parasitic threads over a large array of benchmarks, including a full-fledged web server. The performance analysis is performed on two distinct GC schemes to illustrate that parasitic threads are beneficial irrespective of the underlying GC.
The rest of the paper is organized as follows: in Section 2, we present our base runtime system and the design of parasitic threads. In Section 3, we provide a formal characterization of parasitic threads and show their semantic equivalence to classic threads. We discuss salient implementation details in Section 4. We present a case study illustrating the utility of parasitic threads in the construction of asynchronous primitives in Section 5. We present our experimental details and results in Section 6. Related work and concluding remarks are given in Section 7 and Section 8, respectively.
2. System Design
In this section, we describe the salient details of the design of parasitic threads and the relevant characteristics of our runtime system. We envision parasitic threads being used as a fundamental building block for constructing asynchronous primitives. With this in mind, we introduce an API for programming with parasitic threads, and show how to construct an efficient version of the chooseAll primitive introduced earlier.
2.1 Base System
Our runtime system is built on top of lightweight (green) threads and synchronous communication, and leverages a GC. Our threading system multiplexes many lightweight threads on top of a few kernel threads. We leverage one kernel thread for each processor or core of a given system. The kernel threads are also pinned to their processors; hence, the runtime views a kernel thread as a virtual processor. The number of kernel threads is determined statically and is specified by the user. Kernel threads are not created during program execution; instead, all spawn primitives create lightweight threads.
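This M:N arrangement has a rough analogue in Go, where GOMAXPROCS bounds the kernel threads executing goroutines and every spawn creates only a lightweight thread. This is an analogy for illustration, not our runtime; `spawnAndSum` is a hypothetical helper:

```go
package main

import (
	"fmt"
	"runtime"
)

// spawnAndSum spawns n lightweight threads that are multiplexed onto
// a small, statically fixed pool of kernel threads, then collects one
// value from each.
func spawnAndSum(n int) int {
	done := make(chan int)
	for i := 0; i < n; i++ {
		go func(i int) { done <- i }(i) // lightweight thread only
	}
	sum := 0
	for i := 0; i < n; i++ {
		sum += <-done
	}
	return sum
}

func main() {
	// Analogue of the user-specified, static kernel-thread count:
	// at most 2 "virtual processors" run goroutines simultaneously.
	runtime.GOMAXPROCS(2)
	fmt.Println(spawnAndSum(8)) // 0+1+...+7 = 28
}
```

The key property shared with our runtime is that program-level spawns never create kernel threads; the kernel-thread count is fixed up front.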
Threads in our system communicate through synchronous message passing primitives based on PCML [25], a parallel definition of CML. Threads may perform sends or receives to pass data between one another. Such primitives block until a matching communication partner is present. Our system supports two different GC schemes: a stop-the-world collector with parallel allocation and a split-heap parallel GC scheme. Necessary details about both GC designs, with respect to parasites and their implementation, are given in Section 4.
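The blocking, matched-partner behavior of these send/receive primitives corresponds to rendezvous semantics, which Go's unbuffered channels happen to exhibit as well; the following sketch illustrates the semantics only, and `rendezvous` is a hypothetical name:

```go
package main

import "fmt"

// rendezvous demonstrates synchronous message passing: on an
// unbuffered channel, a send blocks until a matching receive is
// present, and vice versa, much like PCML-style send/recv.
func rendezvous() string {
	ch := make(chan string) // unbuffered: synchronous

	go func() {
		// This send suspends the lightweight thread until the
		// matching receive below is performed.
		ch <- "hello"
	}()

	return <-ch // both partners proceed together at the rendezvous
}

func main() {
	fmt.Println(rendezvous()) // hello
}
```

Because neither side can proceed without the other, a slow partner stalls everyone transitively waiting on it, which is precisely the scheduling hazard described in the overhead discussion above.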
2 2011/10/20