STOPLESS: A Real-Time Garbage Collector for Multiprocessors

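"STOPLESS - A Real-Time Garbage Collector for Multiprocessors (10.1.1.108.322)". In computer science, garbage collection (GC) is an automatic memory management technique that reclaims memory the program no longer uses. STOPLESS is a real-time garbage collector for multiprocessors, designed for parallel multithreaded applications. The paper's authors, Filip Pizlo, Daniel Frampton, Erez Petrank, and Bjarne Steensgaard, address the problem of providing real-time GC on modern multicore platforms. Existing real-time collectors struggle on multiprocessors, especially when real-time behavior also requires lock-freedom. Lock-freedom is a progress guarantee: some thread always completes its operation in a finite number of steps no matter how other threads are delayed, which rules out relying on mutual-exclusion locks. STOPLESS is presented as the first collector that provides real-time responsiveness while preserving lock-freedom, supporting atomic operations, controlling fragmentation, and supporting modern parallel platforms.

STOPLESS targets modern languages such as C# or Java, both of which rely on garbage collection to manage memory. It is implemented on top of the Bartok compiler and its C# runtime, and the measurements reported in the paper support its performance claims. Because the collector runs concurrently with the application, it reclaims memory without compromising real-time behavior, which keeps multithreaded applications predictable. The main difficulty in real-time garbage collection is balancing memory reclamation against application progress, particularly under hard real-time constraints. STOPLESS controls fragmentation through compaction, which improves memory utilization, and the compaction runs without interrupting the application.

STOPLESS also supports atomic operations, which lock-free multithreaded programs depend on. Atomic operations keep concurrently accessed data consistent; an unexpected change to shared data could cause a system to miss its response-time requirements.

Overall, STOPLESS overcomes limitations of earlier real-time collectors by combining performance, real-time responsiveness, and memory management. The result is relevant to real-time software that depends on multicore and concurrent execution, in areas such as embedded systems, aerospace, automation, and game development, where it lets developers concentrate on application logic instead of memory management.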


3. CoCo: A Concurrent Compactor

In this section we present CoCo, a non-intrusive concurrent compaction mechanism that moves objects concurrently with the running program threads, providing high responsiveness and maintaining a program’s lock-freedom.

The CoCo mechanism can be incorporated into a full compaction algorithm, such as the Compressor [23], to compact the entire heap and eliminate fragmentation, or it may be used with any on-the-fly mark and sweep collector [14, 13, 15] (as it is used here) to do partial compaction to reduce fragmentation. The overhead of CoCo increases with the number of objects to be moved, because its costs are higher while objects are being moved. It was therefore designed as a partial compactor. In STOPLESS, we employ the mark-sweep collector to finish updating pointers to the relocated objects. This is an easy task while traversing the graph of live objects (proposed by [11]). Alternatively, an additional final stage can be added to let the compactor explicitly and concurrently fix pointers, perhaps using a mechanism such as the one employed by the Compressor [23].

When an object is to be moved in CoCo, it must be tagged before CoCo runs (e.g., by the sweep procedure) by atomically setting a bit in the object header and adding the object to a list accessible to CoCo. Creating a copy of the original object and making the program switch to the new object, while keeping lock-freedom, maintaining acceptable memory coherence, and reducing the overheads to an acceptable level, is nontrivial. The original lock-free copying mechanism of Herlihy and Moss [20] employed a chain of immutable copies of the object, one for each object mutation. This is a high overhead to pay in space and time. CoCo also incurs some, though smaller, space and time overheads. First, CoCo employs a read barrier, which has its cost, but an interesting cloning mechanism is used to eliminate this cost almost entirely when the compactor is idle. Second, during object copying, CoCo creates a temporary wide object to represent a mutated object. A forwarding pointer is kept in each old object pointing to the new copy of the object. In the wide object, each field is juxtaposed with a status field; the ‘wide’ field (the status and original field combination) can be atomically modified using a compare-and-swap. We ensure that wide fields are at most twice the size of the processor word; for example, on a 32-bit architecture the largest wide field would have a 32-bit status and a 32-bit payload, thus allowing a 64-bit compare-and-swap to be used. Such a double-word compare-and-swap is available on most modern instruction set architectures. If the original field is already twice the processor word size (such as a 64-bit field on a 32-bit processor), we first split the field into two 32-bit halves.
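The wide-field layout can be illustrated with a small single-threaded model. This is only a sketch under stated assumptions: the names `pack_wide`, `unpack_wide`, `split_double_word`, and `WideFieldCell`, and the status values 0 and 1, are hypothetical and do not come from the paper. Python has no hardware compare-and-swap, so `compare_and_swap` merely stands in for the single atomic double-word instruction.

```python
from typing import Tuple

WORD_BITS = 32                      # processor word size assumed in the text's example
WORD_MASK = (1 << WORD_BITS) - 1


def pack_wide(status: int, payload: int) -> int:
    """Place a 32-bit status next to a 32-bit payload in one 64-bit value."""
    return ((status & WORD_MASK) << WORD_BITS) | (payload & WORD_MASK)


def unpack_wide(wide: int) -> Tuple[int, int]:
    """Return (status, payload) from a packed 64-bit wide field."""
    return (wide >> WORD_BITS) & WORD_MASK, wide & WORD_MASK


def split_double_word(value: int) -> Tuple[int, int]:
    """Split a 64-bit field into (high, low) 32-bit halves, so that each half
    can get its own status word and still fit a 64-bit compare-and-swap."""
    return (value >> WORD_BITS) & WORD_MASK, value & WORD_MASK


class WideFieldCell:
    """Single-threaded model of one wide field (hypothetical helper)."""

    def __init__(self, status: int, payload: int) -> None:
        self._bits = pack_wide(status, payload)

    def load(self) -> int:
        return self._bits

    def compare_and_swap(self, expected: int, new: int) -> bool:
        # Models ONE atomic instruction: status and payload change together
        # or not at all.
        if self._bits != expected:
            return False
        self._bits = new
        return True
```

Because status and payload share one compared value, an update based on a stale snapshot fails instead of overwriting a newer status.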

The details of how objects are copied and how mutators access objects to be moved are described in Sections 3.2-3.3. Section 3.4 describes an extension of the basic mechanism that allows mutators to perform atomic operations (e.g., CAS) on objects while they are being moved.

3.1 The challenge

A reader who has not previously dealt with real-time or lock-free collectors may wonder why it is difficult to construct a collector that supports lock-free programs. We illustrate the problem by discussing a generic real-time collector, similar to the ones proposed by Nettles and O’Toole [27], Cheng and Blelloch [7, 10], and Hudson and Moss [21]. The basic idea is to create and maintain two copies of each object. The fresh new copy is created by the collector and thereafter, any application thread is responsible for executing writes to both the original and the replicated copy. The main copy (used for reading the current values) is the original object. Once all objects have an updated replica, the copying phase terminates by stopping all program threads and modifying their root set pointers to point to the copied objects.

Footnote: We use ‘payload’ to refer to object fields that were not added by CoCo. Fields added by CoCo include the forwarding pointer word and the status fields in the wide object.

The main problem with this solution is that the two copies of an object are not guaranteed to contain the same information, unless proper locking mechanisms are introduced. Suppose two threads try to concurrently modify a field f, which originally holds the value 0. Thread T1 tries to write the value 1 into f and Thread T2 tries to write the value 2. Although they attempt to write to the field concurrently, one of the writes will happen before the other. Assume that T1 writes first. A third thread that reads this field may see the field value going from 0 to 1 and then to 2. However, threads T1 and T2 next concurrently attempt to write to the replica, possibly happening in a different order, making 1 be the value that prevails in the replica. A third thread that reads the field in the original location and then in the copied location may observe the sequence of values 0, 1, 2, 1 in the field f. Such a sequence should never be observed by any thread according to any reasonable memory model. To solve this, previous work employed locking or assumed that there were no concurrent (non-blocking) writes to a memory location. However, non-blocking concurrent accesses are essential for any lock-free real-time algorithm.
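The anomaly can be replayed deterministically. The sketch below is not a concurrent program: it fixes one interleaving by hand and records what a reader would see. The function name `observed_history` and its parameters are hypothetical, chosen only for illustration.

```python
from typing import List


def observed_history(original_order: List[int],
                     replica_order: List[int],
                     initial: int = 0) -> List[int]:
    """Replay one interleaving of the replicated-write scheme.

    original_order: values in the order the writes reach the original copy.
    replica_order:  the same values in the order they reach the replica.
    Returns the values seen by a reader that polls the original after every
    write and then switches to the replica.
    """
    original = initial
    seen = [original]
    for value in original_order:
        original = value
        seen.append(original)   # reader is still using the original copy

    replica = initial
    for value in replica_order:
        replica = value         # the last write to the replica prevails
    seen.append(replica)        # reader has switched to the replica
    return seen


# T1 writes 1 and T2 writes 2; the replica receives the writes in the opposite order.
print(observed_history([1, 2], [2, 1]))  # [0, 1, 2, 1]
```

If both copies receive the writes in the same order, the reader sees 0, 1, 2, 2, which is consistent. The problem arises only because nothing forces the two orders to agree.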

A second problem is that in the generic algorithm, the threads are all halted simultaneously to shift from the original copy to the replica. This also involves some undesirable locking mechanism, making it possible for one slow thread to block others. If the threads are not stopped simultaneously, then they may be in different stages, where some of them are still reading the old replica, whereas others are not writing to it anymore. Various other hazardous races exist.

Past solutions include disallowing simultaneous writes [21]; (inefficiently) creating a full copy of the object for each modification [20]; and limiting the run to a uniprocessor or changing the accessed copy while the program threads halt [3]. Collectors that perform compaction concurrently with the executing application [11, 23] employ virtual memory (page protection) to simultaneously block access to stale data. The problem with these collectors is that they induce a trap storm whose duration is tens of milliseconds and during which the program is practically halted. CoCo’s responsiveness is three orders of magnitude better (less than tens of microseconds).

CoCo does not need to stop the threads simultaneously. It also does not rely on locking to keep the replicas coherent. The main idea is to create a temporary wide object in which each field is associated with a status word. The status changes atomically with the data and indicates the current location of the data values. The use of this temporary wide object and incremental object copying, in which the transfer of data location happens field by field, provides the high responsiveness of CoCo. The algorithm is described in the following subsections.

3.2 The object copying mechanism

In what follows, we assume that CoCo runs on a single thread concurrently with the program. An extension to several CoCo threads is discussed in Subsection 3.5 below. CoCo traverses the objects that need to be copied one by one in any order. The first step of copying an object is to create an uninitialized wide version of the object, as discussed earlier. Objects contain a special header word used as a forwarding pointer for CoCo during the compaction. In the first phase, the forwarding pointer will store a reference to the wide object. Later it will point to the new copy of the object.
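The two roles of the forwarding pointer can be modeled as follows. This is a minimal sketch, not the paper's implementation: the class and function names and the `UNINITIALIZED` status label are hypothetical, and plain assignment stands in for whatever atomic header update a real collector would use.

```python
from dataclasses import dataclass
from typing import Any, List, Optional

UNINITIALIZED = "uninitialized"   # hypothetical status label


@dataclass
class WideSlot:
    """One field of the wide object: a status next to a payload value."""
    status: str = UNINITIALIZED
    payload: Any = None


@dataclass
class WideObject:
    slots: List[WideSlot]


@dataclass
class HeapObject:
    payload: List[Any]
    forward: Optional[Any] = None   # special header word used during compaction


def begin_copy(obj: HeapObject) -> WideObject:
    """First phase: create an uninitialized wide version of the object and
    make the forwarding pointer refer to it."""
    wide = WideObject([WideSlot() for _ in obj.payload])
    obj.forward = wide
    return wide


def finish_copy(obj: HeapObject, new_copy: HeapObject) -> None:
    """Later phase: the forwarding pointer refers to the new copy."""
    obj.forward = new_copy
```

A caller would invoke `begin_copy` once per tagged object and `finish_copy` after the new copy has been populated. How the fields move between the three representations is the subject of the rest of the section.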

CoCo copies each payload field from the original object into the wide object. At the same time, the mutator may modify the wide object (modifications to the original are no longer allowed after a wide-object pointer has been installed to the header of the
