manifests the complexities of both a standard copying nursery and
a reference counted heap.
Age-Oriented Paz et al. [21] introduced age-oriented collection, which aims to exploit the generational hypothesis that most objects die young. Their age-oriented collector uses reference counting for the old generation and tracing for the young generation, establishing reference counts during the young-generation trace.
This provides a significant benefit because it avoids performing expensive reference counting operations for the many young objects that die. Like ulterior reference counting, this collector is a hybrid, so it manifests the complexities of two orthodox collectors.
2.3 Collecting Cyclic Objects
As discussed above, reference counting alone cannot collect all garbage. Objects can form cycles, in which a group of objects refer to one another, keeping their reference counts non-zero even when the group as a whole is unreachable. There are two general approaches to dealing with cyclic garbage: backup tracing [22]
and trial deletion [2, 9, 18, 19]. Frampton [13] conducted a detailed
study of cycle collection.
Backup Tracing Backup tracing performs a mark-sweep style trace of the entire heap to eliminate cyclic garbage. The key difference from classical mark-sweep is that during the sweep phase, each object found to be garbage must issue decrements to those of its descendants that survive in the live part of the heap, so that their reference counts remain correct. To support backup tracing, each object must be able to store a mark state during tracing. Backup tracing can also be used to restore stuck reference counts, as described in Section 2.1.
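The sketch below illustrates this structure in plain Java: a mark phase over the reachable heap followed by a sweep that issues decrements from each dead object to any of its referents that survive. It is a minimal model under our own assumptions; Obj, BackupTrace, and their members are illustrative names, not MMTk code.

import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Collection;
import java.util.Deque;
import java.util.Iterator;
import java.util.List;

// Minimal, self-contained model of backup tracing over a reference-counted
// heap. Obj is an illustrative stand-in for a heap object, not MMTk code.
final class Obj {
  int rc;                                       // reference count
  boolean marked;                               // mark state needed for the backup trace
  final List<Obj> fields = new ArrayList<>();   // outgoing references
}

final class BackupTrace {
  // Mark everything reachable from the roots, then sweep unmarked objects,
  // decrementing their live referents so that surviving counts stay correct.
  static void collect(Collection<Obj> roots, Collection<Obj> heap) {
    Deque<Obj> work = new ArrayDeque<>();
    for (Obj o : roots) {
      if (!o.marked) { o.marked = true; work.push(o); }
    }
    while (!work.isEmpty()) {                   // mark phase
      Obj o = work.pop();
      for (Obj f : o.fields) {
        if (!f.marked) { f.marked = true; work.push(f); }
      }
    }
    Iterator<Obj> it = heap.iterator();
    while (it.hasNext()) {                      // sweep phase
      Obj o = it.next();
      if (!o.marked) {
        for (Obj f : o.fields) {
          if (f.marked) f.rc--;                 // decrement into the live heap
        }
        it.remove();                            // reclaim the dead object
      } else {
        o.marked = false;                       // reset for the next collection
      }
    }
  }
}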
Trial Deletion Trial deletion collects cycles by identifying groups of self-sustaining objects using a partial trace of the heap in three phases. In the first phase, the sub-graph rooted at a selected candidate object is traversed, and the reference count of the target of each outgoing pointer is (temporarily) decremented. Once this process is complete, reference counts reflect only external references into the sub-graph. If any object’s reference count is zero, then that object is reachable only from within the sub-graph. In the second phase, the sub-graph is traversed again, and outgoing references are re-incremented from each object whose reference count did not drop to zero. Finally, the third phase traverses the sub-graph again, sweeping all objects that still have a reference count of zero. The original implementation is due to Christopher [9] and has been optimized over time [2, 18, 19].
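The following is a compact, self-contained sketch of these three phases in plain Java, written in the style of the cited algorithms: a trial-decrement pass, a scan that re-increments from externally reachable objects, and a sweep of whatever remains at zero. The Node and Color types and all method names are our own illustrative choices, not code from any of the cited implementations.

import java.util.ArrayList;
import java.util.List;

// Illustrative sketch of trial deletion rooted at a single candidate object.
final class Node {
  int rc;                                       // reference count
  Color color = Color.BLACK;
  final List<Node> fields = new ArrayList<>();  // outgoing references
}

enum Color { BLACK, GRAY, WHITE }

final class TrialDeletion {
  // Phase 1: trial-decrement the target of every outgoing pointer in the
  // sub-graph, so that remaining counts reflect only external references.
  static void markGray(Node n) {
    if (n.color == Color.GRAY) return;
    n.color = Color.GRAY;
    for (Node f : n.fields) { f.rc--; markGray(f); }
  }

  // Phase 2: an object whose count stayed above zero is externally reachable;
  // re-increment outgoing references from it and from everything it reaches.
  static void scan(Node n) {
    if (n.color != Color.GRAY) return;
    if (n.rc > 0) {
      scanBlack(n);
    } else {
      n.color = Color.WHITE;
      for (Node f : n.fields) scan(f);
    }
  }

  static void scanBlack(Node n) {
    n.color = Color.BLACK;
    for (Node f : n.fields) {
      f.rc++;
      if (f.color != Color.BLACK) scanBlack(f);
    }
  }

  // Phase 3: sweep the sub-graph, reclaiming objects that are still white.
  static void collectWhite(Node n, List<Node> reclaimed) {
    if (n.color != Color.WHITE) return;
    n.color = Color.BLACK;                      // avoid revisiting
    for (Node f : n.fields) collectWhite(f, reclaimed);
    reclaimed.add(n);
  }

  static List<Node> collectCycles(Node candidate) {
    markGray(candidate);
    scan(candidate);
    List<Node> reclaimed = new ArrayList<>();
    collectWhite(candidate, reclaimed);
    return reclaimed;
  }
}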
Cycle collection is not the focus of this paper; however, some form of cycle collection is essential for completeness. We use backup tracing, which performs substantially better than trial deletion and has more predictable performance characteristics [13]. Backup tracing also provides a solution to the problem of reference counts that become stuck because only a limited number of bits is available to store them.
3. Analysis of Reference Counting Intrinsics
Recall that despite the implementation advantages of simple imme-
diate reference counting, reference counting is rarely used because
it is comprehensively outperformed by tracing collectors. To help
understand the sources of overhead and identify opportunities for
improvement, we now study the behavior of standard benchmarks
with respect to operations that are intrinsic to reference counting.
In particular, we focus on metrics that are neither user-controllable
nor implementation-specific.
3.1 Methodology
We instrument Jikes RVM to identify, record, and report statistics
for every object allocated. We control for the effect of cycle collection
by performing measurements with cycle collection policies at both
extremes (always collect cycles vs. never collect cycles) and report
when this affects the analysis.
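As a rough illustration of the kind of per-object record such instrumentation can maintain (the class and field names below are our own, hypothetical, and not taken from Jikes RVM or MMTk), each allocated object can be associated with counters that are updated on every reference counting operation and reported at the end of the run.

// Hypothetical per-object statistics record; illustrative only, not
// Jikes RVM or MMTk code.
final class ObjectStats {
  final int allocSite;   // identifies the allocation site of the object
  long increments;       // reference count increments observed
  long decrements;       // reference count decrements observed
  int maxCount;          // highest reference count the object ever reached

  ObjectStats(int allocSite) { this.allocSite = allocSite; }

  void onIncrement(int newCount) {
    increments++;
    if (newCount > maxCount) maxCount = newCount;
  }

  void onDecrement() {
    decrements++;
  }
}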
Jikes RVM We use Jikes RVM and MMTk for all experiments.
Jikes RVM [1] is a high performance research JVM with a well-tuned garbage collection infrastructure, MMTk [7]. Jikes RVM is open source and is written almost entirely in a slightly extended dialect of Java.
Jikes RVM does not have a bytecode interpreter. Instead, a fast
template-driven baseline compiler produces machine code when
the VM first encounters each Java method. To ensure performance
and repeatability, all of our experiments were run using Jikes
RVM’s replay compilation feature. We use the most recent version of this methodology,
which executes one iteration of each benchmark using only the
unoptimized baseline compiler, before using user-provided profile
information to optimize hot methods all at once, prior to the sec-
ond iteration. The second iteration of the benchmark then executes
this optimized version of the code. This approach offers the per-
formance of steady state in an adaptively optimized system, whilst
avoiding the non-determinism of adaptive compilation.
MMTk MMTk is Jikes RVM’s memory management sub-system.
It is a programmable memory management toolkit that implements
a wide variety of collectors that reuse shared components [6]. To
perform our analysis, we instrument the standard configuration of
reference counting to gather information on different metrics while
running the benchmarks. This instrumentation does not affect the
garbage collection workload (the exact same set of objects is col-
lected with or without the instrumentation). The instrumentation
slows the collector down considerably, but since this part of our
analysis is not concerned with collector performance, this slow-
down is irrelevant. We do not use the instrumentation for our sub-
sequent performance study. All of the collectors we evaluate are
parallel, including the standard reference counting we use as our
baseline. The optimizations we present are correct with respect to parallel collection, and the results we report exploit it.
We use mark-sweep as our representative tracing collector and
principal point of comparison because it utilizes the same heap or-
ganization and allocator as the reference counters. We compare our
best reference counting system with the high performance Immix
tracing collector [5], but this is not our main point of comparison
because the principal advantage of the Immix collector is its unique
heap organization, which is orthogonal to our optimizations.
Benchmarks We use 19 benchmarks from the DaCapo and
SPEC benchmark suites in all the measurements and performance
studies presented in this paper. SPEC provides both client-side and server-side Java benchmarks. The DaCapo suite [8] is a suite of non-
trivial real-world open source Java applications. We use the super-
set of all benchmarks from DaCapo 2006 and DaCapo 9.12 that
can run successfully with Jikes RVM, using the more recent version
of any given benchmark when the opportunity exists. We identified
a nominal minimum heap size for each benchmark by finding the
minimum heap size in which the benchmark could successfully
complete using any of the three systems we evaluate (standard
reference counting, our optimized reference counting, and mark-
sweep). Unless otherwise stated we conduct all of our performance
experiments while holding the heap size constant at 2× the mini-
mum heap size, which is a modest size.
Experimental Platform The results we present here were mea-
sured on a modern Core i5 670 dual-core processor with two-way
SMT, a clock rate of 3.4 GHz, 4 MB of last level cache, and 4 GB
of RAM. We conducted our evaluation on a range of modern and
older x86 processors and found that our analysis and optimizations
are robust.