manifests the complexities of both a standard copying nursery and
a reference counted heap.
Age-Oriented Paz et al. [21] introduced age-oriented collection, which aims to exploit the generational hypothesis that most objects die young. Their age-oriented collector uses reference counting for the old generation and tracing for the young generation, establishing reference counts during the young-generation trace.
This provides a significant benefit because it avoids performing expensive reference counting operations for the many young objects that die. Like ulterior reference counting, this collector is a hybrid, so it manifests the complexities of two orthodox collectors.
2.3 Collecting Cyclic Objects
As discussed above, reference counting alone cannot collect all garbage. Objects can form cycles, in which a group of objects refer to one another, keeping their reference counts non-zero even when the group as a whole is unreachable. There are two general approaches to dealing with cyclic garbage: backup tracing [22]
and trial deletion [2, 9, 18, 19]. Frampton [13] conducted a detailed
study of cycle collection.
Backup Tracing Backup tracing performs a mark-sweep style trace of the entire heap to eliminate cyclic garbage. The key difference from classical mark-sweep is that during the sweep phase, each object found to be garbage must issue decrements to those of its descendants that survive in the live part of the heap, so that their reference counts remain correct. To support backup tracing, each object must be able to store a mark state during tracing. Backup tracing can also be used to restore stuck reference counts, as described in Section 2.1.
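The sketch below illustrates this structure in plain Java: a mark phase over the reachable heap followed by a sweep that issues decrements from each dead object to any of its referents that survive. It is a minimal model under our own assumptions; Obj, BackupTrace, and their members are illustrative names, not MMTk code.

import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Collection;
import java.util.Deque;
import java.util.Iterator;
import java.util.List;

// Minimal, self-contained model of backup tracing over a reference-counted
// heap. Obj is an illustrative stand-in for a heap object, not MMTk code.
final class Obj {
  int rc;                                       // reference count
  boolean marked;                               // mark state needed for the backup trace
  final List<Obj> fields = new ArrayList<>();   // outgoing references
}

final class BackupTrace {
  // Mark everything reachable from the roots, then sweep unmarked objects,
  // decrementing their live referents so that surviving counts stay correct.
  static void collect(Collection<Obj> roots, Collection<Obj> heap) {
    Deque<Obj> work = new ArrayDeque<>();
    for (Obj o : roots) {
      if (!o.marked) { o.marked = true; work.push(o); }
    }
    while (!work.isEmpty()) {                   // mark phase
      Obj o = work.pop();
      for (Obj f : o.fields) {
        if (!f.marked) { f.marked = true; work.push(f); }
      }
    }
    Iterator<Obj> it = heap.iterator();
    while (it.hasNext()) {                      // sweep phase
      Obj o = it.next();
      if (!o.marked) {
        for (Obj f : o.fields) {
          if (f.marked) f.rc--;                 // decrement into the live heap
        }
        it.remove();                            // reclaim the dead object
      } else {
        o.marked = false;                       // reset for the next collection
      }
    }
  }
}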
Trial Deletion Trial deletion collects cycles by identifying groups of self-sustaining objects using a partial trace of the heap in three phases. In the first phase, the sub-graph rooted at a selected candidate object is traversed, and the reference count of the target of each outgoing pointer is (temporarily) decremented. Once this process is complete, reference counts reflect only external references into the sub-graph. If any object’s reference count is zero, then that object is reachable only from within the sub-graph. In the second phase, the sub-graph is traversed again, and outgoing references are re-incremented from each object whose reference count did not drop to zero. Finally, the third phase traverses the sub-graph again, sweeping all objects that still have a reference count of zero. The original implementation is due to Christopher [9] and has been optimized over time [2, 18, 19].
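The following is a compact, self-contained sketch of these three phases in plain Java, written in the style of the cited algorithms: a trial-decrement pass, a scan that re-increments from externally reachable objects, and a sweep of whatever remains at zero. The Node and Color types and all method names are our own illustrative choices, not code from any of the cited implementations.

import java.util.ArrayList;
import java.util.List;

// Illustrative sketch of trial deletion rooted at a single candidate object.
final class Node {
  int rc;                                       // reference count
  Color color = Color.BLACK;
  final List<Node> fields = new ArrayList<>();  // outgoing references
}

enum Color { BLACK, GRAY, WHITE }

final class TrialDeletion {
  // Phase 1: trial-decrement the target of every outgoing pointer in the
  // sub-graph, so that remaining counts reflect only external references.
  static void markGray(Node n) {
    if (n.color == Color.GRAY) return;
    n.color = Color.GRAY;
    for (Node f : n.fields) { f.rc--; markGray(f); }
  }

  // Phase 2: an object whose count stayed above zero is externally reachable;
  // re-increment outgoing references from it and from everything it reaches.
  static void scan(Node n) {
    if (n.color != Color.GRAY) return;
    if (n.rc > 0) {
      scanBlack(n);
    } else {
      n.color = Color.WHITE;
      for (Node f : n.fields) scan(f);
    }
  }

  static void scanBlack(Node n) {
    n.color = Color.BLACK;
    for (Node f : n.fields) {
      f.rc++;
      if (f.color != Color.BLACK) scanBlack(f);
    }
  }

  // Phase 3: sweep the sub-graph, reclaiming objects that are still white.
  static void collectWhite(Node n, List<Node> reclaimed) {
    if (n.color != Color.WHITE) return;
    n.color = Color.BLACK;                      // avoid revisiting
    for (Node f : n.fields) collectWhite(f, reclaimed);
    reclaimed.add(n);
  }

  static List<Node> collectCycles(Node candidate) {
    markGray(candidate);
    scan(candidate);
    List<Node> reclaimed = new ArrayList<>();
    collectWhite(candidate, reclaimed);
    return reclaimed;
  }
}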
Cycle collection is not the focus of this paper; however, some form of cycle collection is essential for completeness. We use backup tracing, which performs substantially better than trial deletion and has more predictable performance characteristics [13]. Backup tracing also provides a solution to the problem of reference counts that become stuck because only a limited number of bits is available to store them.
3. Analysis of Reference Counting Intrinsics
Recall that despite the implementation advantages of simple imme-
diate reference counting, reference counting is rarely used because
it is comprehensively outperformed by tracing collectors. To help
understand the sources of overhead and identify opportunities for
improvement, we now study the behavior of standard benchmarks
with respect to operations that are intrinsic to reference counting.
In particular, we focus on metrics that are neither user-controllable
nor implementation-specific.
3.1 Methodology
We instrument Jikes RVM to identify, record, and report statistics
for every object allocated. We control for the effect of cycle collection
by performing measurements with cycle collection policies at both
extremes (always collect cycles vs. never collect cycles) and report
when this affects the analysis.
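As a rough illustration of the kind of per-object record such instrumentation can maintain (the class and field names below are our own, hypothetical, and not taken from Jikes RVM or MMTk), each allocated object can be associated with counters that are updated on every reference counting operation and reported at the end of the run.

// Hypothetical per-object statistics record; illustrative only, not
// Jikes RVM or MMTk code.
final class ObjectStats {
  final int allocSite;   // identifies the allocation site of the object
  long increments;       // reference count increments observed
  long decrements;       // reference count decrements observed
  int maxCount;          // highest reference count the object ever reached

  ObjectStats(int allocSite) { this.allocSite = allocSite; }

  void onIncrement(int newCount) {
    increments++;
    if (newCount > maxCount) maxCount = newCount;
  }

  void onDecrement() {
    decrements++;
  }
}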
Jikes RVM We use Jikes RVM and MMTk for all experiments.
Jikes RVM [1] is a high performance research JVM with a well-tuned garbage collection infrastructure, MMTk [7]. Jikes RVM is open source and is written almost entirely in a slightly extended dialect of Java.
Jikes RVM does not have a bytecode interpreter. Instead, a fast
template-driven baseline compiler produces machine code when
the VM first encounters each Java method. To ensure performance
and repeatability, all of our experiments were run using Jikes
RVM’s replay compilation feature. We use the most recent version of this methodology,
which executes one iteration of each benchmark using only the
unoptimized baseline compiler, before using user-provided profile
information to optimize hot methods all at once, prior to the sec-
ond iteration. The second iteration of the benchmark then executes
this optimized version of the code. This approach offers the per-
formance of steady state in an adaptively optimized system, whilst
avoiding the non-determinism of adaptive compilation.
MMTk MMTk is Jikes RVM’s memory management sub-system.
It is a programmable memory management toolkit that implements
a wide variety of collectors that reuse shared components [6]. To
perform our analysis, we instrument the standard configuration of
reference counting to gather information on different metrics while
running the benchmarks. This instrumentation does not affect the
garbage collection workload (the exact same set of objects is col-
lected with or without the instrumentation). The instrumentation
slows the collector down considerably, but since this part of our
analysis is not concerned with collector performance, this slow-
down is irrelevant. We do not use the instrumentation for our sub-
sequent performance study. All of the collectors we evaluate are
parallel, including the standard reference counting we use as our
baseline. The optimizations we present are correct with respect to parallel collection, and the results we report exploit it.
We use mark-sweep as our representative tracing collector and
principal point of comparison because it utilizes the same heap or-
ganization and allocator as the reference counters. We compare our
best reference counting system with the high performance Immix
tracing collector [5], but this is not our main point of comparison
because the principal advantage of the Immix collector is its unique
heap organization, which is orthogonal to our optimizations.
Benchmarks We use 19 benchmarks from the DaCapo and
SPEC benchmark suites in all the measurements and performance
studies presented in this paper. SPEC provides both client-side and server-side Java benchmarks. The DaCapo suite [8] is a suite of non-
trivial real-world open source Java applications. We use the super-
set of all benchmarks from DaCapo 2006 and DaCapo 9.12 that
can run successfully with Jikes RVM, using the more recent version
of any given benchmark when the opportunity exists. We identified
a nominal minimum heap size for each benchmark by finding the
minimum heap size in which the benchmark could successfully
complete using any of the three systems we evaluate (standard
reference counting, our optimized reference counting, and mark-
sweep). Unless otherwise stated we conduct all of our performance
experiments while holding the heap size constant at 2× the mini-
mum heap size, which is a modest size.
Experimental Platform The results we present here were mea-
sured on a modern Core i5 670 dual-core processor with two-way
SMT, a clock rate of 3.4 GHz, 4 MB of last level cache, and 4 GB
of RAM. We conducted our evaluation on a range of modern and
older x86 processors and found that our analysis and optimizations
are robust.