empty
empty
R0 R1 R2 R3
RemSet1
R0 R1 R2 R3
RemSet3
(a)
(b)
GC
a
b
c
c
b
a
Figure 1: Remembered Set Operation
If a multithreaded program is running on a multiprocessor
machine, using a sequential garbage collector can create a
performance bottleneck. We therefore strive to parallelize
the operations of an evacuation pause as much as possible.
The first step of an evacuation pause is to sequentially
choose the collection set (section 3 details the mechanisms
and heuristics of this choice). Next, the main parallel phase
of the evacuation pause starts. GC threads compete to claim
tasks such as scanning pending log buffers to update remem-
bered sets, scanning remembered sets and other root groups
for live objects, and evacuating the live objects. There is no
explicit synchronization among tasks other than ensuring
that each task is performed by only one thread.
The evacuation algorithm is similar to the one reported by
Flood et al. [18]. To achieve fast parallel allocation we use
GCLABs, i.e. thread-local GC allocation buffers (similar
to mutator TLABs). Threads allocate an object copy in
their GCLAB and compete to install a forwarding pointer
in the old image. The winner is responsible for copying
the object and scanning its contents. A technique based on
work-stealing [1] provides load balancing.
Figure 1 illustrates the operation of an evacuation pause.
Step A shows the remembered set of a collection set region
R1 being used to find pointers into the collection set. As will
be discussed in section 2.6, pointers from objects identified
as garbage via concurrent marking (object a in the figure)
are not followed.
2.4 Generational Garbage-First
Generational garbage collection [34, 26] has several ad-
vantages, which a collection strategy ignores at its peril.
Newly allocated objects are usually more likely to become
garbage than older objects, and newly allocated objects are
also more likely to be the target of pointer modifications,
if only because of initialization. We can take advantage of
both of these properties in Garbage-First in a flexible way.
We can heuristically designate a region as young when it is
chosen as a mutator allocation region. This commits the re-
gion to be a member of the next collection set. In return for
this loss of heuristic flexibility, we gain an important benefit:
remembered set processing is not required to consider mod-
ifications in young regions. Reachable young objects will be
scanned after they are evacuated as a normal part of the
next evacuation pause.
Note that a collection set can contain a mix of young and
non-young regions. Other than the special treatment for
remembered sets described above, both kinds of regions are
treated uniformly.
Garbage-First runs in two modes: generational and pure
garbage-first. Generational mode is the default, and is used
for all performance results in this paper.
There are two further “submodes” of generational mode:
evacuation pauses can be fully or partially young. A fully-
young pause adds all (and only) the allocated young regions
to the collection set. A partially-young pause chooses all
the allocated young regions, and may add further non-young
regions, as pause times allow (see section 3.2.1).
2.5 Concurrent Marking
Concurrent marking is an important component of the
system. It provides collector completeness without impos-
ing any order on region choice for collection sets (as, for ex-
ample, the Train algorithm of Hudson and Moss [22] does).
Further, it provides the live data information that allows
regions to be collected in “garbage-first” order. This section
describes our concurrent marking algorithm.
We use a form of snapshot-at-the-beginning concurrent
marking [36]. In this style, marking is guaranteed to iden-
tify garbage objects that exist at the start of marking, by
marking a logical “snapshot” of the object graph existing at
that point. Objects allocated during marking are necessar-
ily considered live. But while such objects must be consid-
ered marked, they need not be traced: they are not part of
the object graph that exists at the start of marking. This
greatly decreases concurrent marking costs, especially in a
system like Garbage-First that has no physically separate
young generation treated specially by marking.
2.5.1 Marking Data Structures
We maintain two marking bitmaps, labeled previous and
next. The previous marking bitmap is the last bitmap in
which marking has been completed. The next marking bitmap
may be under construction. The two physical bitmaps swap
logical roles as marking is completed. Each bitmap contains
one bit for each address that can be the start of an ob-
ject. With the default 8-byte object alignment, this means
1 bitmap bit for every 64 heap bits. We use a mark stack
to hold (some of) the gray (marked but not yet recursively
scanned) objects.
2.5.2 Initial Marking Pause/Concurrent Marking
The first phase of a marking cycle clears the next marking
bitmap. This is performed concurrently. Next, the initial
marking pause stops all mutator threads, and marks all ob-
jects directly reachable from the roots (in the generational
mode, initial marking is in fact piggy-backed on a fully-
young evacuation pause). Each heap region contains two
top at mark start (TAMS) variables, one for the previous
marking and one for the next. We will refer to these as
the previous and next TAMS variables. These variables are
used to identify objects allocated during a marking phase.
These objects above a TAMS value are considered implic-
itly marked with respect to the marking to which the TAMS
variable corresponds, but allocation is not slowed down by
marking bitmap updates. The initial marking pause iterates
over all the regions in the heap, copying the current value of
top in each region to the next TAMS of that region. Steps
A and D of figure 2 illustrate this. Steps B and E of this