Cork: Dynamic Memory Leak Detection for Java
Maria Jump and Kathryn S. McKinley
Technical Report TR-06-07
Department of Computer Sciences
The University of Texas at Austin
Austin, TX, 78712, USA
{mjump,mckinley}@cs.utexas.edu
Abstract
Despite all the benefits of garbage collection, memory leaks remain
a problem for Java programs. A memory leak in Java occurs when
a program inadvertently maintains references to objects that it no
longer needs, preventing the garbage collector from reclaiming
space. At best, leaks degrade performance. At worst, they cause
programs to run out of memory and crash. Small continuous leaks
in long-running programs are notoriously hard to find and can crash
the program only after days or weeks of execution.
We introduce Cork, a low-overhead, accurate technique for de-
tecting memory leaks in Java programs. Cork identifies overall
monotonic heap growth by piggybacking on the garbage collector.
On each full-heap collection, Cork builds a summary type points-
to graph annotated with type volumes. Cork identifies potentially
leaking types that grow over multiple collections. Cork reports the
slice in the type points-to graph that is growing (i.e., the data struc-
ture that points to the leaking type). We implement Cork in MMTk
for Jikes RVM, where it adds an average overhead of 2.4% for mod-
erate heap sizes and 1.7% for large heap sizes to SPECjvm and
DaCapo benchmarks using a generational mark-sweep collector.
Cork exactly identifies a single growing data structure in each of
three popular benchmarks (fop, _202_jess, and SPECjbb2000).
Due to the precision of Cork’s report, we eliminated these leaks in
_202_jess and SPECjbb2000, whereas their developers had not
previously done so. Cork is the first tool to find leaks in Java with
low enough overhead to consider using online.
1. Introduction
Memory-related bugs are a substantial source of errors, but are
especially problematic for languages with explicit memory management
such as C and C++. For these languages, memory-related errors
include (1) dereferencing a pointer to memory that the program
previously freed (dangling pointer), (2) losing a pointer to an object
that the program neglects to free (lost pointer), and (3) keeping a
pointer to an object the program will never use again (unnecessary
reference).
This work is supported by NSF CCR-0311829, NSF ITR CCR-0085792,
NSF CISE infrastructure grant EIA-0303609, DARPA
F33615-03-C-4106, and IBM. Any opinions, findings and conclusions
expressed herein are those of the authors and do not necessarily reflect
those of the sponsors.
Technical Report TR-06-07, January 2006
Copyright © 2006 Department of Computer Sciences, University of Texas at Austin.
Garbage-collected languages solve the first two memory errors,
but not the last. The garbage collector eliminates the dangling
pointer error since a pointer to an object prevents the collector from
reclaiming it. Additionally, the collector eliminates those memory
leaks caused by lost pointers since it reclaims objects that do not have
pointers to them. Unfortunately, garbage collection is conservative
and therefore cannot detect, much less reclaim, memory referred
to by unnecessary references. Thus, a memory leak in a garbage-
collected language occurs when a program inadvertently maintains
references to objects that it no longer needs, preventing the garbage
collector from reclaiming space.
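As an illustration of an unnecessary reference, consider the following minimal sketch. The class and method names are hypothetical and do not come from any benchmark discussed here. Every payload remains reachable from a list after the program has finished with it, so the collector cannot reclaim it even though the program never reads it again.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical component that leaks through an unnecessary reference. */
public class UnnecessaryReferenceDemo {
    // Every payload ever handled stays reachable from this list.
    private final List<byte[]> handled = new ArrayList<>();

    /** Processes a request, then (mistakenly) keeps its payload forever. */
    public void handle(int payloadBytes) {
        byte[] payload = new byte[payloadBytes];
        // ... the program finishes using payload here ...
        handled.add(payload); // never removed, so the collector must keep it
    }

    /** Bytes kept alive only by the stale references in the list. */
    public long retainedBytes() {
        long total = 0;
        for (byte[] p : handled) {
            total += p.length;
        }
        return total;
    }

    public static void main(String[] args) {
        UnnecessaryReferenceDemo server = new UnnecessaryReferenceDemo();
        for (int i = 0; i < 1000; i++) {
            server.handle(1024);
        }
        System.out.println("retained bytes: " + server.retainedBytes());
    }
}
```

Removing each payload from the list once it is no longer needed would make it unreachable and let the collector reclaim it.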
In the best case, unnecessary references to individual objects
simply degrade program performance by increasing its memory
requirements and consequently the collector workload. In the worst
case, unnecessary references refer to a growing data structure, parts
of which are no longer in use. These types of leaks can eventually
cause the program to run out of memory and crash. In long-running
programs, such as server applications, small leaks can take days or
weeks to manifest, making these bugs notoriously difficult to find.
Heap-occupancy graphs [18, 24] reveal the underlying prob-
lem of systematic heap growth, but not the solution. A heap oc-
cupancy graph plots the total heap occupancy (y-axis) over time
(x-axis) measured in allocation by collecting the entire heap very
frequently (every 10K of allocation in our graphs). Figure 1 shows
the heap occupancy graphs of _213_javac from SPECjvm and
SPECjbb2000. The graph for _213_javac shows four program allocation
phases that reach the same general peaks, which indicates that
_213_javac uses about the same maximum amount of memory in
each phase and no phase leaks memory to the next. There is no leak.
On the other hand, SPECjbb2000 running one warehouse for long
periods of time shows memory requirements continue to grow until
the end of execution. Allowed to run for days, it would run out of
memory and crash. There is a leak. Although these graphs reveal
potential leaks, they do not pinpoint the source of the leak.
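The way one reads such a graph can be sketched in code. The following is only an illustrative heuristic under stated assumptions, not Cork's analysis: the class name, the equal-length phase windows, and the strictly-increasing-peaks test are all hypothetical choices. The sampling helper uses the standard Runtime calls, and System.gc() is only a request that the JVM may ignore, so the samples are approximate.

```java
public class HeapOccupancySketch {
    /** Approximate occupied heap bytes after requesting a collection. */
    static long sampleOccupancy() {
        Runtime rt = Runtime.getRuntime();
        System.gc(); // a hint only; the JVM is free to ignore it
        return rt.totalMemory() - rt.freeMemory();
    }

    /** Peak occupancy in each of `phases` consecutive windows of samples. */
    static long[] phasePeaks(long[] samples, int phases) {
        if (phases <= 0 || phases > samples.length) {
            throw new IllegalArgumentException("need 1..samples.length phases");
        }
        long[] peaks = new long[phases];
        int len = samples.length / phases;
        for (int p = 0; p < phases; p++) {
            // The last window absorbs any leftover samples.
            int end = (p == phases - 1) ? samples.length : (p + 1) * len;
            long peak = Long.MIN_VALUE;
            for (int i = p * len; i < end; i++) {
                peak = Math.max(peak, samples[i]);
            }
            peaks[p] = peak;
        }
        return peaks;
    }

    /** True if every phase peaks strictly higher than the one before it. */
    static boolean peaksKeepGrowing(long[] peaks) {
        for (int i = 1; i < peaks.length; i++) {
            if (peaks[i] <= peaks[i - 1]) {
                return false;
            }
        }
        return peaks.length > 1;
    }

    public static void main(String[] args) {
        long[] repeating = {1, 5, 2, 1, 5, 2, 1, 5, 2, 1, 5, 2};
        long[] growing = {1, 2, 3, 4, 5, 6, 7, 8};
        System.out.println(peaksKeepGrowing(phasePeaks(repeating, 4)));
        System.out.println(peaksKeepGrowing(phasePeaks(growing, 4)));
        System.out.println("occupied now: " + sampleOccupancy());
    }
}
```

Like the graphs themselves, a check of this kind can only flag that occupancy keeps growing; it says nothing about which data structure is responsible.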
Previous approaches to finding memory leaks use heap diagnosis
tools that rely on a combination of heap differencing [11, 12, 13,
19, 20] and allocation and/or fine-grain usage tracking [9, 10, 15,
16, 21, 25, 26], which makes them very expensive. These techniques
tend to yield large amounts of low-level detail about individual
objects that requires a lot of time and expertise to interpret.
To address these shortcomings, this paper introduces Cork, a
low-overhead, accurate technique for detecting potential memory
leaks in Java programs. Cork identifies overall monotonic heap
growth and reports to the user the data structure(s) that generate it.
Cork piggybacks on full-heap garbage collection to compute this
information. As the garbage collector scans the heap, Cork builds a