A Locality-Improving Dynamic Memory Allocator
Yi Feng and Emery D. Berger
Department of Computer Science
University of Massachusetts Amherst
140 Governors Drive
Amherst, MA 01003
{yifeng, emery}@cs.umass.edu
ABSTRACT
Because most application data is dynamically allocated, the memory manager plays a crucial role in application performance by determining the spatial locality of heap objects. Previous general-purpose allocators have focused on reducing fragmentation, while most locality-improving allocators have either focused on improving the locality of the allocator (not the application) or required information supplied by the programmer or obtained by profiling. We present a high-performance memory allocator that builds on previous allocator designs to achieve low fragmentation while transparently improving application locality. Our allocator, called Vam, improves page-level locality by managing the heap in page-sized chunks and aggressively giving up free pages to the virtual memory manager. By eliminating object headers, using fine-grained size classes, and allocating objects with a reap-based algorithm, Vam improves cache-level locality. Over a range of large-footprint benchmarks, Vam improves application performance by an average of 4%–8% versus the Lea (Linux) and FreeBSD allocators. When memory is scarce, Vam improves application performance by up to 2X compared to the FreeBSD allocator, and by over 10X compared to the Lea allocator. We show that synergy between Vam's layout algorithms and the Linux swap clustering algorithm increases its swap prefetchability, further improving its performance when paging.
1. Introduction
Explicit memory managers have traditionally focused on addressing the problem of fragmentation: the wasted space that results from discontiguous free chunks of memory. Reducing fragmentation improves space efficiency and has understandably received considerable attention from memory manager designers. For example, the widely-used Lea allocator that forms the basis of the Linux malloc (DLmalloc) was designed specifically for high performance and low fragmentation [15, 16, 19].
This material is based upon work supported by the National Science Foundation under CAREER Award CNS-0347339. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation.
Permission to make digital or hard copies of all or part of this work for
personal or classroom use is granted without fee provided that copies are
not made or distributed for profit or commercial advantage and that copies
bear this notice and the full citation on the first page. To copy otherwise, to
republish, to post on servers or to redistribute to lists, requires prior specific
permission and/or a fee.
Submitted to MSP 2005 Chicago, IL USA
Copyright 2005 ACM 0-12345-67-8/90/01 ..$5.00
However, the widely-acknowledged increasing latency gap between the CPU and the various levels of the memory hierarchy (caches, RAM, and disk) makes improving data locality a first-level concern. For most applications, this means improving the locality of the heap. While applications typically exhibit temporal locality, spatial locality is dictated by the memory allocator, which determines where and how to lay out the application's dynamic data. This allocator-controlled locality can have a significant impact on the application's overall performance.
In this paper, we present a new general-purpose memory allocator called Vam that improves data locality while providing low fragmentation. Vam improves page-level locality by managing the heap in page-sized chunks and aggressively giving up free pages to the virtual memory manager. By eliminating object headers, using a judicious selection of size classes, and allocating objects with a reap-based algorithm [9], Vam improves cache-level locality.
We compare Vam to the low-fragmentation Linux allocator (DLmalloc) and to the page-level locality-improving FreeBSD allocator (PHKmalloc) [17], both of which we describe in detail. To our knowledge, PHKmalloc has not been discussed previously in the memory management literature. We build on these algorithms, incorporating their best features while removing most of their disadvantages.
Our experiments on a suite of memory-intensive benchmarks show that Vam consistently achieves the best performance. Vam performs on average 8% faster than DLmalloc and 4% faster than PHKmalloc when there is sufficient physical memory to avoid paging. When physical memory is scarce, Vam outperforms these allocators by over 10X and up to 2X, respectively. We show that part of this improvement is due to an unintended but fortunate synergy between Vam and the way Linux manages swap space, which holds evicted pages on disk. We call this phenomenon swap prefetchability and show that it leads to improved performance when paging.
2. Related Work
There has been extensive research on dynamic memory allocation. In their well-known survey paper, Wilson et al. devote most of their attention to the question of fragmentation, which they identify as the most important metric for evaluating memory allocators [24]. In subsequent studies, Johnstone and Wilson evaluate a wide range of allocation policies using actual C/C++ programs and argue that fragmentation is near zero given a good choice of allocation policy [15, 16]. While they argue that reducing fragmentation generally improves locality, we show that Vam's approach is more effective.
Most previous researchers have attacked the problem of locality in memory allocation either by improving the locality of the allocator itself or by using extra information such as programmer hints or profiles to guide placement decisions. Grunwald and Zorn