Atomic I/O transactions have been supported both in file
systems and in storage hardware. Generally speaking, application-
level data integrity semantics are not visible at the stor-
age firmware and therefore storage-level transactions [36]
are not suitable for protecting application data integrity. At
the operating system level, Stasis supports transactional I/O
on memory pages through UNDO logging [38]. TxOS [34]
connects its transactional memory support with file system
journaling to enable atomic storage operations. Compared
with these I/O transaction mechanisms, our failure-atomic
msync() presents a simpler, easier-to-use, POSIX-compatible
programming interface. It is worth noting that Microsoft
Windows Vista introduced an atomic file transaction
mechanism (TxF), but the vendor has deprecated and may
discontinue this feature, citing “extremely limited developer
interest ... due to its complexity and various nuances” [28].
Some transactional I/O systems [34, 38] enable atomic I/O
over failures as well as concurrent accesses. Failure-atomic
msync() focuses on failure-atomicity while leaving concur-
rency management to the applications (through mutex locks
or other synchronization means).
Rio Vista [27] was an early effort that supported data
consistency across operating system failures on persistent
memory, but it did not support data consistency across power failures.
RVM [37] is similar in spirit to failure-atomic msync(),
though utilizing a different interface and focusing on virtual
memory support rather than mapped files. With continu-
ing advances in non-volatile memory (NVRAM) hardware
technologies [8, 11], recent studies have proposed a new
NVRAM-based file system design [10], new data access
primitives (including Mnemosyne [44], NV-heaps [9], and
CDDS [43]), as well as fast failure recovery [30]. Unfor-
tunately, today’s NVRAM manufacturing technologies still
suffer from low space density (or high $/GB) and stabil-
ity/durability problems. Until these problems are resolved,
today’s storage hardware (mechanical disks and NAND
Flash-based solid-state drives) and system software (block-
based file systems) are likely to remain. To realize our pri-
mary objectives of ease-of-use and fast adoption, failure-
atomic msync() targets the software/hardware stacks run-
ning in today’s systems.
Supporting a persistent heap between volatile memory
and durable storage is a classic topic. Atkinson et al. pro-
posed PS-algol, a database programming model that allows
programmers to directly manipulate data structures on a
heap [2] while an underlying system properly and promptly
moves data from the heap to persistent storage [3]. O’Toole
et al. presented a replicating garbage collector that cooper-
ates with a transaction manager to provide durable, consis-
tent storage management [32]. Guerra et al. identify a consis-
tent data version in the heap through pointer chasing from a
root data unit and atomically commit each data version [20].
At a lower level of abstraction, our failure-atomic msync()
can easily be used to implement a persistent heap with data
integrity and high efficiency, while also supporting other
programming paradigms on memory-mapped data.
The belief that programmers benefit from the conve-
nience of manipulating durable data via conventional main-
memory data structures and algorithms dates back to MUL-
TICS [4], which inspired today’s memory-mapped file inter-
faces. Failure-atomic msync() retains the ergonomic bene-
fits of memory-mapped files and couples them with strong
new data-integrity guarantees.
Finally, our work is related to data center state manage-
ment systems such as Bigtable [7] and Dynamo [13] but with
different emphases. While centrally managed data centers
can impose a unified data access model and distributed
coordination, failure-atomic msync() requires only a small,
local adjustment to existing operating system support at
individual hosts, which makes it more suitable for the vast
majority of independent application development scenarios.
3. Interface and System Support
Failure-atomic msync() is a simple OS-supported mech-
anism that allows the application programmer to evolve
durable application data atomically, in spite of failures such
as fail-stop kernel panics and power outages. Failure-atomic
msync() guarantees that a memory-mapped file will always
be either in the state it was in immediately after the most
recent msync(), or in the state it was in at the time of
mmap() if msync() has not yet been called.
Because its semantics lie at the high-level interface
between the operating system and applications, failure-
atomic msync() does not fundamentally depend on partic-
ular durable media (whether block device or not)—today’s
hard disks and SSDs and forthcoming non-volatile mem-
ory are compatible. Indeed, failure-atomic msync() seems
to be an ideal interface to novel mechanisms for taking
memory checkpoints almost instantaneously by versioning
multilevel-cell NVRAM [46].
In addition to its flexibility regarding the underlying storage
device, failure-atomic msync() admits multiple implementations:
journaling, shadow copying, and soft updates are all viable
techniques for updating a file system consistently. In this
paper, we describe our journaling-based system support.
3.1 Interface and Semantics
The interface to failure-atomic msync() is simply the fa-
miliar mmap() and msync() system calls. In order to en-
able failure-atomic msync(), the programmer merely needs
to specify a new MAP_ATOMIC flag to mmap() in addition
to any other flags needed. The programmer can access the
memory-mapped region in the customary fashion. When the
application state is deemed consistent by the programmer,
msync(MS_SYNC) is called.
Two POSIX-standardized msync() flags—which are
currently no-ops in Linux—illustrate the fundamental har-