order, then a deadlock can occur. One missing lock acquisition can lead to race conditions, which are
very difficult to debug. Indeed, concurrency-related programming errors often manifest themselves
only in some, possibly rare, executions, and sometimes only when the application is run with all its
debugging functions turned off. Those errors can have effects ranging from program crashes, which
are directly visible, to very subtle data corruption, which may go undetected for a long
time.
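To make the deadlock scenario concrete, consider the following minimal Java sketch (the lock names and thread bodies are ours, purely for illustration), in which two threads acquire the same pair of locks in opposite orders:

    public class DeadlockSketch {
        static final Object lockA = new Object();
        static final Object lockB = new Object();

        public static void main(String[] args) {
            // Thread 1 acquires lockA, then lockB.
            new Thread(() -> {
                synchronized (lockA) {
                    pause(); // widen the window in which the deadlock can occur
                    synchronized (lockB) { System.out.println("thread 1 done"); }
                }
            }).start();
            // Thread 2 acquires the same locks in the opposite order. If each
            // thread obtains its first lock and then waits for the other's,
            // neither can ever proceed.
            new Thread(() -> {
                synchronized (lockB) {
                    pause();
                    synchronized (lockA) { System.out.println("thread 2 done"); }
                }
            }).start();
        }

        static void pause() {
            try { Thread.sleep(100); } catch (InterruptedException e) { }
        }
    }

A locking policy that fixes a single global order in which the two locks are always acquired would rule out this deadlock; but, as argued next, such policies typically live only in documentation or in programmers' heads.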
The problem with following a complex locking policy in a big application is that this policy
exists only in the documentation (e.g., comments in the code) or in the programmers’ minds. Looking
just at the code of a program, written in one of the mainstream programming languages such as
Java or C++, one typically cannot tell which lock protects which objects, or in which order those
locks should be acquired. Associating a monitor with each object, as in Java, does not really help
with fine-grained locking, where a single monitor or lock can protect a group of objects and, conversely,
a single object can be protected by more than one lock. It thus takes a very disciplined team of
programmers to do fine-grained locking right.
But even following a locking policy to the letter does not guarantee success in terms of
performance on multi-core systems. Fine-grained locking involves many trade-offs. For instance, locks are
not given for free, and acquiring them takes time. Hence, it might be hard to find the right balance
between using too many locks, which can result in high locking overhead, and too few locks, which
can lead to scalability problems. To give a concrete example, devising an efficient, scalable algorithm
that implements a thread-safe queue, a data structure almost trivial in a single-threaded program, is so
hard that it deserved at least two papers in top computer science conferences [Herlihy et al., 2003b;
Michael and Scott, 1996]. We cannot expect ordinary programmers to spend so much effort on
getting the last bit of parallelism from every part of their application, given how hard it is to do
it right.
Obviously, modern programming languages offer libraries containing highly optimized and
thread-safe implementations of commonly used data structures. In many cases, it suffices to use
those components to get a program that can make use of a few CPU cores. But here we face a
problem with composability: a data structure composed of thread-safe objects is not necessarily itself
thread-safe. For example, consider two thread-safe sets, and the problem of removing an element from
one set and then adding it to the other. Often, one wants such a composite operation involving both
sets to be atomic. That is, the intermediate step of this operation, in which an element is removed
from one of the sets but still not added to the other, should not be observed by concurrent threads.
But implementing such an operation requires either introducing additional locks that protect both sets,
or extending the locking mechanisms used by the implementations of those sets. In the former case,
one effectively devises a new locking policy for the two sets, which can introduce race conditions
or scalability issues. In the latter case, one has to “open” the set implementations, understand their
locking policies, and extend them, which is not only challenging but also breaks object encapsulation.
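As a rough sketch of the former approach (the class, field, and method names below are ours, purely for illustration), an atomic move over two individually thread-safe sets can be built by guarding the composite operation with one additional lock:

    import java.util.Collections;
    import java.util.HashSet;
    import java.util.Set;

    public class AtomicMove<E> {
        // Two individually thread-safe sets.
        private final Set<E> source = Collections.synchronizedSet(new HashSet<>());
        private final Set<E> target = Collections.synchronizedSet(new HashSet<>());
        // Additional lock protecting composite operations on the pair of sets.
        private final Object pairLock = new Object();

        // Atomically moves elem from source to target. Without pairLock, a
        // concurrent thread could observe the intermediate state in which elem
        // has been removed from source but not yet added to target.
        public boolean move(E elem) {
            synchronized (pairLock) {
                if (source.remove(elem)) {
                    target.add(elem);
                    return true;
                }
                return false;
            }
        }
    }

Note that for the move to be truly atomic, every other operation that reads or modifies the two sets must also acquire pairLock; this is precisely the new, program-specific locking policy, with all its potential pitfalls, described above.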
Using locks explicitly to handle concurrency becomes even harder when threads operate on
different priority levels, e.g., in a real-time system. If a high-priority thread wants to acquire a lock