interact through a memory subsystem. The memory subsys-
tem contains one FIFO store buffer for each thread and is
protected by a global fair lock. The memory subsystem lock
is used to model atomic read-modify-write operations as be-
ing performed by a thread holding the lock. (For simplic-
ity, we use atomic operations directly throughout this paper.)
The machine also has a global clock (initially 0) readable by
the threads.
The execution of the machine proceeds in time units.
In each time unit the global clock increases by one. Then,
at most one of the following actions can be executed for
each thread T if it is valid to do so under the rules below.
(This does not mean that a valid action must be executed
for T —a scheduler decides the actions in each time unit.
Thus, despite the presence of a global clock, the execution is
asynchronous.)
The following actions are possible only when the memory
subsystem lock is unlocked or held by thread T :
1. The memory subsystem can dequeue T ’s oldest entry
from T ’s store buffer and write it to memory.
2. T may read: If T reads from an address for which a
matching write exists in its store buffer, the read returns
the newest corresponding value stored in the buffer. Oth-
erwise, the read returns the value from memory.
3. T may acquire the memory lock if it does not hold it.
4. T may release the memory lock if it holds the lock and
its store buffer is empty (if T wishes to release the lock
when its store buffer is not empty, the memory subsystem
must first empty T ’s store buffer with #1 actions).
The following are allowed at any time:
5. T can execute a fence if its store buffer is empty (simi-
larly to #4, the memory subsystem must act to empty T ’s
store buffer first).
6. T may write, enqueuing an entry to its store buffer.
7. T may read the global clock.
Bounding store buffering time In the TBTSO[∆] model
(where ∆ ≥ 1), we consider only the abstract machine ex-
ecutions in which the following property holds:
A write enqueued to a thread’s store buffer (action #6)
at global time t
0
is written to memory (action #1) at
global time t
1
≤ t
0
+ ∆.
3. TBTSO flag principle
Fence use in TSO often occurs when applying the flag prin-
ciple [18]. The flag principle says that when two threads,
T
0
and T
1
, each “raise a flag”—writing to a variable in
memory—and then “look” at the other’s flag by reading it
from memory, then at least one will see the other’s flag
raised [18]. Of course, correctly ordering “raising the flag”
to be globally visible before “looking at the other flag” re-
quires a memory fence on TSO (and TBTSO):
T
0
T
1
flag0 := 1 flag1 := 1
Flag fence fence
principle if (flag1) if (flag0)
print "saw T
1
" print "saw T
0
"
This section shows a TBTSO variant of the flag principle
that is asymmetric: it removes the fence from T
0
’s code and
shifts the responsibility of maintaining correct ordering of
T
0
’s reads and writes to T
1
, which does so using the TBTSO
∆ bound. We subsequently apply this asymmetric TBTSO
flag principle to remove the fence from the fast path of
hazard pointers (§ 4) and biased locks (§ 5).
To devise the TBTSO flag principle, we first use TBTSO’s
global time to rephrase the original flag principle: If, when
T
j
reads flag
i
at time t
j
, T
i
’s write to flag
i
does not ap-
pear in memory (i.e., is not yet globally visible), then T
i
will
necessarily see flag
j
raised when it reads flag
j
at time
t
i
> t
j
. This holds because if this happens, then T
i
did not
yet execute its fence at time t
j
, whereas T
j
’s write is already
globally visible at time t
j
because of its fence. TBTSO al-
lows us to break this symmetry by removing the fence from
T
0
and placing the responsibility of guaranteeing the above
property on T
1
, which will wait ∆ time units before reading
T
0
’s flag:
T
0
T
1
flag0 := 1 flag1 := 1
TBTSO fence
flag wait ∆ time units
principle if (flag1) if (flag0)
print "saw T
1
" print "saw T
0
"
Now, if T
0
reads flag
1
at time t
0
but T
1
’s flag write is
not yet globally visible, T
0
is still guaranteed that T
1
will
see flag
0
raised—because in this case T
1
has not yet issued
its fence at time t
0
and thus will read flag
0
at least ∆ time
units later, by which time T
0
’s write is globally visible. In
the opposite case, if T
1
reads flag
0
at time t
1
but T
0
’s write
is not yet globally visible, then T
0
has not written to its flag
before t
1
−∆. However, T
1
’s write is globally visible at t
1
−∆
since T
1
issues a fence, and so T
0
must observe flag
1
flag
raised.
4. Fence-free hazard pointers (FFHP)
This section describes fence-free hazard pointers (FFHP),
a nonblocking fence-free SMR algorithm for TBTSO. (Al-
though we build on hazard pointers [28], the ideas described
here apply equally well to Herlihy et al.’s guards [19]—an
SMR method that differs from hazard pointers only in how
removed objects are stored before being reclaimed.)
4.1 Standard hazard pointers
In the hazard pointers method, each thread maintains several
hazard pointers, hp
0
,hp
1
,. .., hp
k
, which it uses to announce
objects it is about to access. Applying hazard pointers in