A Case Study in Optimizing HTM-Enabled
Dynamic Data Structures: Patricia Tries
Thomas J. Repetti Maurice P. Herlihy
Department of Computer Science
Brown University
{trep etti,mph}@cs.brown.edu
Abstract
The advent of multi-core microprocessors with restricted transac-
tional memory (RTM) and accompanying compiler support allows
us to revisit fundamental data structures with an eye to extracting
more parallelism. The Patricia trie is one such common data struc-
ture used for storing both sets and dictionaries in a variety of con-
texts. This paper presents a concurrent implementation of a dynam-
ically sized Patricia trie using a lock teleportation RTM fast path for
find, add and remove operations, and a slow path based on atomic
exchange spinlocks. We weigh the tradeoffs between alphabet size
and tree depth inherent to tries and propose a novel means of deter-
mining the optimal number of retry attempts for specific operations
on dynamically allocated data structures. The strategy proposed
separates the retry policy governing operations that potentially in-
teract with the operating system’s memory management facilities
from read-only operations, and we find that this transactional trie
can support considerably higher multiprogramming levels than its
lock-based equivalent. A notable result is that this scheme keeps
throughput from collapsing at high thread counts, even when the
number of threads interacting with the data structure exceeds the
number of hardware contexts available on the system.
Keywords Patricia trie, symmetric multiprocessing, concurrent
data structure, hardware transactional memory, restricted transac-
tional memory
1. Introduction
1.1 Transactional Memory
Transactional memory is a synchronization paradigm, which ef-
fectively exends the atomicity of traditional atomic shared mem-
ory operations like compare-and-swap or fetch-and-add to gener-
alized read-modify-write operations on arbitrary regions of mem-
ory [12]. Although it was originally conceived as an architectural
feature to extend cache coherency protocols in hardware, until re-
cently all implementations were strictly in software [29]. The com-
posability of speculative regions of software execution alone is a
benefit to the productivity of programmers writing concurrent soft-
ware, but before the widespread commercial availability of hard-
ware with transactional memory support, the full performance ben-
efits of the technique could not be brought to bear. Currently avail-
able commercial hardware such as Intel’s Haswell processors [14]
and POWER8 architecture systems like IBM Blue Gene/Q [11] and
System z [16] all support best-effort hardware transactional mem-
ory, meaning there are no guarantees of forward progress, and a
transaction may abort for any reason, the cause of which may be
opaque to the programmer. Avni and Kuszmaul preface [1] with
a good summary of the variety of issues that may trigger an abort
for unspecified reasons under Intel’s Transactional Synchronization
Extensions (TSX) RTM. The reasons for transactional aborts under
the TSX scheme that do not have an cause visible to the program-
mer may include cache misses, TLB misses and interrupts. For
these reasons, it is common to use pre-allocation strategies when
investigating data structures under HTM in order to avoid inter-
ference from the operating system. The implementation presented
here, however, uses dynamic memory allocation at runtime as one
would expect from a normal data structure in the field.
1.2 Tries
Tries [9] are tree data structures used to store a set of arbitrary
length keys. The root of such a tree is a node corresponding to
the null string key. Each node, including the root, has a number
of possible children determined by the number of characters in the
alphabet from which strings are composed. A string present in the
set will have a succession of non-null pointers to character nodes
starting at the root which match each character in its sequence. This
assumes the presence of a string termination signifier such as the
“