
perform as well as Lea's algorithm, even in nonmultiprogrammed cases, although
lock-free algorithms are expected to benefit systems mainly in multiprogrammed
environments. Under high loads, they significantly outperform Lea’s algorithm,
exhibiting up to four times higher throughput. They also exhibit greater robustness,
for example in experiments where the hash function is biased to create nonuniform
distributions.
The remainder of this article is organized as follows: In the next section, we
describe the background and the new algorithm in depth. In Section 3, we present
the full correctness proof. In Section 4, the empirical results are presented and
discussed.
2. The Algorithm in Detail
Our hash table data structure consists of two interconnected substructures (see
Figure 1): A linked list of nodes containing the stored items and keys, and an
expanding array of pointers into the list. The array entries are the logical “buckets”
typical of most hash tables. Any item in the hash table can be reached by traversing
down the list from its head, while the bucket pointers provide shortcuts into the list
in order to minimize the search cost per item.
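To make this layout concrete, here is a minimal C sketch of the two substructures; the type and field names (so_key_t, node_t, hashtable_t) are ours rather than the paper's, and all concurrency control is omitted.

  #include <stdint.h>
  #include <stdlib.h>

  /* Keys as stored in the main list; the type name is ours. */
  typedef uintptr_t so_key_t;

  /* One node of the single main list: an item's key plus a link to the
     next node. */
  typedef struct node {
      so_key_t     key;
      struct node *next;
  } node_t;

  /* The table: an expanding array of bucket pointers that serve as
     shortcuts into the main list, plus the current size and item count. */
  typedef struct hashtable {
      node_t **buckets;   /* array of pointers into the list */
      size_t   size;      /* number of logical buckets, always a power of 2 */
      size_t   count;     /* number of items currently in the list */
  } hashtable_t;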
The main difficulty in maintaining this structure is in managing the continuous
coverage of the full length of the list by bucket pointers as the number of items in
the list grows. The distribution of bucket pointers among the list items must remain
dense enough to allow constant time access to any item. Therefore, new buckets
need to be created and assigned to sparsely covered regions in the list.
The bucket array initially has size 2, and is doubled every time the number of
items in the table exceeds size · L, where L is a small integer denoting the load
factor, the maximum number of items one would expect to find in each logical
bucket of the hash table. The initial state of all buckets is uninitialized, except
for the bucket of index 0, which points to an empty list, and is effectively the
head pointer of the main list structure. Each bucket goes through an initialization
procedure when first accessed, after which it points to some node in the list.
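Continuing the sketch above, the following sequential code illustrates the initial state and the doubling policy just described. The names MAX_LOAD, UNINITIALIZED, ht_create, and maybe_resize are ours, the load-factor constant is arbitrary, and the realloc-based resize merely stands in for the algorithm's lock-free growth mechanism.

  #define MAX_LOAD 4   /* the load factor L; the constant here is illustrative */

  /* A sentinel distinct from NULL marks an uninitialized bucket, so it
     can be told apart from a pointer to an empty list. */
  #define UNINITIALIZED ((node_t *)(uintptr_t)1)

  /* Create a table of size 2; only bucket 0 is initialized, acting as
     the head pointer of the (initially empty) main list. */
  hashtable_t *ht_create(void) {
      hashtable_t *t = malloc(sizeof *t);
      t->size  = 2;
      t->count = 0;
      t->buckets = malloc(t->size * sizeof(node_t *));
      t->buckets[0] = NULL;            /* head of the empty main list */
      t->buckets[1] = UNINITIALIZED;   /* initialized lazily on first access */
      return t;
  }

  /* Called after a successful insertion: double the bucket array once
     the number of items exceeds size * L. */
  static void maybe_resize(hashtable_t *t) {
      if (t->count > t->size * MAX_LOAD) {
          size_t new_size = 2 * t->size;
          t->buckets = realloc(t->buckets, new_size * sizeof(node_t *));
          for (size_t i = t->size; i < new_size; i++)
              t->buckets[i] = UNINITIALIZED;  /* new buckets start uninitialized */
          t->size = new_size;
      }
  }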
When an item of key k is inserted, deleted, or searched for in the table, a hash
function modulo the table size is used, that is, the bucket chosen for item k is
k mod size. The table size is always equal to some power 2^i, i ≥ 1, so that the
bucket index is exactly the integer represented by the key’s i least significant bits
(LSBs). The hash function’s dependency on the table size makes it necessary to
take special care as this size changes: an item that was inserted into the hash table's
list before the resize must be accessible, after the resize, from both the buckets it
already belonged to and from the new bucket it will logically belong to given the
new hash function.
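As a concrete illustration of this dependency (using the types from the sketches above and a helper name of our own): because size is a power of two, taking k mod size amounts to masking off all but the i least significant bits. A key of 13, for example, hashes to bucket 1 while the size is 4 (13 mod 4 = 1) but to bucket 5 once the table has doubled to 8 (13 mod 8 = 5), so after the resize it must also be reachable from bucket 5.

  /* The bucket chosen for key k is k mod size; since size == 2^i, this
     is exactly the integer formed by the key's i least significant bits. */
  static size_t bucket_index(so_key_t k, size_t size) {
      return (size_t)(k & (size - 1));   /* k mod size for a power-of-two size */
  }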
2.1. RECURSIVE SPLIT-ORDERING. The combination of a modulo-size hash
function and a 2^i table size is not new. It was the basis of the well known
sequential extensible Linear Hashing scheme proposed by Litwin [1980], was the
basis of the two-level locking hash scheme of Ellis [1983], and was recently used
by Lea [2003] in his concurrent extensible hashing scheme. The novelty here is
that we use it as a basis for a combinatorial structure that allows us to repeatedly
“split” all the items among the buckets without actually changing their position in
the main list.