2 Overview of Techniques
Viewing buckets as bins and items as balls, we can look at the hashing process as if m balls are
being assigned to n bins. For each ball two bins are chosen at random. If the bins are imagined
to be the vertices of a graph, the two bins for a ball can be represented by an edge. This gives
us a random graph G on n vertices containing m edges. By making this graph directed, we could
use the direction of an edge to indicate the choice of the bin among the two for placing the ball.
The direction of each edge is chosen online by a certain procedure. The load of a vertex (bucket) is
equal to its in-degree. For each edge (item) insertion, the two-way hash algorithm directs the edge
towards the vertex with the lower in-degree. During the hash process, say U is one of the vertices
a ball gets hashed to. Observe that if VU is a directed edge, and if the load on V is significantly
lower, we could perform a move from U to V, thus freeing up a position in U. Essentially, in terms
of load, the new ball could be added to either U or V, whichever has a lower load. This principle
can be generalized to the case where there is a directed path from V to U, and results in
performing moves and flipping the directions along all the edges on the path. If there is a directed
sub-tree rooted at U, with all edges leading to the root, we could choose the least loaded vertex in
this tree to incur the load of the new ball. With this understanding, we will say that W is a child
of X if XW is a directed edge. So, our hash insert algorithm looks as follows.
• Compute the two bins U1 and U2 that the new item to be inserted hashes to.
• Explore vertices that can be reached from U1 or U2 by traversing along directed edges in the
reverse direction.
• Among such vertices, find one, V, with low load that can be reached, say from U1.
• Add the new item to U1 and perform moves along the path from U1 to V so that only the
load on V increases by one.
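The steps above can be sketched in code. The following Python is an illustrative sketch under our own assumptions, not the paper's implementation: the class name, the two hash choices, and the bookkeeping structures are all hypothetical, and the backward search explores every reverse edge rather than the truncated two-child search analyzed later.

```python
class TwoChoiceTable:
    """Illustrative sketch of two-choice hashing with moves (hypothetical).

    Each bin is a vertex; each stored item is a directed edge pointing to
    the bin that currently holds it, so a bin's load is its in-degree.
    """

    def __init__(self, n, depth):
        self.n = n                           # number of bins (vertices)
        self.depth = depth                   # backward-search depth limit
        self.bins = [[] for _ in range(n)]   # items stored in each bin
        self.alt = {}                        # item -> its unused alternate bin

    def _choices(self, item):
        # two hypothetical hash functions; any independent pair would do
        return hash(('h1', item)) % self.n, hash(('h2', item)) % self.n

    def insert(self, item):
        u1, u2 = self._choices(item)
        # Backward BFS: from bin x we can step to alt[i] for each item i
        # in x, since moving i there (flipping its edge) frees a slot in x.
        parent = {u1: None, u2: None}        # bin -> (previous bin, item moved)
        frontier = [u1, u2] if u1 != u2 else [u1]
        best = min(frontier, key=lambda b: len(self.bins[b]))
        for _ in range(self.depth):
            nxt = []
            for x in frontier:
                for i in self.bins[x]:
                    w = self.alt[i]
                    if w not in parent:
                        parent[w] = (x, i)
                        nxt.append(w)
                        if len(self.bins[w]) < len(self.bins[best]):
                            best = w
            frontier = nxt
        # Perform moves along the path from the least-loaded vertex found
        # back toward u1/u2, flipping edge directions; only `best` gains load.
        v = best
        while parent[v] is not None:
            x, i = parent[v]
            self.bins[x].remove(i)           # item i leaves bin x ...
            self.bins[v].append(i)           # ... and lands in its alternate v
            self.alt[i] = x                  # its unused choice is now x
            v = x
        # v is now u1 or u2 and has just shed one item (or was least loaded)
        self.bins[v].append(item)
        self.alt[item] = u2 if v == u1 else u1
```

Note the net effect matches the last bullet: every bin on the path loses one item and gains one, so only the load of the vertex chosen by the search increases by one.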
Let s = 2m/n denote the average degree of the undirected random graph G. Note that the
same graph G can be viewed as a directed or an undirected graph. Throughout the paper G refers
to the undirected version unless stated otherwise or clear from the context. Throughout
the paper we will assume that s is a constant. It turns out that the success of our algorithm in
maintaining low maximum load depends on the absence of dense subgraphs in this random graph.
We show that such dense subgraphs are absent when s < 3.35, giving an algorithm that works with
bucket size at most 2 and requires at most log log n + O(1) moves for inserts with high probability
(section 3). Note that the bound of 3.35 for s may not be tight, but it is provably no more than 3.72. We
then analyze the trade-off between the number of moves during inserts and the maximum bucket size using
the technique of witness trees [5] [16] [2], making significant adaptations to our problem (section
4).
3 Constant Maximum Bucket Size
In this section we show that for s < 3.35, by performing at most log log n + O(1) moves, we can
ensure that with high probability no bucket gets more than 2 items.
For an insert, we search backwards from a given node in BFS order, traversing directed edges in
reverse direction, looking for a node with load at most one. To simplify the analysis, we assume
that during the backward search, the algorithm visits only 2 children for each node even if more
may be present. We will show that by searching to a depth of log log n + O(1), with high probability,
we find a node with load at most one. First, we show that if the backward search is allowed to