There are several consequences of this invariant and balancing
strategy. (See e.g. [3,6] for full proofs.)
LEMMA 1. Consider an N-node weight-balanced tree with
constant balance parameter c.
(1) The degree of any node is Θ(c).
(2) For any node u and constant d ≤ h(u), the number of descen-
dants of u that have height at least h(u) −d is Θ(c
d
).
(3) Suppose that a node split has cost 1. Then the amortized cost to
insert into the tree is the search cost plus O(1).
(4) Suppose that splitting a node u costs Θ(c
h(u)
). Then the amor-
tized cost to insert into the tree is the search cost plus O(logN).
The shuttle tree supports inserts efficiently by using the buffers.
The buffers work in much the same way as in a BRT—an element
being inserted starts at the root, follows the appropriate root-to-
leaf path, but pauses at buffers along the way. The element only
gets “shuttled” down the tree when buffers overflow (and hence are
full enough to amortize the cost of crossing block boundaries). Our
shuttle tree differs from the BRT in that it has a linked list of buffers
associated with each child pointer (rather than a single buffer as in
the BRT). These buffers have doubly-exponentially increasing size.
To insert into a shuttle tree, start by inserting into the root node.
To insert into a node in the tree, simply find the appropriate child
pointer and insert into the corresponding linked list of buffers by in-
serting into the smallest buffer. When a buffer “overflows” (i.e., the
height of the buffer shuttle tree exceeds the to-be-specified maxi-
mum), take each element in the buffer and insert it into the next
(larger) buffer in the list.
3
Once the largest buffer in the list over-
flows, insert these items into the child node in the shuttle tree.
(Thus, data items in the shuttle tree live in two possible places,
either in some buffer on a root-to-leaf path or in a leaf of the tree.)
When an inserted element reaches a leaf in the shuttle tree, inser-
tions work in roughly the same way as in any SWBST with splits
trickling up the tree. Note that at the time a node u splits, the buffers
in between u and u’s parent have just been flushed.
LEMMA 2. When an element is inserted into a leaf ℓ, all nodes
on the path from the root to ℓ can be fetched without increasing the
asymptotic complexity, as long as M = Ω(BlogN).
Proof. The reason we insert into a leaf is because its parent’s
buffer has just overflowed, thus the grandparent buffer has just
overflowed, and so on up the tree to the root. Thus, we just flushed
buffers all the way down. If any subsequent block transfers evict
the root-to-leaf path, we can charge the cost of replacing the rele-
vant path block to the cost of evicting it in the first place.
We base our buffer sizes on Fibonacci numbers. Let F
k
be the kth
Fibonacci number. Then F
0
= 0, F
1
= 1, and F
k
= F
k−1
+F
k−2
. For
all positive integers h, we define the Fibonacci factor of h, denoted
by ξ(h), as follows. If h is a Fibonacci number, then ξ(h) = h.
Otherwise, let f be the largest Fibonacci number less than h. Then
the Fibonacci factor of h is ξ(h) = ξ(h − f ). The buffer sizes of
a node at height h + 1 depend upon ξ(h). In particular, consider a
node u at height h + 1 in the tree, and let k be such that F
k
= ξ(h).
We define the buffer-height-index function H ( j) = j −⌈2log
ϕ
j⌉,
where ϕ≈1.618 is the golden ratio. Then u has buffers with heights
F
H ( j)
, for each integer j, j = Θ(1), ...,k −1,k.
4
In other words,
there are roughly k buffers increasing geometrically in their heights,
3
These items are inserted in arrival order, not smallest to largest.
4
We can start j at any sufficiently large constant to help the proofs,
in particular Lemma 16.
and the largest buffer has height F
H (k)
= F
k−2⌈log
ϕ
k⌉
. These set-
tings mean that the parent node of a subtree containing roughly K
nodes has the largest buffer of size roughly K
1/Θ((loglogK)
2
)
.
The shuttle tree as specified thus far cannot yet be analyzed in
the cache-oblivious setting. To do so, we must enforce a particular
dynamic layout in memory. We must show that the layout permits
efficient operations and can be efficiently maintained.
Shuttle-tree layout
We lay out the shuttle tree recursively in a type of “van Emde Boas
(vEB) layout” [20] that takes into account the lists of buffers and
several additional complications.
We first explain how our vEB layout would proceed on a regular
tree of height h. Let F
k
be the largest Fibonacci number strictly
smaller than h. Then we split the tree at height F
k
(roughly h/ϕ
instead of at height h/2 as in the traditional vEB layout). That is, if
h = F
k
is the kth Fibonacci number, then we split the tree into a root
subtree of height F
k−2
and leaf subtrees of height F
k−1
, which are
recursively laid out. It is important that the split is above the half-
way point, height h/2, unlike in previous cache-oblivious search
structures [6–8, 11, 20]. Fibonacci numbers are a convenient way
to ensure this requirement because they enforce some integrality
while roughly matching F
k
≈ ϕ
k
.
We now give the vEB layout of the shuttle tree, which means
also laying out the buffers; see Figure 1. Consider a (sub)tree of
height F
k+1
, with leaves of this tree having buffers of heights (ge-
ometrically increasing) up to F
H (k+1)
. Think of this subtree and
these buffers as a single entity, which we call a recursive sub-
tree. When laying out this recursive subtree, we split the subtree
at height F
k
. In the “left” end of memory, we store the top recur-
sive subtree of height F
k−1
(which includes leaf buffers of height
up to F
H (k−1)
) recursively. To the right of this subtree, we store
the height-F
H (k)
leaf buffers, from left-to-right in the same order
as the leaves. To the right of these buffers, we lay out each of the
bottom recursive subtrees of height F
k
(including leaf buffers of
height up to F
H (k)
) recursively. To the right of each of the bottom
recursive subtrees, we lay out that subtree’s height-F
H (k+1)
leaf
buffers. We call the (contiguous) height-F
k
recursive subtree and
height-F
H (k+1)
buffers (which appear immediately after the recur-
sive subtree in the layout) a (height-F
k
) buffered recursive subtree.
The leaves of the shuttle tree are special in that they do not have
any buffers. We call a recursive subtree containing leaves of the
top level (i.e., entire) tree a leaf recursive subtree. The recursive
layout of a leaf recursive subtree is the same, except that the bot-
tom recursive subtrees do not have any buffers coming out of the
leaves. For convenience, we use the terms “recursive subtree” and
“buffered recursive subtree” as generalizations of “leaf recursive
subtree,” even though the leaves do not have buffers.
The buffer heights are carefully chosen using the Fibonacci fac-
tors to match this recursive layout. It is always the case that when
splitting a tree at the proper height, the leaf nodes of a height-F
k
top
or bottom recursive subtree have a height-F
H (k+1)
buffer that can
be stored after the recursive subtree (as indicated in Figure 1). The
exception is for buffers that would have a sufficiently small con-
stant height; these buffers are omitted altogether. (The elimination
of small buffers helps the analysis in Lemma 16.)
Another way to interpret buffer sizes is that a node has a buffer
for every (sufficiently large) recursive subtree in which it is a leaf.
Thus, nodes that are roots of height-2 or taller recursive subtrees
(i.e., those having Fibonacci factors > 1) have no buffers, because
they cannot also be leaves of recursive subtrees. This notion is cap-
tured by the following lemma, which can be proved by induction.