Read/write-optimized tree indexing for solid-state drives
Fig. 4 Element mapping in a Bloom filter
hash functions h
1
(x), h
2
(x),…,h
k
(x). Second, we set all
bits BF[h
i
(x)]to1.
For answering the membership query like y ∈ S,wefirst
calculate the k values of hash functions h
1
(y), h
2
(y),…,
h
k
(y). Then, we check all the Bloom filters of each element
in S and see whether all the BF[h
i
(y)] are 1. If not, y is
not a member of S. If all the BF[h
i
(y)]are1,y may be
in S. Due to the possibility of collisions among the hash
functions, there is a nonzero false-positive probability when
evaluating membership queries on Bloom filters. Given n, k,
and m, previous results [7] have shown that the false-positive
probability of a Bloom filter can be computed by Eq. (3.1).
Further, it is demonstrated that the false-positive probability
is minimalized when k = 0.7
m
n
, which is approximately
0.6185
m
n
.
f
BF
= (1 − e
−kn
m
)
k
(3.1)
However, the Bloom filter does not support deletions of ele-
ments. A recent study [7] enhances Bloom filters to support
deletions.
Bloom filters are space efficient. In addition, they are time
efficient for inserting elements and answering membership
queries. Therefore, we incorporate Bloom filters into B+-
tree-based indices for SSDs to improve search performance.
3.3 Structure of the BloomTree
Figure 5 shows the structure of the BloomTree. We improve
the traditional B+-tree with two new designs. First, we
introduce three kinds of leaf nodes, namely Normal Leaf,
Overflow Leaf (OF-leaf ), and Bloom Filter Leaf (BF-leaf ).
Second, we propose to construct Bloom filters in the BF-leaf
nodes and use overflow pages in the OF-leaf nodes.
A normal leaf node is the same as a leaf node on the tradi-
tional B+-tree, and it occupies exactly one page. An OF-leaf
node contains overflow pages. However, an OF-leaf node
contains at most three overflow pages (covered later in this
section), in order to reduce read costs of OF-leaf nodes. If an
OF-leaf node expands beyond three pages, it is transformed
into a BF-leaf node, which offers a more efficient organi-
zation for overflow pages. A BF-leaf node is designed for
organizing leaves with more than three overflow pages. As
shown in Fig. 5, it contains several data pages and a leaf-head
page maintaining the Bloom filters and metadata.
When searching in the BloomTree, the only difference
from the B+-tree is in how leaf nodes are searched. Searching
a normal leaf node is the same as in the B+-tree. Searching
an OF-leaf node needs to scan the entire list of the overflow
pages in the worst case. When searching a BF-leaf node, we
first compute the Bloom filter of the search key, and then we
compare this Bloom filter with the Bloom filters maintained
in the leaf-head page that are computed for the indexed keys
in the BF-leaf node.
The objective of introducing BF-leaf nodes for t he B+-
tree is to improve the poor read performance imposed by the
overflow-page design. As shown in Fig. 3, the overflow B+-
tree introduces additional read operations that hurt the overall
performance of the index. These additional reads are mainly
incurred by the accesses to overflow pages. Given an OF-leaf
with N nodes, where each node has the same probability 1/N
to be accessed, the expected reads are given by E
OF
-cost.
E
OF-cost
=
N
i=1
1
N
=
N + 1
2
(3.2)
Next, we only need two page reads to search a BF-leaf node,
namely one read for t he leaf-head page and another for read-
Fig. 5 Structure of the
BloomTree
123
123