Benchmarking a B-tree compression method
?
Filip Kˇriˇzka, Michal Kr´atk´y, and Radim Baˇca
Department of Computer Science, Technical University of Ostrava, Czech Republic
{filip.krizka,michal.kratky,radim.baca}@vsb.cz
Abstract. The B-tree and its variants have been widely
applied in many data management fields. When a com-
pression of these data structures is considered, we follow
two objectives. The first objective is a smaller index file,
the second one is a reduction of the query processing time.
In this paper, we apply a compression scheme to fit these
objectives. The utilized compression scheme handles com-
pressed nodes in a secondary storage. If a page must be re-
trieved then this page is decompressed into the tree cache.
Since this compression scheme is transparent from the tree
operation’s point of view, we can apply various compression
algorithms to pages of a tree. Obviously, there are compres-
sion algorithms suitable for various data collections, and
so, this issue is very important. In our paper, we compare
the B-tree and compressed B-tree where the Fast Fibonacci
and invariable coding compression methods are applied.
Key words: B-tree and its variants, B-tree compression,
compression scheme, fast decompression algorithm
1 Introduction
The B-tree represents an efficient structure for the
finding of an ordered set [6]. The B-tree has been often
used as the backbone data structure for the physical
implementation of RDBMS or file systems. Its most
important characteristic is that keys in a node have
very small differences to each others. We utilize this
feature in the B-tree compression. In this case, nodes
are compressed in the secondary storage and they are
decompressed during their reading into the cache. Due
to the fact that the random access in the secondary
storage is a rather expensive operation, we save time
when reading the nodes.
In work [11], authors summarize some methods
for organizing of B-trees. A prefix B-tree, introduced
in [7], provides the head and tail compression. In the
case of the head compression, one chooses a common
prefix for all keys that the page can store, not just the
current keys. Tail compression selects a short index
term for the nodes above the data pages . This index
needs merely to separate the keys of one data node
from those of its sibling and is chosen during a node
split. Tail compression produces variable length index
?
Work is partially supported by Grants of GACR
No. 201/09/0990 and IGA, FEECS, Technical Univer-
sity of Ostrava, No. BI 4569951, Czech Republic.
entries, and [7] describes a binary search that copes
with variable length entries.
Work [9] describ es a split technique for data. Rows
are assigned tag values in the order in which they are
added to the table. Note that tag values identify rows
in the table, not records in an individual partition or
in an individual index. Each tag value appears only
once in each index. All vertical partitions are stored
in the B-tree with the tag value as the key. The novel
aspect is that the storage of the leading key is reduced
to a minimal value.
Unlike these works, in our work we suppose the
B-tree compression without changes of the B-tree
structure. We mainly utilize the fast decompression al-
gorithm. In the case of the previously depicted papers,
B-tree c ompress ion is possible using a modification of
the B-tree structure. In work [7], B-tree is presented by
B
∗
-index and B
∗
-file. The keys stored in the B
∗
-index
are only used to searching and determining in which
subtree of a given branch node a key and its associ-
ated record will be found. The B
∗
-index itself is a con-
ventional B-tree including prefixes of the keys in the
B
∗
-file. This prefix B-tree combines some of the advan-
tages of B-trees, digital search trees, and key compres-
sion without sacrificing the basic simplicity of B-trees
and the associated algorithms and without inheriting
some of the disadvantages of digital search trees and
key compression techniques. Work [9] describes an ef-
ficient columnar storage in B-trees. Column-oriented
storage formats have been proposed for query process-
ing in relational data warehouses, specifically for fast
scans over non-indexed columns. This data compres-
sion metho d reuses traditional on-disk B-tree struc-
tures with only minor changes yet achieves storage
density and sc an performance comparable to special-
ized columnar designs. In work [1], B-tree compression
is used for minimizing the amount of space used by
certain types of B-tree indexes. When a B-tree is com-
pressed, duplicate occurrences of the indexed column
values are eliminated. It is compressed by clustering
the same keys and their unindexed attributes.
This paper is organized as follows. In Section 2, we
briefly summarize basic knowledge about the B-tree.
Section 3 shows a compression scheme used [3]. Sec-
tion 4 describes two compression methods. Section 5
shows results of the compression methods. The com-
pressed B-tree is compared with a proper B-tree. In