Performance Benchmarking of Compressed B-trees


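"Benchmarking a B-tree Compression Method" is a research paper in computer science, specifically data management, that benchmarks compression methods for B-trees. The authors, Filip Křižka, Michal Krátký, and Radim Bača, are from the Department of Computer Science at the Technical University of Ostrava, Czech Republic. Given how widely the B-tree and its variants are used in data management, the authors set two main objectives: a smaller index file and a shorter query processing time. The B-tree is an efficient data structure commonly used in databases and file systems to store and retrieve large amounts of data. Compressing it saves storage space and can also improve read performance.

To meet both objectives, the authors apply a compression scheme that keeps compressed nodes in secondary storage; when a page has to be accessed, the compressed page is decompressed into the tree cache. Because the scheme is transparent at the level of tree operations, different compression algorithms can be applied to the pages of a tree. Different data collections may suit different compression algorithms, so choosing an appropriate method matters.

The paper compares an uncompressed B-tree with compressed B-trees that use the Fast Fibonacci and invariable coding compression methods. Fast Fibonacci is a fast method for coding unsigned integers; the second method is referred to in the abstract as invariable coding. Through this comparison, the authors aim to evaluate the compression strategies in terms of index size, query efficiency, and overall performance. The paper's contribution is a practical benchmark of B-tree compression methods, which helps database administrators and system designers evaluate and choose a B-tree compression strategy that balances storage efficiency and query performance.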

Benchmarking a B-tree compression method

Filip Křižka, Michal Krátký, and Radim Bača

Department of Computer Science, Technical University of Ostrava, Czech Republic

{filip.krizka,michal.kratky,radim.baca}@vsb.cz

Abstract. The B-tree and its variants have been widely applied in many data management fields. When compression of these data structures is considered, we follow two objectives. The first objective is a smaller index file; the second is a reduction of the query processing time. In this paper, we apply a compression scheme to meet these objectives. The utilized compression scheme keeps compressed nodes in secondary storage. When a page must be retrieved, it is decompressed into the tree cache. Since this compression scheme is transparent from the point of view of tree operations, we can apply various compression algorithms to the pages of a tree. Obviously, different compression algorithms suit different data collections, so this issue is very important. In our paper, we compare the B-tree and the compressed B-tree where the Fast Fibonacci and invariable coding compression methods are applied.

Key words: B-tree and its variants, B-tree compression, compression scheme, fast decompression algorithm
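The scheme outlined in the abstract (compressed pages in secondary storage, decompressed into the tree cache on retrieval, with the codec hidden from tree operations) might be sketched roughly as follows. This is only an illustration under stated assumptions: the class `CompressedPageStore`, its method names, and the in-memory dictionary standing in for secondary storage are all hypothetical, and `zlib` is a stand-in codec, not one of the methods evaluated in the paper.

```python
import zlib
from typing import Callable, Dict


class CompressedPageStore:
    """Hypothetical page store: compressed on 'disk', decompressed in the cache."""

    def __init__(
        self,
        compress: Callable[[bytes], bytes] = zlib.compress,      # stand-in codec
        decompress: Callable[[bytes], bytes] = zlib.decompress,  # stand-in codec
    ) -> None:
        self._compress = compress
        self._decompress = decompress
        self._disk: Dict[int, bytes] = {}    # stand-in for secondary storage
        self._cache: Dict[int, bytes] = {}   # decompressed pages (the tree cache)
        self.disk_reads = 0                  # counts cache misses

    def write_page(self, page_id: int, data: bytes) -> None:
        # Tree code hands over a plain page; compression happens here only.
        self._cache[page_id] = data
        self._disk[page_id] = self._compress(data)

    def read_page(self, page_id: int) -> bytes:
        # On a cache miss, fetch the compressed page and decompress it once.
        if page_id not in self._cache:
            self.disk_reads += 1
            self._cache[page_id] = self._decompress(self._disk[page_id])
        return self._cache[page_id]

    def evict(self, page_id: int) -> None:
        # Drop the decompressed copy; the compressed copy stays on 'disk'.
        self._cache.pop(page_id, None)

    def stored_size(self, page_id: int) -> int:
        return len(self._disk[page_id])
```

Because the tree only calls `read_page` and `write_page`, swapping the two codec callables changes the compression method without touching any tree operation, which is the sense in which the scheme is transparent.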

1 Introduction

The B-tree represents an efficient structure for searching an ordered set [6]. The B-tree has often been used as the backbone data structure for the physical implementation of RDBMSs or file systems. Its most important characteristic is that the keys in a node differ only slightly from one another. We utilize this feature in B-tree compression. In this case, nodes are compressed in secondary storage and decompressed when they are read into the cache. Because random access to secondary storage is a rather expensive operation, we save time when reading the nodes.
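To illustrate why small differences between neighbouring keys help, the sketch below stores the gaps between sorted keys and codes each gap with the classic Fibonacci code (Zeckendorf representation terminated by a second 1 bit). It is an assumption-laden example, not the paper's implementation: it uses plain bit-by-bit Fibonacci decoding rather than the Fast Fibonacci decoder, represents bits as a Python string for readability, assumes sorted non-negative integer keys, and all function names are hypothetical.

```python
from typing import List


def fib_encode(n: int) -> str:
    """Fibonacci code of a positive integer, as a string of '0'/'1'."""
    if n < 1:
        raise ValueError("Fibonacci coding is defined for positive integers")
    fibs = [1, 2]
    while fibs[-1] <= n:
        fibs.append(fibs[-1] + fibs[-2])
    fibs.pop()  # keep only Fibonacci numbers <= n
    bits = ["0"] * len(fibs)
    for i in range(len(fibs) - 1, -1, -1):  # greedy choice, largest first
        if fibs[i] <= n:
            bits[i] = "1"
            n -= fibs[i]
    return "".join(bits) + "1"  # the extra 1 creates the "11" terminator


def fib_decode_stream(bits: str) -> List[int]:
    """Decode a concatenation of Fibonacci codewords."""
    values: List[int] = []
    fibs = [1, 2]
    total, i, prev = 0, 0, "0"
    for b in bits:
        if b == "1" and prev == "1":  # "11" ends the current codeword
            values.append(total)
            total, i, prev = 0, 0, "0"
            continue
        if b == "1":
            while i >= len(fibs):
                fibs.append(fibs[-1] + fibs[-2])
            total += fibs[i]
        i += 1
        prev = b
    return values


def encode_keys(keys: List[int]) -> str:
    """Code each gap to the previous key; +1 keeps every coded value positive."""
    out, prev = [], 0
    for k in keys:
        out.append(fib_encode(k - prev + 1))
        prev = k
    return "".join(out)


def decode_keys(bits: str) -> List[int]:
    keys, prev = [], 0
    for v in fib_decode_stream(bits):
        prev += v - 1
        keys.append(prev)
    return keys
```

Small gaps map to short codewords, so a node whose keys are close together needs far fewer bits than fixed-width storage of each key would.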

Work is partially supported by Grants of GACR No. 201/09/0990 and IGA, FEECS, Technical University of Ostrava, No. BI 4569951, Czech Republic.

In work [11], the authors summarize some methods for organizing B-trees. A prefix B-tree, introduced in [7], provides head and tail compression. In the case of head compression, one chooses a common prefix for all keys that the page can store, not just the current keys. Tail compression selects a short index term for the nodes above the data pages. This index term needs merely to separate the keys of one data node from those of its sibling and is chosen during a node split. Tail compression produces variable-length index entries, and [7] describes a binary search that copes with variable-length entries.
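The head and tail compression summarized above can be illustrated with a small sketch for string keys. It is not taken from [7]: the function names are hypothetical, and the separator rule shown (the shortest prefix of the right sibling's smallest key that is still greater than the left sibling's largest key) is one simple way to satisfy the "merely separates the two nodes" requirement.

```python
import os.path


def head_prefix(low_bound: str, high_bound: str) -> str:
    """Common prefix of the key range a page can store (head compression)."""
    # os.path.commonprefix compares character by character, not path-wise.
    return os.path.commonprefix([low_bound, high_bound])


def shortest_separator(left_max: str, right_min: str) -> str:
    """Shortest s with left_max < s <= right_min (tail compression)."""
    if not left_max < right_min:
        raise ValueError("left_max must sort strictly before right_min")
    for i in range(1, len(right_min) + 1):
        candidate = right_min[:i]
        if candidate > left_max:
            return candidate
    return right_min  # not reached: right_min itself is greater than left_max
```

For example, splitting between the keys "Kratky" and "Krizka" would need only the separator "Kri" in the parent node, and a page bounded by those two keys shares the head prefix "Kr".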

Work [9] describes a split technique for data. Rows are assigned tag values in the order in which they are added to the table. Note that tag values identify rows in the table, not records in an individual partition or in an individual index. Each tag value appears only once in each index. All vertical partitions are stored in the B-tree with the tag value as the key. The novel aspect is that the storage of the leading key is reduced to a minimal value.

Unlike these works, in our work we consider B-tree compression without changes to the B-tree structure. We mainly utilize a fast decompression algorithm. In the previously described papers, B-tree compression is made possible by a modification of the B-tree structure. In work [7], the B-tree is represented by a B*-index and a B*-file. The keys stored in the B*-index are used only for searching and for determining in which subtree of a given branch node a key and its associated record will be found. The B*-index itself is a conventional B-tree including prefixes of the keys in the B*-file. This prefix B-tree combines some of the advantages of B-trees, digital search trees, and key compression without sacrificing the basic simplicity of B-trees and the associated algorithms and without inheriting some of the disadvantages of digital search trees and key compression techniques. Work [9] describes an efficient columnar storage in B-trees. Column-oriented storage formats have been proposed for query processing in relational data warehouses, specifically for fast scans over non-indexed columns. This data compression method reuses traditional on-disk B-tree structures with only minor changes yet achieves storage density and scan performance comparable to specialized columnar designs. In work [1], B-tree compression is used for minimizing the amount of space used by certain types of B-tree indexes. When a B-tree is compressed, duplicate occurrences of the indexed column values are eliminated. It is compressed by clustering the same keys and their unindexed attributes.
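The duplicate elimination attributed to [1] can be pictured with a short sketch in which each distinct key is stored once, followed by the attributes of all rows sharing it. The entry layout (a key paired with an integer row identifier) and the function name are assumptions made for illustration only; the source does not describe the actual storage format.

```python
from itertools import groupby
from typing import Iterable, List, Tuple


def cluster_duplicates(
    entries: Iterable[Tuple[str, int]]
) -> List[Tuple[str, List[int]]]:
    """Store each key once, with the row ids of all entries that share it."""
    ordered = sorted(entries)  # index order: by key, then by row id
    return [
        (key, [row_id for _, row_id in group])
        for key, group in groupby(ordered, key=lambda entry: entry[0])
    ]
```

With five entries over three distinct keys, the clustered form holds three keys instead of five, and the saving grows with the number of duplicates per key.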

This paper is organized as follows. In Section 2, we briefly summarize basic knowledge about the B-tree. Section 3 shows the compression scheme used [3]. Section 4 describes two compression methods. Section 5 shows results of the compression methods. The compressed B-tree is compared with a proper B-tree. In
