                     Leveling     Lazy-Leveling    Tiering
application query    O(L)         O(L · T)         O(L · T)
application update   O(L · T)     O(L + T)         O(L)

Table 1: Blocked Bloom filters' memory I/O complexities.
the buffer (for a delete, the value is a tombstone). Whenever runs that contain entries with the same key are merged, older versions are discarded and only the newest version is kept. To always find the most recent version of an entry, a query traverses the runs from youngest to oldest (from smaller to larger levels, and from lower to higher sub-levels within a level). It terminates when it finds the first entry with a matching key. If this entry’s value is a tombstone, the query returns a negative result.
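To make the lookup semantics concrete, the following sketch (in Python, with an illustrative run representation and a TOMBSTONE sentinel that are assumptions, not the paper’s implementation) traverses runs from youngest to oldest and stops at the first match:

```python
# Minimal sketch of LSM-tree point-lookup semantics. Each run is modeled as a
# dict, and `runs` is ordered youngest-first (buffer, then levels 1..L).
TOMBSTONE = object()   # sentinel value written by a delete (assumption)

def point_lookup(runs, key):
    for run in runs:                         # youngest to oldest
        if key in run:                       # first match is the newest version
            value = run[key]
            return None if value is TOMBSTONE else value
    return None                              # key was never inserted
```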
Fence Pointers.
For each run, there are fence pointers in memory that comprise the min/max key at every data block. They allow queries to binary search for the relevant data block that contains a given key in ≈ log(N) memory I/Os so that this block can be retrieved cheaply with one storage I/O.
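As an illustration, the sketch below binary searches an in-memory list of per-block minimum keys to locate the one data block worth fetching; the fence representation is an assumption for the example.

```python
import bisect

# Minimal sketch of fence-pointer lookup. `fences` holds the minimum key of
# each data block of a run, in sorted order (one entry per block).
def find_block(fences, key):
    i = bisect.bisect_right(fences, key) - 1   # last block whose min key <= key
    return i if i >= 0 else None               # None: key precedes the run's range

# The returned block index is then fetched with a single storage I/O and
# searched for the key.
```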
Bloom Filters.
For each run in the LSM-tree, there is an in-memory Bloom filter (BF), a space-efficient probabilistic data structure used to test whether a key is a member of a set [11]. A BF is an array of bits with h hash functions. Each key is mapped using these hash functions to h random bits, setting them from 0 to 1 or keeping them set to 1. Checking for the existence of a key requires examining its h bits. If any are set to 0, we have a negative. If all are set to 1, we have either a true or a false positive. The false positive probability (FPP) is $2^{-M \cdot \ln(2)}$, where M is the number of bits per entry. As we increase M, the probability of hash collisions decreases and so the FPP drops. A BF does not support range reads or deletes. The lack of delete support means that a new BF has to be built from scratch for every new run as a result of compaction.
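The following sketch illustrates the insert and membership-test logic described above; the sizing and the double-hashing scheme are assumptions for the example rather than the filters used by any particular KV-store.

```python
import hashlib

# Minimal Bloom filter sketch: m bits and h hash functions derived by double
# hashing (an illustrative choice).
class BloomFilter:
    def __init__(self, m, h):
        self.m, self.h = m, h
        self.bits = bytearray(m)              # one byte per bit, for clarity

    def _positions(self, key):
        d = hashlib.sha256(key.encode()).digest()
        h1 = int.from_bytes(d[:8], "little")
        h2 = int.from_bytes(d[8:16], "little") | 1
        return [(h1 + i * h2) % self.m for i in range(self.h)]

    def insert(self, key):
        for p in self._positions(key):        # set the key's h bits to 1
            self.bits[p] = 1

    def may_contain(self, key):
        # Any zero bit means definitely absent; all ones means present or a
        # false positive.
        return all(self.bits[p] for p in self._positions(key))
```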
A BF entails h memory I/Os for an insertion or for a query to an existing key. For a query to a non-existing key, it entails on average two memory I/Os since ≈ 50% of the bits are set to zero and so the expected number of bits checked before incurring a zero is two.
Blocked Bloom Filters.
To optimize memory I/Os, the blocked Bloom filter has been proposed as an array of contiguous BFs, each the size of a cache line [66, 84]. A key is inserted by first hashing it to one of the constituent BFs and then inserting the key into it. This entails only one memory I/O for any insertion or query. The trade-off is a slight FPP increase. RocksDB has adopted blocked BFs. We use standard and blocked BFs as baselines in this paper, and we focus more on blocked BFs as they are the tougher competition.
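A minimal sketch of the blocked design, reusing the BloomFilter sketch above as the per-block filter; the 512-bit block size and the block-selection hash are illustrative assumptions.

```python
# Minimal blocked Bloom filter sketch: a key is first hashed to one
# cache-line-sized block, and all of its h bits land inside that block, so an
# insertion or query touches a single cache line.
class BlockedBloomFilter:
    BLOCK_BITS = 512                          # one 64-byte cache line (assumption)

    def __init__(self, num_blocks, h):
        self.blocks = [BloomFilter(self.BLOCK_BITS, h) for _ in range(num_blocks)]

    def _block(self, key):
        return self.blocks[hash(key) % len(self.blocks)]

    def insert(self, key):
        self._block(key).insert(key)

    def may_contain(self, key):
        return self._block(key).may_contain(key)
```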
Memory I/O Analysis.
With blocked BFs, the overall cost of a point query is $(L-1) \cdot K + Z$ memory I/Os, one for each sub-level of the LSM-tree. The cost of an insertion/update/delete, on the other hand, is the same as the LSM-tree’s write-amplification: $\approx \frac{T-1}{K+1} \cdot (L-1) + \frac{T-1}{Z+1}$ with Dostoevsky. The reason is that every compaction that an entry participates in leads to one BF insertion, which costs one memory I/O. Table 1 summarizes these costs for each of the merge policies. It shows that query cost over the BFs can be significant, especially with tiering and lazy leveling. Moreover, the BF’s query and construction costs both increase with respect to the number of levels L and thus with the data size. Finally, there is an inverse relationship between the BFs’ query and construction costs: the greedier we set the LSM-tree’s merge policy to be (i.e., by fine-tuning the parameters T, K and Z), query cost decreases as there are fewer BFs while construction cost increases as the BFs get rebuilt more frequently. Can we better scale these costs with respect to the data size while also alleviating their read/write contention?
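To make the analysis concrete, the sketch below evaluates both formulas for the three merge policies (leveling: K = Z = 1; tiering: K = Z = T − 1; lazy leveling: K = T − 1, Z = 1), reproducing the asymptotic behavior in Table 1; the parameter values are illustrative assumptions.

```python
# Memory I/O costs of blocked BFs, per the formulas above.
def query_cost(L, K, Z):
    return (L - 1) * K + Z                             # one probe per sub-level

def update_cost(L, T, K, Z):
    return (T - 1) / (K + 1) * (L - 1) + (T - 1) / (Z + 1)

L, T = 5, 10                                           # illustrative values
policies = {"leveling": (1, 1),                        # K = Z = 1
            "tiering": (T - 1, T - 1),                 # K = Z = T - 1
            "lazy-leveling": (T - 1, 1)}               # K = T - 1, Z = 1
for name, (K, Z) in policies.items():
    print(name, query_cost(L, K, Z), round(update_cost(L, T, K, Z), 1))
```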
False Positive Rate Analysis.
We define the false positive rate (FPR) as the sum of FPPs across all filters. The FPR expresses the average number of I/Os due to false positives that occur per point query over the whole LSM-tree. Equation 2 expresses the FPR for most KV-stores, which assign the same number of bits per entry to all their BFs. This approach, however, was recently deemed sub-optimal. The optimal approach is to reassign ≈ 1 bit per entry from the largest level and to use it to assign linearly more bits per entry to filters at smaller levels [25, 26]. While this increases the largest level’s FPP, it exponentially decreases the FPPs at smaller levels such that the overall FPR is lower, as expressed in Equation 3 [28].
$FPR_{uniform} = 2^{-M \cdot \ln(2)} \cdot \left( K \cdot (L-1) + Z \right)$   (2)

$FPR_{optimal} = 2^{-M \cdot \ln(2)} \cdot Z^{\frac{T-1}{T}} \cdot K^{\frac{1}{T}} \cdot \frac{T^{\frac{T}{T-1}}}{T-1}$   (3)
Equation 3 states that with the optimal approach, the relationship
between memory and FPR is independent of the number of levels
and thus of data size, unlike Equation 2. The reason is that as the
LSM-tree grows, smaller levels are assigned exponentially lower
FPRs thus causing the sum of FPRs to converge. It is imperative
that any replacement we devise for the LSM-tree’s Bloom filters
either matches or improves on the FPR expressed in Equation 3.
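The sketch below evaluates Equations 2 and 3 under illustrative parameter choices to show that the uniform FPR grows with the number of levels L while the optimal FPR stays constant; the variable names mirror the symbols in the text.

```python
import math

def fpr_uniform(M, L, K, Z):
    # Equation 2: the same bits per entry for every filter.
    return 2 ** (-M * math.log(2)) * (K * (L - 1) + Z)

def fpr_optimal(M, T, K, Z):
    # Equation 3: bits reallocated across levels; independent of L.
    return (2 ** (-M * math.log(2))
            * Z ** ((T - 1) / T) * K ** (1 / T)
            * T ** (T / (T - 1)) / (T - 1))

M, T, K, Z = 10, 10, 1, 1                 # illustrative values (assumption)
for L in (3, 5, 7):
    print(L, fpr_uniform(M, L, K, Z), fpr_optimal(M, T, K, Z))
```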
3 PROMISE: LSM-TREE & CUCKOO FILTER
Cuckoo filter [39] (CF) is one of several data structures [9, 15, 44, 81, 96] that recently emerged as alternatives to Bloom filters. At their core, these structures all employ a compact hash table that stores fingerprints of keys, where a fingerprint is a string of F bits derived by hashing a key. A CF comprises an array of buckets, each with S slots for fingerprints. During insertion, an entry with key k is hashed to two bucket addresses $b_1$ and $b_2$ using Equation 4. A fingerprint of key k is inserted into whichever bucket has space. If both buckets are full, however, some fingerprint from one of the two buckets is randomly chosen and swapped to its alternative bucket to clear space. By virtue of using the xor operator, the right-hand side of Equation 4 makes it possible to always compute an entry’s alternative bucket using the fingerprint and current bucket address, without the original key. The swapping process continues recursively either until a free slot is found for all fingerprints or until a swapping threshold is reached, at which point the insertion fails.
$b_1 = hash(k) \qquad b_2 = b_1 \oplus hash(k\text{'s fingerprint})$   (4)
We employ a CF with S set to four slots per bucket throughout the paper. Such a tuning can reach 95% space utilization with high probability without incurring insertion failures and with only 1-2 amortized swaps per insertion. The false positive rate is $\approx 2 \cdot S \cdot 2^{-F}$, where F is the fingerprint size in bits. Querying entails at most two memory I/Os as each entry is in one of two buckets.
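The following sketch illustrates partial-key cuckoo hashing with the insert, swap, and lookup steps described above; the bucket count, fingerprint size, hash functions, and swap threshold are assumptions for the example.

```python
import hashlib, random

# Minimal cuckoo filter sketch: num_buckets buckets of S slots holding F-bit
# fingerprints. num_buckets must be a power of two so that the xor mapping in
# _alt() is its own inverse.
class CuckooFilter:
    def __init__(self, num_buckets=1024, slots=4, fp_bits=12, max_swaps=500):
        self.n, self.S, self.F, self.max_swaps = num_buckets, slots, fp_bits, max_swaps
        self.buckets = [[] for _ in range(num_buckets)]

    def _h(self, data):
        return int.from_bytes(hashlib.sha256(data).digest()[:8], "little")

    def _fingerprint(self, key):
        return self._h(b"fp" + key) % (2 ** self.F)

    def _alt(self, bucket, fp):
        # Equation 4: the alternative bucket needs only the fingerprint and the
        # current bucket address, not the original key.
        return (bucket ^ self._h(fp.to_bytes(4, "little"))) % self.n

    def insert(self, key):
        key = key.encode()
        fp = self._fingerprint(key)
        b1 = self._h(key) % self.n
        b2 = self._alt(b1, fp)
        for b in (b1, b2):
            if len(self.buckets[b]) < self.S:
                self.buckets[b].append(fp)
                return True
        # Both buckets are full: evict a random victim and relocate it.
        b = random.choice((b1, b2))
        for _ in range(self.max_swaps):
            victim = random.randrange(self.S)
            fp, self.buckets[b][victim] = self.buckets[b][victim], fp
            b = self._alt(b, fp)
            if len(self.buckets[b]) < self.S:
                self.buckets[b].append(fp)
                return True
        return False                         # swap threshold reached: insertion fails

    def may_contain(self, key):
        key = key.encode()
        fp = self._fingerprint(key)
        b1 = self._h(key) % self.n
        b2 = self._alt(b1, fp)
        return fp in self.buckets[b1] or fp in self.buckets[b2]
```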
Promise.
A CF supports storing updatable auxiliary data for each entry alongside its fingerprint. We propose to replace the LSM-tree’s multiple Bloom filters with a CF that maps each data entry not only to a fingerprint but also to a Level ID (LID) that identifies the sub-level containing the entry. The promise is to allow finding the run that contains a given entry with at most two memory I/Os, far more cheaply than with blocked Bloom filters.
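As a small illustration of the idea, the sketch below packs a level ID next to the fingerprint in each slot, so a filter hit tells the query which sub-level to read; the 12-bit/8-bit layout and helper names are hypothetical, not the design detailed in the rest of the paper.

```python
# Minimal sketch of a cuckoo-filter slot that carries a Level ID (LID) next to
# the fingerprint. The packing (12-bit fingerprint, LID in the upper bits) is
# an illustrative assumption.
FP_BITS = 12

def make_slot(fingerprint, lid):
    return (lid << FP_BITS) | fingerprint

def slot_fingerprint(slot):
    return slot & ((1 << FP_BITS) - 1)

def slot_lid(slot):
    return slot >> FP_BITS

# On a fingerprint match in either of the entry's two buckets, the query reads
# the LID and fetches the corresponding sub-level's run with one storage I/O.
```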