ElasticBF: leveldb中自适应Bloom滤器优化，提升大容量存储读取效率

需积分: 16 168 浏览量更新于2024-09-05 收藏 809KB PDF 举报

本文档深入探讨了在LevelDB这样的基于LSM（Log-Structured Merge）树的键值存储系统中，Bloom Filter的优化问题。LevelDB面临着严重的读取放大效应，特别是在处理大型键值对存储时，频繁的查找非存在的键会导致不必要的磁盘I/O开销。Bloom Filters作为一种常用的数据结构，被用来加速读取性能，通过减少查找过程中的误报率，提高查询效率。然而，现有的Bloom Filter设计通常采用统一的配置，没有灵活性，无法动态调整大小，这可能导致假阳性的高概率或者内存占用过大。针对这一问题，研究者提出了ElasticBF（Fine-grained and Elastic Bloom Filter）的设计。ElasticBF的主要创新在于为每个SSTable（Sorted String Table，LevelDB中用于存储数据的一种格式）构建更小的过滤器，并根据键值的访问频率动态加载到内存中。这种细粒度且弹性的策略允许在运行时根据实际需求调整，同时保持相同的内存使用量。通过实验对比，ElasticBF相较于LevelDB在不同场景下能够实现显著的读取性能提升，最高可以达到1.94倍至2.24倍。这意味着在处理大量查询和非存在键查找时，ElasticBF能够显著减少无效的I/O操作，从而提高了系统的整体效率和响应速度。这项优化对于那些对读取性能要求较高的应用来说，无疑是一个重要的改进，有助于降低存储成本并优化资源利用率。总结来说，本论文的核心贡献是提出了一种能够适应数据访问模式变化、实现动态调整的Bloom Filter优化方法，这对于提升LSM-tree基KV存储系统的读取性能具有实际价值。通过细致的分析和实验证据，ElasticBF展示了其在减少读取负载、提高系统响应速度方面的优势，为类似LevelDB等数据库的性能优化提供了新的思考方向。

ElasticBF: Fine-grained and Elastic Bloom Filter Towards Efﬁcient Read for

LSM-tree-based KV Stores

Yueming Zhang

,YongkunLi

1,2

,FanGuo

,ChengLi

1,2

,YinlongXu

1,2

University of Science and Technology of China

Anhui Province Key Laboratory of High Performance Computing, USTC

Abstract

LSM-tree based KV stores suffer from severe read am-

pliﬁcation, especially for large KV stores. Even worse,

many applications may issue a large amount of lookup

operations to search for nonexistent keys, which wastes

alotofextraI/Os. EventhoughBloomﬁlterscanbe

used to speedup the read performance, existing designs

usually adopt a uniform setting for all Bloom ﬁlters and

fail to support dynamic adjustment, thus results in a high

false positive rate or large memory consumption. To ad-

dress this issue, we propose ElasticBF, which constructs

more small ﬁlters for each SSTable and dynamically load

into memory as needed based on access frequency, so it

realizes a ﬁne-grained and elastic adjustment in running

time with the same memory usage. Experiment shows

that ElasticBF can achieve 1.94×-2.24× read through-

put compared to LevelDB under different workloads,

and preserves the same write performance. More im-

portantly, ElasticBF is orthogonal to existing works opti-

mizing the structure of KV stores, so it can be used as an

accelerator to further speedup their read performance.

1Introduction

Key-value (KV) store has become an important stor-

age engine for many applications [12, 4, 16, 23]. The

most common design of KV stores is based on Log-

Structured Merge-tree (LSM-tree), which groups KV

pairs into ﬁxed-size ﬁles, e.g., the SSTables in LevelDB

[7]. Files are further organized into multiple levels as

shown in Figure 1. KV pairs are sorted in each level

except level 0. With this level-based structure, data is

ﬁrst ﬂushed from memory to the lowest level, and then

merged into higher levels by using compaction when this

level reaches its capacity limit. Compaction inevitably

causes the write ampliﬁcation problem, and many recent

researches focus on addressing this issue [3, 22, 13].

On the other hand, LSM-tree based KV stores also

suffer from severe read ampliﬁcation problem[14, 21],

…

Metadata

Region

0HPRU\

…

/HYHO犆

/HYHO犇

/HYHO犌

'LVN

…

Bloom filter

'DWD%ORFN犇

'DWD%ORFN犈

'DWD%ORFN犉

ü

%ORRPILOWHU

,QGH[%ORFN

)RRWHU

1…13 24…35 36…64 66…9715…23

1…36 32..975…43

SSTable

Memtable

Bloom filter

Immutable

Memtable

Figure 1: LSM-tree based structure.

especially for large KV stores. This is mainly because

aKVstorecontainsmultiplelevels,andreadingaKV

item needs to check from the lowest level to the highest

level until the data is found or all levels are checked. This

process inevitably incurs multiple I/Os and ampliﬁes the

read operation. In the worst case, 14 SSTables need to

be inspected [13], and even worse, if the target KV item

does not exist in the store[19, 20, 9], then all the I/O re-

quests are totally wasted. We point out that only one ﬁle

in each level need to be examined as data in each level

are kept in a sorted order, and this ﬁle can also be easily

located by checking the key ranges of each ﬁle.

To reduce extra I/O requests, Bloom ﬁlters are widely

used in KV stores [19]. By ﬁrst asking the ﬁlter to check

if the requested data exists in the SSTable, extra I/Os can

be reduced. However, Bloom ﬁlter has false positive,

so it may return an “existence” answer even if the data

does not really exist, still incurs extra I/Os to inspect the

SSTable. Even though the false positive rate (FPR) can

be reduced by increasing the length of ﬁlters [2, 7], it will

increase the memory usage. As a result, some ﬁlters may

be swapped out to disks as memory is usually a scarce re-

source in KV stores [11, 8]. If ﬁlters are not in memory,

extra I/Os must be incurred to load the ﬁlter into mem-

ory before inspecting a SSTable, and this exacerbates the

read ampliﬁcation. A recent work proposes to adjust the

length of ﬁlters for different levels [6], while it only uses

asamesettingforallﬁltersinthesamelevelandfailsto

dynamically adjust the setting according to data hotness.

Therefore, there still remains a challenging problem for

LSM-tree-like KV stores: How to reduce the false posi-

tive rate of Bloom ﬁlters with least memory usage?

下载后可阅读完整内容，剩余5页未读，立即下载

supperhybin

粉丝: 0
资源: 2

ElasticBF: leveldb中自适应Bloom滤器优化，提升大容量存储读取效率

Python库 | leveldb-0.20.tar.gz

leveldb实现解析的副本.pdf

leveldb-1.20.tar.gz

leveldb-1.18.tar.gz

leveldb-1.5.0.tar.gz

leveldb-1.7.0.tar.gz

LevelDB手册（LevelDB Handbook）.pdf

leveldb,leveldb到java的端口.zip

storemate-backend-leveldb-0.9.23.zip

infinispan-cachestore-leveldb-8.0.1.Final.zip

最新资源