ElasticBF: Fine-grained and Elastic Bloom Filter Towards Efficient Read for
LSM-tree-based KV Stores
Yueming Zhang
1
,YongkunLi
1,2
,FanGuo
1
,ChengLi
1,2
,YinlongXu
1,2
1
University of Science and Technology of China
2
Anhui Province Key Laboratory of High Performance Computing, USTC
Abstract
LSM-tree based KV stores suffer from severe read am-
plification, especially for large KV stores. Even worse,
many applications may issue a large amount of lookup
operations to search for nonexistent keys, which wastes
alotofextraI/Os. EventhoughBloomfilterscanbe
used to speedup the read performance, existing designs
usually adopt a uniform setting for all Bloom filters and
fail to support dynamic adjustment, thus results in a high
false positive rate or large memory consumption. To ad-
dress this issue, we propose ElasticBF, which constructs
more small filters for each SSTable and dynamically load
into memory as needed based on access frequency, so it
realizes a fine-grained and elastic adjustment in running
time with the same memory usage. Experiment shows
that ElasticBF can achieve 1.94×-2.24× read through-
put compared to LevelDB under different workloads,
and preserves the same write performance. More im-
portantly, ElasticBF is orthogonal to existing works opti-
mizing the structure of KV stores, so it can be used as an
accelerator to further speedup their read performance.
1Introduction
Key-value (KV) store has become an important stor-
age engine for many applications [12, 4, 16, 23]. The
most common design of KV stores is based on Log-
Structured Merge-tree (LSM-tree), which groups KV
pairs into fixed-size files, e.g., the SSTables in LevelDB
[7]. Files are further organized into multiple levels as
shown in Figure 1. KV pairs are sorted in each level
except level 0. With this level-based structure, data is
first flushed from memory to the lowest level, and then
merged into higher levels by using compaction when this
level reaches its capacity limit. Compaction inevitably
causes the write amplification problem, and many recent
researches focus on addressing this issue [3, 22, 13].
On the other hand, LSM-tree based KV stores also
suffer from severe read amplification problem[14, 21],
Figure 1: LSM-tree based structure.
especially for large KV stores. This is mainly because
aKVstorecontainsmultiplelevels,andreadingaKV
item needs to check from the lowest level to the highest
level until the data is found or all levels are checked. This
process inevitably incurs multiple I/Os and amplifies the
read operation. In the worst case, 14 SSTables need to
be inspected [13], and even worse, if the target KV item
does not exist in the store[19, 20, 9], then all the I/O re-
quests are totally wasted. We point out that only one file
in each level need to be examined as data in each level
are kept in a sorted order, and this file can also be easily
located by checking the key ranges of each file.
To reduce extra I/O requests, Bloom filters are widely
used in KV stores [19]. By first asking the filter to check
if the requested data exists in the SSTable, extra I/Os can
be reduced. However, Bloom filter has false positive,
so it may return an “existence” answer even if the data
does not really exist, still incurs extra I/Os to inspect the
SSTable. Even though the false positive rate (FPR) can
be reduced by increasing the length of filters [2, 7], it will
increase the memory usage. As a result, some filters may
be swapped out to disks as memory is usually a scarce re-
source in KV stores [11, 8]. If filters are not in memory,
extra I/Os must be incurred to load the filter into mem-
ory before inspecting a SSTable, and this exacerbates the
read amplification. A recent work proposes to adjust the
length of filters for different levels [6], while it only uses
asamesettingforallfiltersinthesamelevelandfailsto
dynamically adjust the setting according to data hotness.
Therefore, there still remains a challenging problem for
LSM-tree-like KV stores: How to reduce the false posi-
tive rate of Bloom filters with least memory usage?