TABLE I
MEANINGS OF THE SYMBOLS

Symbol        Meaning
S_data        Data size of disk I/O in a compaction
S_sst         Size of an SSTable
S_dt          Size of a DTable
S_metadata    Metadata size in a compaction
RA/WA         R/W amplification from the LSM-tree or LWC-tree
ARA/AWA       Auxiliary R/W amplification from SMR drives
MRA/MWA       Overall R/W amplification, MWA = WA × AWA
Fig. 1. The compaction procedure of LSM-trees. A compaction involves reading, sorting, and rewriting multiple SSTables, causing serious I/O amplification.
KV items, which introduces excessive writes and represents a performance bottleneck of LSM-tree-based key-value stores.
More specifically, a single compaction has three steps. For the convenience of exposition, let us call the SSTable selected for compaction in level L_i the victim SSTable, and the SSTables in the next level L_{i+1} whose key ranges fall within the key range of the victim SSTable the overlapped SSTables. To start a compaction, LevelDB first selects the victim SSTable and the overlapped SSTables according to the score of each level, and then decides whether more victim SSTables in L_i can be added to the compaction by searching for SSTables whose key ranges fall within the ranges of the overlapped SSTables.
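As a rough illustration, the sketch below selects overlapped SSTables by key-range intersection. The SSTable representation and helper names are hypothetical and only approximate LevelDB's actual version and compaction bookkeeping.

from typing import List, NamedTuple

class SSTable(NamedTuple):
    smallest: bytes   # smallest key stored in the table
    largest: bytes    # largest key stored in the table

def overlaps(a: SSTable, b: SSTable) -> bool:
    # Two tables overlap if their key ranges intersect.
    return a.smallest <= b.largest and b.smallest <= a.largest

def pick_overlapped(victim: SSTable, next_level: List[SSTable]) -> List[SSTable]:
    """Return the overlapped SSTables in L_{i+1} for one victim SSTable in L_i."""
    return [t for t in next_level if overlaps(victim, t)]

# Toy example: a victim covering keys "f".."m" against a next level of
# single-letter ranges; on average about AF tables would overlap.
victim = SSTable(b"f", b"m")
next_level = [SSTable(bytes([c]), bytes([c + 1])) for c in range(ord("a"), ord("z"))]
print(len(pick_overlapped(victim, next_level)))  # 9 overlapped tables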
Figure 1 pictorially shows the three steps of a compaction procedure of LSM-trees. As shown, during a compaction LevelDB first reads the victim SSTable in level L_i and the overlapped SSTables in level L_{i+1}. After that, LevelDB merges and sorts the SSTables fetched into memory by the first step. Finally, LevelDB writes the newly generated SSTables to disk. According to the size limit of each level in LevelDB, the size of L_{i+1} is 10 times that of L_i, and this size factor is called the amplification factor (AF). Due to this size relationship, on average a victim SSTable in level L_i has AF overlapped SSTables in level L_{i+1}, and thus the total data size involved in a compaction is given by Equation 1, where S_sst represents the size of an SSTable and S_data represents the data size of disk I/O in a compaction. The 2× factor indicates that the total data is both read and written. With a large dataset, the ultimate amplification could be over 50 (10 for each gap between L_1 and L_6), as it is possible for any newly generated SSTable to migrate from L_0 to L_6 through a series of compaction steps [1].
S_data = (AF + 1) × S_sst × 2    (1)
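To make Equation 1 and the multi-level argument concrete, the short calculation below plugs in AF = 10 and an assumed 2 MB SSTable (LevelDB's default target file size); the numbers are illustrative rather than measurements.

AF = 10          # size ratio between adjacent levels
S_SST_MB = 2.0   # assumed SSTable size in MB (LevelDB's default target)

# Equation 1: a compaction reads and writes the victim plus its AF overlapped tables.
s_data_mb = (AF + 1) * S_SST_MB * 2
print(f"S_data per compaction: {s_data_mb} MB")        # 44.0 MB

# An item that migrates from L_1 down to L_6 crosses 5 level gaps and is
# rewritten roughly AF times per gap, hence amplification over 50.
print(f"Cumulative write amplification: > {AF * (6 - 1)}")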
Fig. 2. Write amplification and performance degradation due to compactions. 2(a): the comparison between write data size and actual disk I/O size; 2(b): the write amplification of different write data sizes; 2(c): the random write performance with and without compaction; 2(d): the random read performance with and without compaction.

To quantitatively measure the degree of amplification and performance degradation due to compaction in practice with LevelDB, we carry out the following experiments with the
same configuration of LevelDB on HDDs as in Section V. First, we randomly load databases of size 5GB, 6GB, 7GB, 8GB, 9GB and 10GB, respectively. The value size is 4KB by default. Figure 2(a) shows the relationship between the input data size and the actual disk I/O size. As can be seen, in all cases LevelDB incurs significant write amplification. For example, writing 10GB of input data results in 110.7GB of actual disk writes, and the corresponding write amplification ratio is 11.57, as shown in Figure 2(b). Based on Figure 2(a), Figure 2(b) reports all the write amplification factors, and a minimum value of 10.19 is observed. Second, to evaluate the random I/O performance with and without compaction, we use the same 6 database sizes for the initial random loads and randomly query 1 million keys following a uniform distribution. For the I/O performance without compaction, we trigger a manual compaction immediately after loading the database and before issuing queries, to eliminate concurrent compactions and mitigate compaction interference. Figure 2(c) and Figure 2(d) show the performance degradation caused by compaction for random writes and random reads, respectively. The write and read throughputs without compaction are on average 13.01× and 2.23× the throughputs with compaction, respectively. We design the LWC-tree mainly to eliminate the amplification caused by compactions in LSM-trees.
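For reference, the write amplification factors in Figure 2(b) are simply the ratios of the two curves in Figure 2(a); the sketch below redoes that arithmetic with the measured values (only the printing script itself is illustrative).

# Input data written by the benchmark (GB) and the actual disk writes
# performed by LevelDB (GB), as reported in Figure 2(a).
input_gb  = [4.79, 5.74, 6.70, 7.66, 8.62, 9.57]
actual_gb = [48.79, 61.37, 73.22, 86.05, 99.15, 110.7]

for inp, act in zip(input_gb, actual_gb):
    print(f"{inp:5.2f} GB input -> {act:6.2f} GB disk I/O, WA = {act / inp:.2f}")
# The resulting ratios span 10.19 to 11.57, matching Figure 2(b).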
III. LWC-TREE DESIGN
As demonstrated in the previous section, the conventional
LSM-tree incurs excessive I/O amplification when used as the
key-value store index. We design the LWC-tree to alleviate the
I/O amplification caused by compactions and aim to achieve
high write throughput without sacrificing read performance.
As a variant of LSM-tree, LWC-tree is also composed of
one memory-resident component and multiple disk-resident
components. Key-value items are written to the memory
component first, then dumped to disk, and finally compacted
to lower levels. We keep the sorted tables and the multi-level