ports range queries, snapshots, and other features that are
useful in modern applications. In this section, we briefly
describe the core design of LevelDB.
The overall architecture of LevelDB is shown in Figure 1. The main data structures in LevelDB are an on-disk log file, two in-memory sorted skiplists (memtable and immutable memtable), and seven levels (L0 to L6) of on-disk Sorted String Table (SSTable) files. LevelDB initially stores inserted key-value pairs in a log file and the in-memory memtable. Once the memtable is full, LevelDB switches to a new memtable and log file to handle further inserts from the user. In the background, the previous memtable is converted into an immutable memtable, and a compaction thread then flushes it to the disk, generating a new SSTable file (usually about 2 MB) at level 0 (L0); the previous log file is discarded.
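To make this write path concrete, the following minimal sketch uses the standard public LevelDB C++ API (leveldb::DB::Open, Put, and Get); the database path /tmp/leveldb-demo is arbitrary and chosen only for illustration.

```cpp
#include <cassert>
#include <string>

#include "leveldb/db.h"

int main() {
  leveldb::DB* db;
  leveldb::Options options;
  options.create_if_missing = true;

  // Open (or create) a database directory; the log file, memtables, and
  // the seven SSTable levels all live under this path.
  leveldb::Status s = leveldb::DB::Open(options, "/tmp/leveldb-demo", &db);
  assert(s.ok());

  // A Put is first appended to the on-disk log and inserted into the
  // in-memory memtable; flushing to an L0 SSTable happens in the background.
  s = db->Put(leveldb::WriteOptions(), "key1", "value1");
  assert(s.ok());

  // A Get consults the memtable, the immutable memtable, and then the
  // SSTable levels, as described in the text.
  std::string value;
  s = db->Get(leveldb::ReadOptions(), "key1", &value);
  assert(s.ok() && value == "value1");

  delete db;
  return 0;
}
```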
The size of all files in each level is limited, and increases by a factor of ten with the level number. For example, the size limit of all files at L1 is 10 MB, while the limit of L2 is 100 MB. To maintain the size limit, once the total size of a level Li exceeds its limit, the compaction thread will choose one file from Li, merge-sort it with all the overlapping files of Li+1, and generate new Li+1 SSTable files. The compaction thread continues until all levels are within their size limits. Also, during compaction, LevelDB ensures that all files in a particular level, except L0, do not overlap in their key ranges; keys in files of L0 can overlap with each other since they are flushed directly from the memtable.
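The tenfold growth rule can be written down in a few lines. The sketch below is only an illustration of that rule, not LevelDB's actual code; the helper name LevelSizeLimitBytes is hypothetical, and L0 is left out because it is bounded by a file count rather than by total size.

```cpp
#include <cstdint>
#include <cstdio>

// Illustrative sketch of the per-level size limits described above:
// L1 is capped at 10 MB and every subsequent level is ten times larger.
// L0 is excluded because it is bounded by a file count, not total bytes.
uint64_t LevelSizeLimitBytes(int level) {
  uint64_t limit = 10ull * 1024 * 1024;  // 10 MB limit for L1
  for (int l = 1; l < level; ++l) {
    limit *= 10;
  }
  return limit;
}

int main() {
  for (int level = 1; level <= 6; ++level) {
    std::printf("L%d limit: %llu MB\n", level,
                (unsigned long long)(LevelSizeLimitBytes(level) >> 20));
  }
  return 0;
}
```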
To serve a lookup operation, LevelDB searches the memtable first, the immutable memtable next, and then the files from L0 to L6 in order. The number of file searches required to locate a random key is bounded by the maximum number of levels, since keys do not overlap between files within a single level, except in L0. Since files in L0 can contain overlapping keys, a lookup may search multiple files at L0. To avoid a large lookup latency, LevelDB slows down the foreground write traffic if the number of files at L0 is bigger than eight, in order to wait for the compaction thread to compact some files from L0 to L1.
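This search order can be summarized with a small toy model, shown below; the Store type and its fields are hypothetical and stand in for LevelDB's real in-memory and on-disk structures, with each sorted run modeled as an ordinary ordered map.

```cpp
#include <array>
#include <map>
#include <optional>
#include <string>
#include <vector>

// Toy model of LevelDB's lookup order (illustration only, not LevelDB code).
// Each sorted run is modeled as a std::map; real SSTables are immutable
// on-disk files with index and bloom-filter blocks.
struct Store {
  using Run = std::map<std::string, std::string>;

  Run memtable;
  Run immutable_memtable;
  std::array<std::vector<Run>, 7> levels;  // levels[0] = L0, ..., levels[6] = L6

  std::optional<std::string> Get(const std::string& key) const {
    // 1. Memtable, then 2. immutable memtable.
    if (auto it = memtable.find(key); it != memtable.end()) return it->second;
    if (auto it = immutable_memtable.find(key); it != immutable_memtable.end())
      return it->second;
    // 3. L0 through L6 in order. Files in L0 may overlap, so several of them
    //    can be probed; in L1-L6 key ranges are disjoint, so at most one file
    //    per level can contain the key (that optimization is omitted here).
    for (const auto& level : levels) {
      for (const auto& file : level) {
        if (auto it = file.find(key); it != file.end()) return it->second;
      }
    }
    return std::nullopt;
  }
};
```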
2.3 Write and Read Amplification
Write and read amplification are major problems in
LSM-trees such as LevelDB. Write (read) amplification
is defined as the ratio between the amount of data writ-
ten to (read from) the underlying storage device and the
amount of data requested by the user. In this section, we
analyze the write and read amplification in LevelDB.
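Written as formulas, with WA and RA used here simply as shorthand for the two ratios:

\[
\mathrm{WA} = \frac{\text{data written to the storage device}}{\text{data written by the user}},
\qquad
\mathrm{RA} = \frac{\text{data read from the storage device}}{\text{data requested by the user}}.
\]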
To achieve mostly-sequential disk access, LevelDB
writes more data than necessary (although still sequen-
tially), i.e., LevelDB has high write amplification. Since
the size limit of Li is 10 times that of Li−1, when merging a file from Li−1 to Li during compaction, LevelDB may read up to 10 files from Li in the worst case, and write back these files to Li after sorting. Therefore, the write amplification of moving a file across two levels can be up to 10. For a large dataset, since any newly generated table file can eventually migrate from L0 to L6 through a series of compaction steps, write amplification can be over 50 (10 for each gap between L1 and L6).

[Figure 2: Write and Read Amplification. This figure shows the write amplification and read amplification of LevelDB for two different database sizes, 1 GB and 100 GB. Key size is 16 B and value size is 1 KB.]
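The arithmetic behind this bound can be spelled out explicitly; the short program below is an illustrative back-of-the-envelope calculation (not a measurement) that multiplies the per-step rewrite factor by the number of level gaps a file may traverse.

```cpp
#include <cstdio>

int main() {
  // Worst case for one compaction step from L(i-1) to L(i): up to 10
  // overlapping files of L(i) are read and rewritten, so the data moved
  // can be amplified by roughly a factor of 10.
  const int per_step = 10;

  // A newly flushed file can migrate down through L1 -> L2 -> ... -> L6,
  // i.e., five level gaps, each paying the per-step cost again.
  const int gaps = 5;

  std::printf("worst-case write amplification over a file's lifetime: > %d\n",
              per_step * gaps);  // > 50, matching the estimate in the text
  return 0;
}
```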
Read amplification has been a major problem for
LSM-trees due to trade-offs made in the design. There
are two sources of read amplification in LevelDB. First,
to look up a key-value pair, LevelDB may need to check multiple levels. In the worst case, LevelDB needs to check eight files in L0, and one file for each of the remaining six levels: a total of 14 files. Second, to find a key-value pair within an SSTable file, LevelDB needs to read multiple metadata blocks within the file. Specifically, the amount of data actually read is given by (index block + bloom-filter blocks + data block). For example, to look up a 1-KB key-value pair, LevelDB needs to read a 16-KB index block, a 4-KB bloom-filter block, and a 4-KB data block; in total, 24 KB. Therefore, considering the 14 SSTable files in the worst case, the read amplification of LevelDB is 24 × 14 = 336. Smaller key-value pairs will lead to an even higher read amplification.
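The 336 figure follows directly from these numbers; the short calculation below is an illustrative sketch that simply combines the block sizes and worst-case file count quoted above.

```cpp
#include <cstdio>

int main() {
  // Data read per SSTable probe, using the block sizes quoted in the text.
  const double index_block_kb = 16.0;
  const double filter_block_kb = 4.0;
  const double data_block_kb = 4.0;
  const double per_file_kb = index_block_kb + filter_block_kb + data_block_kb;  // 24 KB

  // Worst case: 8 files at L0 plus one file in each of L1 through L6.
  const int files_checked = 8 + 6;  // 14

  // The user only asked for a 1-KB key-value pair.
  const double requested_kb = 1.0;

  std::printf("read amplification = %.0f\n",
              per_file_kb * files_checked / requested_kb);  // prints 336
  return 0;
}
```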
To measure the amount of amplification seen in prac-
tice with LevelDB, we perform the following experi-
ment. We first load a database with 1-KB key-value
pairs, and then look up 100,000 entries from the database;
we use two different database sizes for the initial load,
and choose keys randomly from a uniform distribution.
Figure 2 shows write amplification during the load phase
and read amplification during the lookup phase. For a 1-
GB database, write amplification is 3.1, while for a 100-
GB database, write amplification increases to 14. Read
amplification follows the same trend: 8.2 for the 1-GB
database and 327 for the 100-GB database. The rea-
son write amplification increases with database size is
straightforward. With more data inserted into a database,
the key-value pairs will more likely travel further along