2.1 LevelDB
LevelDB is an open source key-value store that originated from Google's BigTable [18]. It is an implementation of the LSM-tree and has received increasing attention in both industry and academia [6][34][2]. Figure 1 illustrates the architecture of LevelDB, which consists of two MemTables in main memory, a set of SSTables [18] on disk, and several auxiliary files, such as the Manifest file, which stores the metadata of the SSTables.
Figure 1. Illustration of the LevelDB architecture.
When the user inserts a key-value pair into LevelDB, it is first appended to a log file and then inserted into a sorted in-memory structure, called the MemTable, which holds the most recent updates. When the MemTable reaches its capacity limit, it is converted into a read-only Immutable MemTable, and a new MemTable is created to accumulate fresh updates. At the same time, a background thread begins to dump the Immutable MemTable to disk, generating a new Sorted String Table (SSTable) file. Deletes are a special case of update in which a deletion marker is stored.
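For concreteness, this write path is exercised through LevelDB's public write interface. The following minimal C++ sketch (the database path and keys are illustrative) issues a Put and a Delete; internally, each call is appended to the log file, applied to the MemTable, and, in the case of Delete, recorded as a deletion marker.

```cpp
#include <cassert>
#include "leveldb/db.h"

int main() {
  leveldb::DB* db;
  leveldb::Options options;
  options.create_if_missing = true;
  // The database directory "/tmp/testdb" is only an example.
  leveldb::Status s = leveldb::DB::Open(options, "/tmp/testdb", &db);
  assert(s.ok());

  // Appended to the log file, then inserted into the MemTable.
  s = db->Put(leveldb::WriteOptions(), "key1", "value1");
  assert(s.ok());

  // Stores a deletion marker instead of removing the key in place.
  s = db->Delete(leveldb::WriteOptions(), "key1");
  assert(s.ok());

  delete db;
  return 0;
}
```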
An SSTable stores a sequence of data items sorted by their keys. The set of SSTables is organized into a series of levels, as shown in Figure 1. The youngest level, Level 0, is produced by writing the Immutable MemTable from main memory to disk, so SSTables in Level 0 may contain overlapping keys. In all other levels, the key ranges of the SSTables do not overlap. Each level has a limit on the maximum number of SSTables or, equivalently, on the total amount of data, because each SSTable in a level has a fixed size. The limit grows exponentially with the level number. For example, the total amount of data in Level 1 will not exceed 10 MB, and that in Level 2 will not exceed 100 MB.
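The exponential growth of the per-level limit can be made concrete with a small helper. The sketch below is a simplified illustration consistent with the 10 MB and 100 MB figures above (Level 0 is bounded by a file count rather than a byte budget), not a copy of LevelDB's internal code.

```cpp
#include <cstdint>
#include <cstdio>

// Simplified sketch: Level 1 is capped at 10 MB and each deeper level is
// ten times larger, so Level L may hold up to 10^L MB of data.
static uint64_t MaxBytesForLevel(int level) {
  uint64_t limit = 10ull * 1048576;  // 10 MB for Level 1
  while (level > 1) {
    limit *= 10;
    level--;
  }
  return limit;
}

int main() {
  for (int level = 1; level <= 4; level++) {
    std::printf("Level %d limit: %llu bytes\n", level,
                (unsigned long long)MaxBytesForLevel(level));
  }
  return 0;
}
```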
To keep the stored data in an optimized layout, a compaction process is conducted. A background compaction thread monitors the SSTable files. When the total size of Level L exceeds its limit, the compaction thread picks one SSTable from Level L and all overlapping SSTables from the next level, L+1. These files serve as the inputs of the compaction and are merged together to produce a series of new Level L+1 files; whenever the current output file reaches the predefined size (2 MB by default), another new SSTable is created. All input files are discarded after the compaction. Note that the compaction from Level 0 to Level 1 is treated differently from compactions between other levels: it is triggered when the number of SSTables in Level 0 exceeds an upper limit (4 by default), and it may involve more than one Level 0 file because their key ranges can overlap.
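The trigger and input-selection logic can be sketched as follows; the SSTable structure and helper functions here are hypothetical simplifications for illustration and do not reproduce LevelDB's internal Version/VersionSet code.

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical, simplified metadata for one SSTable file.
struct SSTable {
  std::string smallest, largest;  // key range covered by the file
  uint64_t bytes;                 // file size
};

static bool Overlaps(const SSTable& a, const SSTable& b) {
  return !(a.largest < b.smallest || b.largest < a.smallest);
}

static uint64_t SizeLimit(int level) {  // 10 MB * 10^(level-1)
  uint64_t limit = 10ull * 1048576;
  for (; level > 1; level--) limit *= 10;
  return limit;
}

// Returns the input files for compacting Level L into Level L+1, or an
// empty vector if no compaction is needed.
std::vector<SSTable> PickCompactionInputs(
    const std::vector<std::vector<SSTable>>& levels, int L) {
  uint64_t total = 0;
  for (const SSTable& f : levels[L]) total += f.bytes;
  bool trigger = (L == 0) ? levels[0].size() > 4   // Level 0: file-count limit
                          : total > SizeLimit(L);  // others: byte-size limit
  if (!trigger || levels[L].empty() || L + 1 >= (int)levels.size()) return {};

  std::vector<SSTable> inputs = {levels[L][0]};    // seed file from Level L
  if (L == 0) {                                    // Level 0 files may overlap
    for (size_t i = 1; i < levels[0].size(); i++)  // each other, so pull them
      if (Overlaps(inputs[0], levels[0][i]))       // in as well
        inputs.push_back(levels[0][i]);
  }
  for (const SSTable& f : levels[L + 1])           // plus every overlapping
    if (Overlaps(inputs[0], f))                    // file from Level L+1
      inputs.push_back(f);
  // The inputs are merge-sorted into new ~2 MB Level L+1 files and then
  // discarded (not shown).
  return inputs;
}
```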
By conducting compaction, LevelDB eliminates overwritten values and drops deletion markers. Compaction also ensures that the freshest data reside in the lowest level, while stale data gradually migrate to the higher levels.
Data retrieval, i.e., the read operation, is more complicated than insertion. When LevelDB receives a Get(Key, Value) request, it first performs a lookup in the MemTable, then in the Immutable MemTable, and finally searches the SSTables from Level 0 upward, level by level, until a matching KV item is found. Once LevelDB finds the key in a certain level, it stops the search. As mentioned before, lower levels contain fresher data items, so newer data are searched before older data. As with compaction, more than one Level 0 file may be searched because their key ranges can overlap. A Bloom filter [14] is usually adopted to reduce the I/O cost of reading data blocks that do not contain the requested KV items.
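The read path is driven through LevelDB's Get interface. The sketch below (the database path and key are illustrative) also installs LevelDB's optional Bloom filter policy, which helps skip data blocks that cannot contain the requested key.

```cpp
#include <cassert>
#include <string>
#include "leveldb/db.h"
#include "leveldb/filter_policy.h"

int main() {
  leveldb::Options options;
  options.create_if_missing = true;
  // Optional Bloom filter (~10 bits per key) to avoid reading blocks that
  // do not contain the requested key.
  options.filter_policy = leveldb::NewBloomFilterPolicy(10);

  leveldb::DB* db;
  leveldb::Status s = leveldb::DB::Open(options, "/tmp/testdb", &db);
  assert(s.ok());

  // Internally: check the MemTable, then the Immutable MemTable, then the
  // SSTables from Level 0 upward, stopping at the first match.
  std::string value;
  s = db->Get(leveldb::ReadOptions(), "key1", &value);
  if (s.IsNotFound()) {
    // The key was never written, or its newest record is a deletion marker.
  }

  delete db;
  delete options.filter_policy;
  return 0;
}
```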
2.2 Open-Channel SSD
The open-channel SSD we used in this work, SDF, is a customized SSD widely deployed in Baidu's storage infrastructure to support various Internet-scale services [33]. Currently, more than 3,000 SDFs have been deployed in production systems. In SDF, the hardware exposes its internal channels to applications through a customized controller. Additionally, it enforces large-granularity access and provides lightweight primitive functions through a simplified I/O stack.
The SDF device contains 44 independent channels. Each flash channel has a dedicated channel engine that provides FTL functionalities, including block-level address mapping, dynamic wear leveling, and bad block management, as well as the logic for the flash data path. From the perspective of the software layer, SDF exhibits the following features.
First, SDF exposes the internal parallelism of the SSD to user applications. As mentioned previously, each channel of an SDF has its own exclusive data control engine. In contrast to a conventional SSD, where the entire device is presented as a single block device (e.g., /dev/sda), SDF presents each channel as an independent device to applications (e.g., /dev/ssd0 to /dev/ssd43). With the capability of directly accessing individual flash channels on SDF, the user