the storage allocation for buckets and overflow buckets, and so on. A bucket page and one or more overflow pages (if any) form the storage space for a bucket: the first page of a bucket is the bucket page, and the remaining pages are overflow pages. The bucket page and the overflow pages of a bucket are connected by bi-directional links. In addition, a bitmap page tracks the usage of the overflow pages and of the bitmap page itself.
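To make this layout concrete, the following C sketch shows one possible on-disk representation of these pages; the struct and field names (PageHeader, prev_blkno, next_blkno, BitmapPage) are illustrative assumptions, not the layout of any particular implementation.

#include <stdint.h>

#define PAGE_SIZE 4096
typedef uint32_t blkno_t;                   /* block number within the index file */
#define INVALID_BLKNO ((blkno_t)0xFFFFFFFF)

/* Header shared by bucket pages and overflow pages; prev/next
 * realize the bi-directional link that chains a bucket's pages. */
typedef struct {
    blkno_t  prev_blkno;     /* previous page in the bucket's chain, or INVALID_BLKNO */
    blkno_t  next_blkno;     /* next overflow page in the chain, or INVALID_BLKNO */
    uint16_t ntuples;        /* number of index entries stored on this page */
    uint8_t  is_bucket_page; /* 1 for the first page of a bucket, 0 for an overflow page */
} PageHeader;

/* A bitmap page: one bit per overflow page (and per bitmap page),
 * set when the corresponding page is in use. */
typedef struct {
    PageHeader hdr;
    uint8_t    usage_bits[PAGE_SIZE - sizeof(PageHeader)];
} BitmapPage;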
In disk-oriented linear hashing, bucket pages are allocated in batches, and the size of each batch is a power of 2. That is, we always pre-allocate a power-of-2 number of bucket pages at a time, stored contiguously on disk, and then allocate overflow pages behind these bucket pages as required. For example, initially we allocate bucket pages for buckets 0 and 1; when bucket 2 is needed, we allocate bucket pages for buckets 2 and 3; analogously, when bucket 4 is needed, we allocate bucket pages for buckets 4-7, and so on. In addition, every time before we pre-allocate bucket pages, we record the total count of overflow pages and bitmap pages allocated so far in an array called hashm_spare, which is stored in the metadata page. In this way, using Formula (1), we can easily locate a bucket in the index file through its bucket number (denoted as b). Figure 2 gives an example of linear hashing's storage allocation and management.
\[
\text{block\_number}(b) =
\begin{cases}
b + \text{hashm\_spare}\big[\lceil \log_2(b+1) \rceil - 1\big] + 1, & b \neq 0 \\
1, & b = 0
\end{cases}
\tag{1}
\]
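As a concrete reading of Formula (1), the following C sketch computes the block number of bucket b, assuming block 0 holds the metadata page and hashm_spare[i] records the total count of overflow and bitmap pages allocated before the (i+1)-th batch of bucket pages; the function names are illustrative.

#include <stdint.h>

/* ceil(log2(num)): smallest n such that 2^n >= num (num >= 1). */
static uint32_t hash_log2(uint32_t num) {
    uint32_t n = 0;
    uint32_t limit = 1;
    while (limit < num) {
        limit <<= 1;
        n++;
    }
    return n;
}

/* Formula (1): block number of bucket b in the index file.
 * The trailing "+ 1" skips the metadata page at block 0;
 * hashm_spare[ceil(log2(b+1)) - 1] counts the overflow and bitmap
 * pages lying before bucket b's batch of bucket pages. */
static uint32_t bucket_to_blkno(uint32_t b, const uint32_t *hashm_spare) {
    if (b == 0)
        return 1;   /* bucket 0 sits right after the metadata page */
    return b + hashm_spare[hash_log2(b + 1) - 1] + 1;
}

For instance, before any overflow pages exist (hashm_spare all zero), buckets 0-3 map to blocks 1-4, matching the batch layout described above.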
B. Flash-Oriented Hash Indexes
In recent years, several studies have focused on optimizing hash indexes for flash memory. MicroHash [11] is an efficient external-memory index structure for Wireless Sensor Devices (WSDs) that stores data on flash by time or by value. However, MicroHash only applies to wireless sensor architectures that perform data manipulations directly on raw NAND flash. MLDH [3] is a multi-layered hash index in which the capacity of each level is twice that of the level above it. Updates to MLDH are first buffered in an in-memory index structure; when this structure is full, it is merged with the hash indexes on flash to produce a new hash index. Although MLDH incurs extra construction time, it can reduce flash writes if the top levels of the hash indexes are buffered in main memory. Wang et al. [5] proposed a flash-based self-adaptive extendible hash index in which each bucket occupies a block (erase unit) of flash. A bucket, or a block, consists of both a data region and a log region, similar to the IPL storage model. The log region serves update and delete operations, so in-place updates to the data region can be delayed. In addition, a Split-or-Merge (SM) factor, which is dynamically adjusted according to the log/data ratio, is added to make the index self-adaptive. The Hybrid Hash Index [6, 35] delays split operations, which cause additional writes and erases, by using overflow buckets. Once the number of overflow buckets attached to a bucket reaches the ceiling, the Hybrid Hash Index decides whether to trigger a split according to the ratio of update and delete records in that bucket. Li et al. [1] proposed a Lazy-Split scheme to reduce writes at the expense of more reads. They claimed that the Lazy-Split hash can adapt itself dynamically to different search ratios by tuning the Split-cursor, but they did not offer an effective method for tuning it.
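To illustrate the kind of deferred-split policy described for the Hybrid Hash Index [6, 35], here is a hypothetical C sketch; the threshold names (MAX_OVERFLOW, SPLIT_RATIO), the direction of the ratio test, and the per-bucket statistics are all assumptions made for illustration.

#include <stdbool.h>

#define MAX_OVERFLOW 4      /* ceiling on overflow buckets per bucket (assumed) */
#define SPLIT_RATIO  0.5    /* update/delete ratio below which we split (assumed) */

typedef struct {
    int n_overflow;         /* overflow buckets chained to this bucket */
    int n_records;          /* total records in the bucket */
    int n_update_delete;    /* records produced by update/delete operations */
} BucketStats;

static bool should_split(const BucketStats *s) {
    if (s->n_overflow < MAX_OVERFLOW)
        return false;       /* below the ceiling: keep absorbing into overflow buckets */
    if (s->n_records == 0)
        return true;        /* defensive guard; should not occur with a full chain */
    double ratio = (double)s->n_update_delete / (double)s->n_records;
    return ratio < SPLIT_RATIO;   /* mostly inserts: splitting pays off */
}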
In summary, most previous work focuses on improving the update efficiency of hash indexes while neglecting the degradation of search performance. In contrast, we propose to optimize both update and search performance by dynamically adapting the hash index structure to changes in access patterns. In Section V, we compare our proposal with linear hashing and the Lazy-Split hash index to demonstrate its efficiency.
III. SELF-ADAPTIVE LINEAR HASHING
TABLE I. NOTATIONS USED IN THIS PAPER

Notation    Description
w           Average cost of writing a flash page
w/n         Average cost of a coarse-grained flash-page write
lf          Load factor
l           Upper bound of the load factor
B           Capacity of a bucket page
b           Number of buckets
g           Number of fixed buckets contained in a group
z
r
m
f           False positive probability of Bloom filters
s           Number of log pages in the log region
d           Count of log pages in the log region
Bucket_j    The jth bucket
Group_i     The ith group
Set_k^c     The kth set; c is the count of bits used to generate the set
Bit(x)      Compute the bit number of the binary value of x
h(x, j)     Get the last j bits of x
h(K)        Function to get the hash value of key K, with h(K) < b
In this section, we present the design of SAL-hashing, which aims to reduce small random writes while maintaining high search performance. The notations used throughout the paper are summarized in Table I.
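The helper functions at the bottom of Table I can be read directly as code; the following C sketch is a minimal interpretation of their descriptions, with the treatment of x = 0 in Bit(x) and the concrete hash used in h(K) assumed for illustration.

#include <stdint.h>

/* Bit(x): number of bits in the binary value of x
 * (Bit(0) = 1 here by convention; the table leaves this case open). */
static uint32_t Bit(uint32_t x) {
    uint32_t n = 0;
    do {
        n++;
        x >>= 1;
    } while (x != 0);
    return n;
}

/* h(x, j): the last j bits of x (assumes 0 <= j < 32). */
static uint32_t h_bits(uint32_t x, uint32_t j) {
    return x & ((1u << j) - 1u);
}

/* h(K): hash value of key K with h(K) < b, here a simple modulo
 * over an assumed integer key type (requires b > 0). */
static uint32_t h_key(uint32_t K, uint32_t b) {
    return K % b;
}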