[Fig. 2 (figure): for each scheme, the chunk layout on SSD 0–SSD 3 before and after serving three incoming update requests, with valid and invalid chunks marked; (a) logs deltas $L_1$, $L_2$, $L_4$ on a log device, (b) buffers $D_1^*$, $D_2^*$, $D_4^*$ in a cache, and (c) appends them as a new stripe with parity $P_2$.]
Fig. 2: Three categories of enhanced RAID schemes: (a) Parity Logging; (b) Parity Caching; (c) Elastic Striping
a round-robin manner for load balancing. When data chunks are updated, their corresponding parity chunks must be updated as well, via either read-modify-write (RMW) or read-reconstruct-write (RRW). We let the chunk size equal the page size in this paper.
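To make the parity-update overhead concrete, note the standard read-modify-write identity (a textbook RAID property, stated here for exposition; the symbols are generic, not taken from Fig. 2): updating a data chunk $D_i$ to $D_i^*$ in a stripe with parity $P$ requires reading $D_i$ and $P$ and writing $D_i^*$ together with $P^* = P \oplus D_i \oplus D_i^*$, i.e., one logical write becomes two reads and two writes. Read-reconstruct-write instead reads the remaining data chunks of the stripe and recomputes $P^* = (\bigoplus_{j \neq i} D_j) \oplus D_i^*$.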
C. Enhanced RAID Schemes
As we discussed in §I, parity updates in a RAID array introduce extra I/Os and degrade both the performance and the endurance of an SSD RAID. Various RAID schemes have been developed to reduce the I/Os caused by parity updates, and we classify them into three categories: parity logging [29, 23, 36], parity caching [7, 13, 18, 20], and elastic striping [16].
Fig. 2 illustrates the three schemes. Suppose that there are six data chunks $D_0, D_1, D_2, D_3, D_4, D_5$ and two parity chunks $P_0, P_1$ stored in an SSD RAID array at the beginning, and the incoming requests are: (1) updating $D_1$ to $D_1^*$, (2) updating $D_2$ to $D_2^*$, (3) updating $D_4$ to $D_4^*$. We assume that the three update requests arrive sequentially.
Fig. 2(a) illustrates the parity logging scheme. It usually employs a dedicated device to log data writes, e.g., an HDD as the log device to absorb small writes. With parity logging, $D_1$ is updated to $D_1^*$ in SSD 1, and a delta $L_1$, computed by XOR-ing $D_1$ with $D_1^*$, is logged on the log device. Then $D_2$ is updated to $D_2^*$ in SSD 2, and $L_2$ is also logged. When serving the last request, it writes $D_4^*$ into SSD 1 and writes $L_4$ into the HDD. At last, it uses the computation denoted in Fig. 2(a) (i.e., $P_0^* = P_0 \oplus L_1 \oplus L_2$ and $P_1^* = P_1 \oplus L_4$) to update $P_0$ to $P_0^*$ and $P_1$ to $P_1^*$ in SSD 3 and SSD 2, respectively. We note that the number of parity writes is reduced, and so are the writes to the SSDs. However, parity logging requires a dedicated device to log small writes, and it may also harm system-level wear-leveling.
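To fix ideas, the following minimal Python sketch mimics the parity-logging flow of Fig. 2(a); the class and method names are ours and are not taken from the systems in [29, 23, 36]:

def xor(a: bytes, b: bytes) -> bytes:
    """XOR two equal-sized chunks."""
    return bytes(x ^ y for x, y in zip(a, b))

class ParityLoggingRAID:
    """Sketch only: data chunks are updated in place on the SSDs, while
    parity updates are deferred by logging XOR deltas on a dedicated
    log device. Assumes data/parity are pre-populated."""

    def __init__(self):
        self.data = {}     # chunk id -> chunk contents (on the SSDs)
        self.parity = {}   # stripe id -> parity contents (on the SSDs)
        self.log = []      # log device: (stripe id, delta) records

    def update(self, chunk_id, stripe_id, new_chunk: bytes):
        # One data write plus one sequential log append; no parity
        # read or write on the critical path.
        delta = xor(self.data[chunk_id], new_chunk)   # L = D xor D*
        self.data[chunk_id] = new_chunk
        self.log.append((stripe_id, delta))

    def replay(self):
        # Fold the accumulated deltas into the parity chunks, e.g.,
        # P0* = P0 xor L1 xor L2 as annotated in Fig. 2(a).
        for stripe_id, delta in self.log:
            self.parity[stripe_id] = xor(self.parity[stripe_id], delta)
        self.log.clear()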
Fig. 2(b) shows an example of the parity caching scheme. Parity caching uses a buffer to cache all incoming writes so as to delay parity updates. The key idea is that if more chunks are updated together, the chance of constructing full-stripe writes becomes larger, so the I/Os caused by parity updates can be reduced. As shown in Fig. 2(b), it first buffers $D_1^*$ and $D_2^*$ in the cache, and then flushes them to the SSDs and updates $P_0$ to $P_0^*$. The same process happens when updating $D_4$ and $P_1$. Parity caching can also reduce parity writes, but it usually requires dedicated NVRAM, which is expensive and still not mature in the current market. Note that $D_1$, $D_2$, $D_4$, $P_0$ and $P_1$ in both Fig. 2(a) and Fig. 2(b) are marked invalid at the device level after the data updates, and they can be reclaimed automatically by the corresponding SSDs.
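A corresponding sketch of parity caching follows; again, the structure and names are ours, and we simply assume the buffer resides in NVRAM:

from collections import defaultdict

def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))

class ParityCachingRAID:
    """Sketch only: incoming writes are buffered per stripe and the
    parity is rewritten once per flush rather than once per data
    update. Assumes data/parity are pre-populated."""

    def __init__(self, stripe_width: int):
        self.stripe_width = stripe_width  # data chunks per stripe
        self.data = {}                    # (stripe, index) -> chunk
        self.parity = {}                  # stripe -> parity chunk
        self.cache = defaultdict(dict)    # stripe -> {index: new chunk}

    def update(self, stripe, index, new_chunk: bytes):
        self.cache[stripe][index] = new_chunk
        # Flush automatically once a full-stripe write can be formed;
        # a flush may also be forced earlier (cache pressure, timer).
        if len(self.cache[stripe]) == self.stripe_width:
            self.flush(stripe)

    def flush(self, stripe):
        # One parity write per flush: fold every buffered update's
        # delta into the cached parity, then write everything out.
        p = self.parity[stripe]
        for index, new_chunk in self.cache.pop(stripe).items():
            p = xor(p, xor(self.data[(stripe, index)], new_chunk))
            self.data[(stripe, index)] = new_chunk
        self.parity[stripe] = p

Note that a flush may also cover a partially updated stripe, as in Fig. 2(b), where $D_1^*$ and $D_2^*$ are flushed and $P_0^*$ is recomputed before $D_4^*$ arrives.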
As shown in Fig. 2(c), elastic striping manages write requests in a log-structured manner. It appends $D_1^*$, $D_2^*$ and $D_4^*$ to the RAID array to construct a new stripe, instead of updating them in place on the original devices. Note that $D_1$, $D_2$ and $D_4$ are out-of-date but still need to be kept in the SSDs for data protection, which consumes extra storage space. Therefore, elastic striping marks these chunks as invalid at the RAID level and invokes RAID-level GC operations to reclaim the space occupied by these invalid chunks. This scheme can effectively reduce parity writes, mitigate the performance degradation caused by RMW and RRW, and benefit wear-leveling among SSDs.
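The following sketch captures the append-only behavior of elastic striping; the names and structure are ours, not those of [16]:

from functools import reduce

def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))

class ElasticStripingRAID:
    """Sketch only: every write is appended log-style into a fresh
    stripe; superseded chunks are merely marked invalid at the RAID
    level and left for a RAID-level GC to reclaim."""

    def __init__(self, stripe_width: int):
        self.stripe_width = stripe_width  # data chunks per stripe
        self.stripes = []                 # sealed stripes (lists of slots)
        self.where = {}                   # logical address -> (stripe, slot)
        self.pending = []                 # chunks waiting to fill a stripe

    def write(self, addr, chunk: bytes):
        # Out-of-place update: invalidate the old copy, if any, then
        # append the new chunk. (Repeated updates to the same address
        # within one unsealed stripe are ignored here for brevity.)
        if addr in self.where:
            s, slot = self.where[addr]
            self.stripes[s][slot] = None  # invalid at the RAID level
        self.pending.append((addr, chunk))
        if len(self.pending) == self.stripe_width:
            self._seal()

    def _seal(self):
        # Close the stripe with a freshly computed parity chunk,
        # e.g., P2 = D1* xor D2* xor D4* as in Fig. 2(c).
        parity = reduce(xor, (c for _, c in self.pending))
        s = len(self.stripes)
        self.stripes.append(list(self.pending) + [("parity", parity)])
        for slot, (addr, _) in enumerate(self.pending):
            self.where[addr] = (s, slot)
        self.pending.clear()

A RAID-level GC would then select stripes dominated by invalidated slots, re-append their surviving valid chunks through write(), and reclaim the stripes; its cost grows when invalid chunks are scattered thinly across many stripes, which is exactly the issue discussed next.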
Moreover, elastic striping seems particularly suitable for SSD RAID, as out-of-place overwrites are already the norm inside SSDs. The primary issue of this scheme is that the RAID-level GC cost may be very high if invalid chunks are scattered over the stripes of the whole RAID array. This motivates us to develop a workload-aware scheme that reduces the RAID-level GC cost so as to improve SSD RAID performance. We further discuss