Fig. 2. Elastic striping: P2 = D1* ⊕ D2* ⊕ D4*.
However, parity updates in an RAID will introduce extra I/Os and degrade both the performance and endurance of SSD RAID. To address this issue, three types of RAID schemes have been developed to reduce the number of I/Os caused by parity updates, namely parity logging, parity caching, and elastic striping. Specifically, parity logging [29] usually uses a dedicated device to log writes so as to delay parity updates and reduce the number of I/Os, and it has also been used for deploying RAID for SSDs. For example, Mao et al. [20] proposed a design for RAID-4 which uses one HDD as the parity device to absorb parity writes and another HDD as a mirror to absorb small write requests, and a similar idea was also applied to RAID-6 in [36]. Li et al. [17] proposed EPLog, which extends parity logging with an elastic feature and uses HDDs as log devices to absorb writes so as to reduce the writes to SSDs. In contrast, parity caching [4], [10], [14], [16] uses a buffer, e.g., nonvolatile memory, to delay parity updates so as to reduce the writes to SSDs. Finally, elastic striping [12] constructs new stripes with the newly updated data chunks instead of immediately updating the parity chunks in the old stripes. Since elastic striping requires no additional devices, we focus on this scheme in this paper. In the rest of this section, we first review how elastic striping works, and then discuss its limitations and motivate our design.
B. Elastic Striping
Elastic striping was first proposed for chip-level RAID in single SSDs. Its main idea is to construct new stripes with the newly updated data instead of immediately updating the parity chunks in the old stripes, so it requires no additional devices such as nonvolatile memory. Fig. 2 shows an example that illustrates the scheme.
Suppose that there are six data chunks D0–D5 and two parity chunks P0 and P1 in an SSD RAID at the beginning, and the incoming requests are: 1) updating D1 to D1*; 2) updating D2 to D2*; and 3) updating D4 to D4*. We assume that the three update requests arrive sequentially.
Instead of immediately updating D1, D2, and D4 and their corresponding parity chunks, elastic striping manages write requests in a log-structured manner. Precisely, it appends the updated data D1*, D2*, and D4* into the RAID array by constructing a new stripe, without performing any update to the old stripes. Note that D1, D2, and D4 are out of date, but still need to be kept in SSDs for data protection. To save space, elastic striping marks these chunks as invalid at the RAID level and calls GC to reclaim the space occupied by invalid chunks in the future.
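To make the write path concrete, the following Python sketch illustrates elastic striping under simplifying assumptions: fixed-size chunk updates, a single parity chunk per stripe computed by XOR, and a hypothetical ElasticArray class introduced only for illustration; it is not the implementation of [12].

```python
from functools import reduce

class ElasticArray:
    """Minimal sketch of elastic striping: updated chunks are appended as
    new stripes instead of updating parity chunks of old stripes in place."""

    def __init__(self, chunks_per_stripe):
        self.k = chunks_per_stripe      # data chunks per stripe
        self.stripes = []               # sealed stripes: {"chunks": [...], "parity": ...}
        self.valid = {}                 # logical address -> (stripe_id, offset)
        self.pending = []               # buffered updates for the next stripe

    def write(self, addr, data):
        """Log-structured update: buffer the new chunk; seal a stripe when full."""
        self.pending.append((addr, data))
        if len(self.pending) == self.k:
            self._seal_stripe()

    def _seal_stripe(self):
        # The new parity is computed only over the chunks of the new stripe.
        parity = reduce(lambda a, b: bytes(x ^ y for x, y in zip(a, b)),
                        (data for _, data in self.pending))
        stripe_id = len(self.stripes)
        self.stripes.append({"chunks": list(self.pending), "parity": parity})
        for offset, (addr, _) in enumerate(self.pending):
            # The previous copy of `addr` (if any) becomes invalid at the
            # RAID level and will be reclaimed later by RAID-level GC.
            self.valid[addr] = (stripe_id, offset)
        self.pending = []
```

With the workload of Fig. 2 and three data chunks per stripe, the three buffered updates D1*, D2*, and D4* would be sealed into one new stripe whose parity is P2 = D1* ⊕ D2* ⊕ D4*, matching the figure.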
Fig. 3. RAID-level GC. (a) Before GC operation. (b) After GC operation.

The GC process works as follows. When GC is triggered, it first selects a GC unit, which represents the smallest unit of GC and can be a multiple of stripes, according to a GC algorithm; it then writes back all the valid data chunks in the selected GC unit by reconstructing new stripes, and finally releases the space of the GC unit for future allocation. We call this process RAID-level GC so as to distinguish it from the GC process inside individual flash chips. We further define the average number of valid data chunks that need to be written back during each GC operation as the RAID-level GC cost.
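Continuing the sketch above (same hypothetical ElasticArray and fields), one RAID-level GC pass over GC units of a fixed number of stripes could look roughly as follows; the greedy choice of the unit with the fewest valid chunks and the rewrite through the normal write path are illustrative assumptions, not necessarily the exact policy of [12].

```python
def raid_level_gc(array, unit_size):
    """Sketch of one RAID-level GC pass: select the GC unit (unit_size
    consecutive stripes) with the fewest valid chunks, write its valid
    chunks back as new stripes, and release the unit. Returns the number
    of valid chunks rewritten, i.e., the RAID-level GC cost of this pass."""
    units = [range(s, min(s + unit_size, len(array.stripes)))
             for s in range(0, len(array.stripes), unit_size)]

    def valid_chunks(unit):
        return [(addr, array.stripes[sid]["chunks"][off][1])
                for addr, (sid, off) in array.valid.items() if sid in unit]

    victim = min(units, key=lambda u: len(valid_chunks(u)))    # greedy selection
    survivors = valid_chunks(victim)
    for addr, data in survivors:
        array.write(addr, data)         # written back via elastic striping
    # Release stripes of the victim unit that no longer hold any valid chunk
    # (chunks still buffered in `pending` keep their old copy until sealed).
    still_used = {sid for sid, _ in array.valid.values()}
    for sid in victim:
        if sid not in still_used:
            array.stripes[sid] = None   # space reclaimed for future allocation
    return len(survivors)
```

The greedy selection mirrors the common cost-benefit heuristic of choosing the unit with the least valid data, which minimizes the number of chunks that must be rewritten.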
To further illustrate the RAID-level GC process, we consider the example shown in Fig. 3, where a GC unit consists of two stripes. As shown in the example, there are three valid data chunks, D0, D3, and D5, in the selected GC unit in Fig. 3(a). An RAID-level GC operation first reads D0, D3, and D5, then writes them into free places, and finally reclaims the invalid chunks for future allocation, as shown in Fig. 3(b).
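As a quick check of the two sketches above, a scenario similar to Fig. 3 can be replayed; the three-chunk stripes, 4-byte chunks, and logical addresses below are illustrative values, not taken from the figure.

```python
arr = ElasticArray(chunks_per_stripe=3)
for addr in range(6):                   # initial D0-D5 -> two sealed stripes
    arr.write(addr, bytes([addr] * 4))
for addr in (1, 2, 4):                  # updates D1*, D2*, D4* -> a third stripe
    arr.write(addr, bytes([addr + 100] * 4))

# With 2-stripe GC units, the first unit (stripes 0 and 1) now holds only
# D0, D3, and D5 as valid chunks, so the GC pass rewrites three chunks.
print(raid_level_gc(arr, unit_size=2))  # RAID-level GC cost: 3
```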
C. Motivation
We note that skewness and temporal locality exist in real-world I/O workloads [6], [15], [28]. Skewness indicates that some data are accessed and updated frequently while others are updated rarely. Temporal locality means that if a data chunk is accessed now, it is likely to be accessed again in the near future. In particular, many workloads of real-world applications exhibit the characteristic that 80% of accesses are directed to only 20% of the data, the so-called “80/20 Rule” [6], [23]. Our analysis of real-world workloads also validates this property (see Table I). As a result, the mixture of hot and cold data in an SSD RAID will potentially increase the number of chunk rewrites per RAID-level GC operation and thus aggravate the RAID-level GC cost. To explain this, we present a simple numerical analysis using the example in Fig. 4.
Now we compare the RAID-level GC cost in two hypothetical cases so as to study the impact of workload awareness on elastic striping: 1) hot and cold data are evenly mixed together [see Fig. 4(a)] and 2) hot and cold data are perfectly separated [see Fig. 4(b)]. In this example, we consider an 80/20 Rule scenario, in which 20% of the data in an SSD RAID is hot and receives 80% of the write requests, while the remaining 80% of the data is cold and receives 20% of the writes. We use the number of write accesses to measure hotness so as to ease the