endurance. It further extends the original parity logging design
by allowing parity chunks to be computed from the newly
written data chunks only, where the data chunks may lie within
a partial stripe or span more than one stripe. Such an
“elastic” parity construction eliminates the need to pre-read
old data for parity computation, thereby improving performance.
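To make the idea concrete, the sketch below illustrates elastic parity construction for a single-parity (XOR-based) code, with chunks represented as byte strings. The names (xor_chunks, elastic_log_chunk) and the chunk size are illustrative assumptions and not part of EPLOG's interface; a general erasure code would replace the XOR with the corresponding encoding.

# Minimal sketch: an "elastic" log chunk is computed from the newly written
# data chunks only, so no old data or old parity needs to be pre-read.
from functools import reduce

CHUNK_SIZE = 4096  # illustrative chunk size in bytes

def xor_chunks(a: bytes, b: bytes) -> bytes:
    """Byte-wise XOR of two equal-sized chunks."""
    return bytes(x ^ y for x, y in zip(a, b))

def elastic_log_chunk(new_chunks: list[bytes]) -> bytes:
    """Compute one log chunk over the newly written data chunks only.

    The chunks may fall within a partial stripe or span more than one
    stripe; either way, only the new data is involved.
    """
    assert new_chunks and all(len(c) == CHUNK_SIZE for c in new_chunks)
    return reduce(xor_chunks, new_chunks)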
To summarize, this paper makes the following contributions:
• We design and implement EPLOG as a user-level
block device (here, a block refers to the read/write
unit at the system level, and should not be confused
with an SSD block at the flash level) that manages an
SSD RAID array. Specifically, EPLOG uses hard-disk
drives (HDDs) to temporarily log parity information,
and regularly commits the latest parity updates to SSDs
to mitigate the performance overhead due to HDDs (a
simplified sketch of this logging-and-commit flow follows
this list). We show that EPLOG enhances existing
flash-aware SSD RAID designs (see Section VI) in two
ways: (i) EPLOG is fully compatible with commodity
configurations and does not rely on high-cost components
such as non-volatile RAM (NVRAM); and (ii) EPLOG
readily supports general erasure coding schemes for high
fault tolerance.
• We conduct a mathematical analysis of the system
reliability in terms of the mean-time-to-data-loss (MTTDL).
We show that EPLOG improves the system reliability
over the conventional RAID design when SSDs and
HDDs have comparable failure rates [48].
• We conduct extensive trace-driven testbed experi-
ments, and demonstrate the endurance and perfor-
mance gains of EPLOG in mitigating parity update
overheads. We compare EPLOG with the Linux soft-
ware RAID implementation based on mdadm [37],
which is commonly used for managing software RAID
across multiple devices. For example, in some settings,
EPLOG reduces the total write traffic to SSDs by
45.6-54.9%, reduces the number of GC requests by
77.1-97.6%, and increases the I/O throughput by 30.1-
119.2% even though it uses HDDs for parity logging.
Finally, EPLOG shows higher throughput than the
original parity logging design, and incurs low over-
head in metadata management.
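As a complement to the first contribution above, the following hypothetical sketch outlines the logging-and-commit flow at a high level. The class and method names (ParityLogger, log, commit), the device interfaces (append, write, truncate), and the compute_parity callback are illustrative assumptions only and do not reflect EPLOG's actual implementation, which is detailed in Section III.

# Hypothetical sketch of the logging-and-commit flow: parity information is
# first appended to an HDD-backed log, and a later commit writes up-to-date
# parity chunks to the SSDs and discards the log. All device interfaces here
# are illustrative placeholders.
class ParityLogger:
    def __init__(self, hdd_log, ssd_parity, compute_parity):
        self.hdd_log = hdd_log                # append-only log device (HDD)
        self.ssd_parity = ssd_parity          # parity region on the SSD array
        self.compute_parity = compute_parity  # e.g., re-encode parity from current data
        self.touched = set()                  # stripes whose parity is pending on the log

    def log(self, stripe_id: int, log_chunk: bytes) -> None:
        """Append a log chunk to the HDD log (sequential write, no SSD parity traffic)."""
        self.hdd_log.append((stripe_id, log_chunk))
        self.touched.add(stripe_id)

    def commit(self) -> None:
        """Write the latest parity of each touched stripe to the SSDs, then clear the log."""
        for stripe_id in self.touched:
            self.ssd_parity.write(stripe_id, self.compute_parity(stripe_id))
        self.touched.clear()
        self.hdd_log.truncate()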
The rest of the paper proceeds as follows. In Section II,
we state our design goals and motivate our new elastic parity
logging design. In Section III, we describe the design and
implementation details of EPLOG. In Section IV, we analyze
the system reliability of EPLOG. In Section V, we present
evaluation results on our EPLOG prototype through trace-
driven testbed experiments. In Section VI, we review related
work, and finally in Section VII, we conclude the paper.
II. OVERVIEW
In this section, we state the design goals of EPLOG. We
also explain how EPLOG mitigates the parity update overhead
through elastic parity logging.
A. Goals
EPLOG aims for four design goals.
• General reliability: EPLOG provides fault tolerance
against SSD failures. In particular, it can tolerate
a general number of SSD failures through erasure
coding. This differs from many existing SSD RAID
designs that are specific to RAID-5 (see Section VI).
• High endurance: Since parity updates introduce extra
writes to SSDs, EPLOG aims to reduce the parity traf-
fic caused by small (or partial-stripe) writes to SSDs,
thereby improving the endurance of SSD RAID.
• High performance: EPLOG eliminates the extra
I/Os due to parity updates, thereby maintaining high
read/write performance.
• Low-cost deployment: EPLOG is deployable on
commodity hardware, and does not assume high-end
components, such as NVRAM, that are required by some
SSD RAID designs (e.g., [10], [15], [26]).
EPLOG targets workloads that are dominated by small
random writes, leading to frequent partial-stripe writes to
RAID. Examples of such workloads include those in database
applications [17], [27] and enterprise servers [20]. Note that
real-world workloads often exhibit high locality both spatially
and temporally [34], [43], [46], such that recently updated
chunks and their nearby chunks tend to be updated more
frequently. It is thus possible to exploit caching to batch-
process chunks in memory to boost both endurance and
performance (by reducing write traffic to SSDs). On the other
hand, modern storage systems also tend to force synchronous
writes through fsync/sync operations [14], which make
small random writes inevitable. Thus, our baseline design
should address synchronous small random writes, while
allowing an optional caching feature for potential performance gains.
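The following is a hypothetical sketch of such an optional caching layer, not part of EPLOG's baseline design: repeated updates to hot chunks coalesce in memory and are flushed in batches, while explicit sync requests are still honored. The WriteCache name and the backend.write_chunk callback are assumptions for illustration.

# Hypothetical sketch of an optional write cache: repeated updates to the
# same chunk coalesce in memory (temporal locality), and dirty chunks are
# flushed in address order (spatial locality) in batches, or immediately
# on an explicit sync.
class WriteCache:
    def __init__(self, backend, batch_size: int = 64):
        self.backend = backend       # underlying array; write_chunk(addr, data) is illustrative
        self.batch_size = batch_size
        self.dirty = {}              # chunk address -> latest data

    def write(self, addr: int, data: bytes) -> None:
        self.dirty[addr] = data      # later updates overwrite earlier ones in memory
        if len(self.dirty) >= self.batch_size:
            self.flush()

    def sync(self) -> None:
        """Honor fsync/sync: force all buffered chunks to the array."""
        self.flush()

    def flush(self) -> None:
        for addr in sorted(self.dirty):
            self.backend.write_chunk(addr, self.dirty[addr])
        self.dirty.clear()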
B. Elastic Parity Logging
Parity logging [47] is a well-studied solution for mitigating
the parity update overhead in traditional RAID. We
first review its design, and then motivate how we extend it
in the context of SSD RAID.
We first demonstrate how parity logging can improve
the endurance of an SSD RAID array by limiting parity traffic to
the SSDs. Our idea is to add separate log devices to keep track of
parity information, which we refer to as log chunks. To illustrate,
Figure 1 shows an SSD RAID-5 array with three SSDs for
data and one SSD for parity (i.e., the array can tolerate a single
SSD failure). In addition, we have one log device for storing
log chunks. Suppose that a stream of write requests is issued
to the array. The first two write requests, with data chunks
{A0, B0, C0} and {A1, B1, C1}, respectively, constitute two stripes.
The next write request then updates data chunks B0,
C0, and A1 to B0’, C0’, and A1’, respectively. Figure 1(a)
illustrates how the original parity logging works. It updates
data chunks in place at the system level above the SSDs (note
that an SSD adopts out-of-place updates at the flash level, as
described in Section I-A). It computes a log chunk by XOR-ing
the old and new data chunks on a per-stripe basis. It then
appends all log chunks to the log device.
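For concreteness, the following is a minimal sketch of the per-write processing in the original parity logging scheme described above (Figure 1(a)), assuming an XOR-based code and simple read/write/append callbacks in place of real device I/O; the function name and callback signatures are illustrative only.

# Minimal sketch of original parity logging (Figure 1(a)): each data chunk is
# pre-read and updated in place, and one log chunk per affected stripe (the
# XOR of the old and new data chunks) is appended to the log device.
def parity_logging_write(updates, read_chunk, write_chunk, append_log):
    """updates: iterable of (stripe_id, chunk_index, new_data)."""
    log_chunks = {}                                   # stripe_id -> accumulated log chunk
    for stripe_id, idx, new_data in updates:
        old_data = read_chunk(stripe_id, idx)         # pre-read of old data is required here
        write_chunk(stripe_id, idx, new_data)         # in-place update at the system level
        delta = bytes(o ^ n for o, n in zip(old_data, new_data))
        acc = log_chunks.get(stripe_id)
        log_chunks[stripe_id] = delta if acc is None else bytes(
            a ^ d for a, d in zip(acc, delta))
    for stripe_id, log_chunk in log_chunks.items():   # one log chunk per stripe is appended
        append_log(stripe_id, log_chunk)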
The original parity logging design limits parity traffic to SSDs,
thereby reducing their wear rates. Nevertheless, we