An Efficient I/O-Redirection-Based Reconstruction Strategy for Erasure-Coded Storage Clusters


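This paper examines a key problem in erasure-coded storage clusters during on-line reconstruction: user I/O requests and reconstruction I/O requests interfere with each other, especially when they compete for disk and network bandwidth. To address this, the authors propose an efficient I/O-redirection-based reconstruction scheme called "RAM-RS". Its core idea is to build an RS (Reed-Solomon) coded region from main memory pre-allocated on the surviving nodes. When a user read or write targets a failed node, the request is redirected to this RAM-RS region. Because the region is RS-coded and memory-resident, it can serve such reads and writes quickly, which relieves the rebuilding node and reduces the disk and network bandwidth consumed during reconstruction. First, RAM-RS buffers in memory the data that could not be written to the failed node, which avoids data loss and reduces the amount of data to be rebuilt. Second, for a read whose target block is lost, the surviving nodes use their own intact data to help recover the block, which further reduces the load on the rebuilding node. The paper also builds two Markov models to evaluate the reliability of the RAM-RS scheme. By redirecting I/O in this way, RAM-RS improves the overall efficiency and recovery speed of an erasure-coded storage cluster without degrading the user experience, balancing user service against reconstruction needs, which makes it practically relevant to large distributed storage systems.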

Two I/O flows observed during on-line reconstruction are (1) the reconstruction read flow and (2) the user R/W request flow. In the reconstruction flow, node RN sends read requests to k surviving nodes to retrieve surviving blocks; in the user request flow, read/write requests are issued to data nodes by users. We refer to requests as ‘Normal Read/Writes’ if the requested data blocks reside on surviving data nodes. If the requested data blocks are stored on a failed data node (e.g., DN in Fig. 2), then we refer to these requests as ‘Missed Read/Writes’.
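
The distinction between the two request types can be illustrated with a minimal Python sketch. The names `UserRequest`, `RequestKind`, and `classify`, as well as the integer node identifiers, are hypothetical and introduced only for illustration; the paper does not prescribe any such interface.

```python
from dataclasses import dataclass
from enum import Enum


class RequestKind(Enum):
    NORMAL = "Normal Read/Write"
    MISSED = "Missed Read/Write"


@dataclass(frozen=True)
class UserRequest:
    op: str        # "read" or "write"
    block_id: int  # logical data block being accessed (hypothetical ID)
    node_id: int   # data node that stores the block (hypothetical ID)


def classify(request: UserRequest, failed_nodes: set) -> RequestKind:
    """Label a user request by whether its target data node has failed."""
    if request.node_id in failed_nodes:
        return RequestKind.MISSED
    return RequestKind.NORMAL


failed = {2}  # assumption for the example: data node 2 has failed
print(classify(UserRequest("read", 10, 0), failed).value)   # Normal Read/Write
print(classify(UserRequest("write", 11, 2), failed).value)  # Missed Read/Write
```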

2.3 Conventional R/W Procedures

Let us consider the conventional I/O procedures under on-line reconstruction (see Fig. 2), where a rebuilding node continuously retrieves k surviving blocks; meanwhile, clients issue read/write requests to data nodes. During the reconstruction, a storage cluster must handle the following two types of user I/Os:

1. Normal Read/Writes. A normal read request is directly serviced by the storage cluster. When it comes to a normal write, the requested data blocks and associated r parity blocks are overwritten. New parity blocks are generated using RMW or RCW.

2. Missed Read/Writes. A popular handling procedure called Redirection redirects all missed user read/writes to a rebuilding node (or a standby disk in RAID [17]). The Redirection scheme is a baseline solution used to evaluate our proposed RAM-RS scheme.

A missed read may be served in two ways if the failed node has been partially reconstructed on the rebuilding node: (a) the missed read may be answered by reconstructing the block from k surviving blocks; (b) the missed read may be serviced by reading from the rebuilding node, provided that the block has already been reconstructed.

There are two approaches to processing a missed write: (a) If the data block has not yet been recovered, then the RCW scheme is applied to generate r new parity blocks, which are written to parity nodes. RCW also directly writes the new data block to the rebuilding node. The RMW method is not applicable in this case because the requested data blocks have failed; (b) If the data block has been reconstructed, the user write can be redirected to the rebuilding node; both the RMW and RCW methods can be employed to generate the new parity blocks.
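
The case analysis above for the baseline Redirection procedure can be sketched as a pair of small decision functions. This is an illustrative sketch only: the function names, the string labels, and the `rebuilt_blocks` set are hypothetical, and preferring a direct read whenever the block is already rebuilt is an assumption (the text lists both options without ranking them).

```python
def handle_missed_read(block_id: int, rebuilt_blocks: set) -> str:
    """Choose how a missed read is served under baseline Redirection."""
    if block_id in rebuilt_blocks:
        # Case (b): the block already exists on the rebuilding node.
        return "read_from_rebuilding_node"
    # Case (a): decode the block on the fly from k surviving blocks.
    return "decode_from_k_surviving_blocks"


def handle_missed_write(block_id: int, rebuilt_blocks: set) -> tuple:
    """Return the parity-update methods usable for a missed write."""
    if block_id in rebuilt_blocks:
        # Case (b): the block is recovered, so both methods are available.
        return ("RMW", "RCW")
    # Case (a): the block is still lost, so only RCW applies.
    return ("RCW",)


rebuilt = {0, 1, 2}  # hypothetical: blocks 0-2 are already reconstructed
print(handle_missed_read(1, rebuilt))   # read_from_rebuilding_node
print(handle_missed_read(7, rebuilt))   # decode_from_k_surviving_blocks
print(handle_missed_write(7, rebuilt))  # ('RCW',)
```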

A write operation in erasure-coded storage clusters is a composite one. Taking RMW as an example, three steps (i.e., reading, calculating, and writing) are involved in updating parity blocks. After reading r parity blocks, RMW calculates new parity blocks. Then, RMW writes the new parity blocks to parity nodes.
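
The following sketch contrasts the two parity-update styles on single-byte symbols for k = 6 and r = 3. It assumes a generic linear code over GF(2^8) with the reduction polynomial 0x11D and illustrative Vandermonde-style coefficients; these choices are not taken from the paper and are not claimed to match its RS construction. The RCW-style function recomputes every parity from all k data symbols, while the RMW-style function patches the old parities using the old and new values of the changed symbol.

```python
def gf_mul(a: int, b: int) -> int:
    """Multiply two bytes in GF(2^8) modulo x^8 + x^4 + x^3 + x^2 + 1."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        if a & 0x100:
            a ^= 0x11D
        b >>= 1
    return result


def gf_pow(a: int, n: int) -> int:
    """Raise a to the n-th power in GF(2^8) by repeated multiplication."""
    result = 1
    for _ in range(n):
        result = gf_mul(result, a)
    return result


K, R = 6, 3  # k data symbols and r parity symbols per stripe
# Illustrative coefficients (assumption): COEF[j][i] = (i + 1) ** j.
COEF = [[gf_pow(i + 1, j) for i in range(K)] for j in range(R)]


def rcw_parities(data: list) -> list:
    """RCW style: recompute each parity from all k data symbols."""
    parities = []
    for j in range(R):
        p = 0
        for i in range(K):
            p ^= gf_mul(COEF[j][i], data[i])
        parities.append(p)
    return parities


def rmw_parities(old_parities: list, index: int, old: int, new: int) -> list:
    """RMW style: read old parities, apply the data delta, write back."""
    delta = old ^ new  # addition and subtraction are both XOR in GF(2^8)
    return [p ^ gf_mul(COEF[j][index], delta)
            for j, p in enumerate(old_parities)]


data = [0x12, 0x34, 0x56, 0x78, 0x9A, 0xBC]
old_parities = rcw_parities(data)
updated = data.copy()
updated[2] = 0xEF
# Both styles must agree because the code is linear.
assert rmw_parities(old_parities, 2, data[2], 0xEF) == rcw_parities(updated)
```

Because the RMW-style update needs the old value of the changed symbol, it cannot be used while that symbol is still lost, which is consistent with case (a) of the missed-write procedure.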

3 THE RAM-RS SCHEME

3.1 The I/O Interference Problem

Following the Redirection procedure, a rebuilding node should concurrently serve both reconstruction and missed user I/Os. In this case, reconstruction and user I/Os may compete for available network and disk bandwidth. To evaluate interference between reconstruction and user I/Os, we carry out a set of experiments, where the Web-2 trace [18] is replayed on a (9,6) RS-coded storage cluster. The hardware configuration is listed in Section 5.1. It is worth noting that (9,6) RS codes are adopted by real-world systems like GFS II [19] and QFS [20], which are used to support web search and data analysis, respectively.

Fig. 3 shows the reconstruction time and user response time of three reconstruction options: off-line reconstruction, single-node on-line reconstruction, and the degraded mode. Under off-line reconstruction, a storage cluster devotes all of its resources to performing reconstruction without serving any user request until the failed node is recovered. In the degraded mode, surviving nodes only service user I/Os without serving reconstruction requests. We observe that (1) the on-line reconstruction duration grows by a factor of 1.55 compared to that of off-line reconstruction, because user requests and reconstruction requests compete for bandwidth resources during on-line reconstruction; (2) the user response time increases by a factor of 1.60 during on-line reconstruction compared to that in the degraded mode, because part of the bandwidth resources is consumed by reconstruction requests under on-line reconstruction. In short, the performance problem experienced in the on-line reconstruction scheme is attributed to bandwidth competition, which increases both the reconstruction time and the user response time.

To address this I/O interference problem, we adopt an I/O redirection scheme called ‘RAM-RS’, which redirects user accesses to failed data blocks to an RS-coded RAM region, aiming to isolate reconstruction reads from missed user I/Os and thereby minimize the I/O interference.

3.2 The Idea of RAM-RS

As mentioned in Section 2.3, read/write misses are served at the cost of network bandwidth in the rebuilding node, thereby degrading reconstruction performance. In addition, the rebuilding node manages a long I/O queue for both user and reconstruction requests, leading to large user response time. To minimize the I/O interference occurring on the rebuilding node, RAM-RS redirects missed read/writes to an RS-coded RAM region, a durable and reliable DRAM-based space formed by pre-allocated main memory on surviving nodes in an RS-coding manner. With the RS-coded RAM region in place, the rebuilding node can devote its bandwidth resources to performing reconstruction.
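
The routing difference between the baseline Redirection procedure and the RAM-RS idea can be sketched as follows. This is a deliberately simplified illustration, not the paper's implementation: the `route` and `destinations` functions, the destination labels, and the sample workload are all hypothetical, and the sketch models only where a request is sent, not how the RAM region is encoded or kept durable.

```python
from collections import Counter


def route(target_failed: bool, scheme: str) -> str:
    """Pick the destination of one user request under a given scheme."""
    if not target_failed:
        return "surviving data node"   # Normal Read/Write
    if scheme == "Redirection":
        return "rebuilding node"       # baseline described in Section 2.3
    if scheme == "RAM-RS":
        return "RS-coded RAM region"   # missed I/O kept off the rebuilder
    raise ValueError(f"unknown scheme: {scheme}")


def destinations(requests: list, scheme: str) -> Counter:
    """Count how many requests land on each destination."""
    return Counter(route(failed, scheme) for failed in requests)


# Hypothetical workload: True marks a request whose target node has failed.
workload = [False, True, False, True, True, False]
print(destinations(workload, "Redirection")["rebuilding node"])  # 3
print(destinations(workload, "RAM-RS")["rebuilding node"])       # 0
```

In this toy workload, the three missed requests reach the rebuilding node under Redirection and none do under RAM-RS, which mirrors the stated goal of letting the rebuilding node spend its bandwidth on reconstruction.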

Fig. 3. Interference between reconstruction and user I/Os. Off-line reconstruction has better performance than on-line reconstruction; the degraded mode offers smaller user response time than on-line reconstruction.

HUANG ET AL.: AN EFFICIENT I/O-REDIRECTION-BASED RECONSTRUCTION SCHEME FOR ERASURE-CODED STORAGE CLUSTERS 3039
