Two I/O flows are observed during on-line reconstruction:
(1) the reconstruction read flow and (2) the user R/W
request flow. In the reconstruction flow, node RN_1 sends
read requests to k surviving nodes to retrieve surviving
blocks; in the user request flow, users issue read/write
requests to data nodes. We refer to requests as
‘Normal Read/Writes’ if the requested data blocks reside
on surviving data nodes. If the requested data blocks are
stored on a failed data node (e.g., DN_1 in Fig. 2), then
we refer to these requests as ‘Missed Read/Writes’.
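The distinction between the two request classes can be sketched in a few lines of Python. This is only an illustration of the definition above: the names `RequestKind` and `classify_request`, and the use of node-name strings, are assumptions made for the example and do not come from the paper.

```python
from enum import Enum


class RequestKind(Enum):
    NORMAL = "Normal Read/Write"
    MISSED = "Missed Read/Write"


def classify_request(target_node: str, failed_nodes: set[str]) -> RequestKind:
    """Classify a user request by the state of the node holding the block.

    A request is 'missed' when the requested block lives on a failed
    data node; otherwise it is 'normal'.
    """
    if target_node in failed_nodes:
        return RequestKind.MISSED
    return RequestKind.NORMAL
```

For instance, with DN_1 failed, a request for a block on DN_1 is classified as missed, while a request for a block on any surviving node is classified as normal.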
2.3 Conventional R/W Procedures
Let us consider the conventional I/O procedures under on-
line reconstruction (see Fig. 2), where a rebuilding node
continuously retrieves k surviving blocks; meanwhile, cli-
ents issue read/write requests to data nodes. During the
reconstruction, a storage cluster must handle the following
two types of user I/Os:
1. Normal Read/Writes. A normal read request is directly
serviced by the storage cluster. When it comes to a
normal write, the requested data blocks and associ-
ated r parity blocks are overwritten. New parity
blocks are generated using RMW or RCW.
2. Missed Read/Writes. A popular handling procedure
called Redirection redirects all missed user read/writes
to a rebuilding node (or a standby disk in RAID [17]).
The Redirection scheme is a baseline solution used to
evaluate our proposed RAM-RS scheme.
A missed read may be served in two ways if the
failed node has been partially reconstructed on the
rebuilding node: (a) the block may be reconstructed
on demand from k surviving blocks; (b) the block may
be read directly from the rebuilding node, provided
that it has already been reconstructed.
There are two approaches to processing a missed
write: (a) If the data block has not yet been recovered,
then the RCW scheme is applied to generate r new par-
ity blocks, which are written to parity nodes. RCW also
directly writes the new data block to the rebuilding
node. The RMW method is not applicable in this case
because the requested data block resides on the failed node and cannot be read; (b)
If the data block has been reconstructed, the user write
can be redirected to the rebuilding node; both the
RMW and RCW methods can be employed to generate
the new parity blocks.
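The Redirection cases for missed requests can be summarized as a small decision sketch. The function names and the returned labels are hypothetical, and `rebuilt_blocks` (the set of block identifiers already recovered on the rebuilding node) is an assumed bookkeeping structure. For a missed read on an already rebuilt block, the sketch chooses option (b); the text allows either option, so that preference is also an assumption.

```python
def handle_missed_read(block_id: int, rebuilt_blocks: set[int]) -> str:
    """Pick a service path for a missed read under Redirection."""
    if block_id in rebuilt_blocks:
        # Option (b): the block already exists on the rebuilding node.
        return "read_from_rebuilding_node"
    # Option (a): decode the block on demand from k surviving blocks.
    return "decode_from_k_surviving_blocks"


def handle_missed_write(block_id: int, rebuilt_blocks: set[int]) -> tuple[str, ...]:
    """Return the parity-update methods applicable to a missed write."""
    if block_id in rebuilt_blocks:
        # Case (b): the old block is readable again, so both methods apply.
        return ("RMW", "RCW")
    # Case (a): the old block is unavailable, so only RCW applies.
    return ("RCW",)
```

In both write cases the new data block itself is written to the rebuilding node; the sketch only captures which parity-update methods remain available.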
A write operation in erasure-coded storage clusters is a
composite one. Taking RMW as an example, three steps (i.e.,
reading, calculating, and writing) are involved in updating
parity blocks. After reading r parity blocks, RMW calculates
new parity blocks. Then, RMW writes the new parity blocks
to parity nodes.
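To make the two parity-update paths concrete, the following sketch contrasts RCW (recompute each parity block from all k data blocks) with RMW (patch each old parity block using the difference between old and new data). It assumes the usual linear form of parity over GF(2^8) with the polynomial 0x11D. The coefficient table `COEFFS` is purely illustrative and is not claimed to be the generator matrix of the (9,6) RS code used in the paper; the function names are likewise hypothetical.

```python
def gf_mul(a: int, b: int) -> int:
    """Multiply two bytes in GF(2^8) modulo x^8 + x^4 + x^3 + x^2 + 1 (0x11D)."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        if a & 0x100:
            a ^= 0x11D
        b >>= 1
    return result


# Illustrative coefficients: r = 3 parity rows over k = 6 data blocks.
COEFFS = [
    [1, 1, 1, 1, 1, 1],
    [1, 2, 3, 4, 5, 6],
    [1, 4, 5, 16, 17, 20],
]


def rcw_parity(data_blocks: list[bytes], coeffs: list[int]) -> bytes:
    """RCW-style: rebuild one parity block from all k data blocks."""
    parity = bytearray(len(data_blocks[0]))
    for coeff, block in zip(coeffs, data_blocks):
        for i, byte in enumerate(block):
            parity[i] ^= gf_mul(coeff, byte)
    return bytes(parity)


def rmw_parity(old_parity: bytes, coeff: int, old_data: bytes, new_data: bytes) -> bytes:
    """RMW-style: read old parity, fold in the data delta, return new parity."""
    return bytes(
        p ^ gf_mul(coeff, old ^ new)
        for p, old, new in zip(old_parity, old_data, new_data)
    )


def demo() -> tuple[list[bytes], list[bytes]]:
    """Update one data block and compute new parities via both methods."""
    data = [bytes(i * 16 + j for j in range(4)) for i in range(6)]
    old_parities = [rcw_parity(data, row) for row in COEFFS]
    target, new_block = 2, b"\xde\xad\xbe\xef"
    via_rmw = [
        rmw_parity(parity, row[target], data[target], new_block)
        for parity, row in zip(old_parities, COEFFS)
    ]
    updated = data[:target] + [new_block] + data[target + 1:]
    via_rcw = [rcw_parity(updated, row) for row in COEFFS]
    return via_rmw, via_rcw
```

Because the parity is linear in the data, both methods yield identical parity blocks. RMW touches only the updated data block and the r parity blocks, whereas RCW needs the other k - 1 data blocks but never reads the old value of the block being written, which is why it remains usable when that block is on a failed node.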
3 THE RAM-RS SCHEME
3.1 The I/O Interference Problem
Following the Redirection procedure, a rebuilding node
should concurrently serve both reconstruction and missed
user I/Os. In this case, reconstruction and user I/Os may
compete for available network and disk bandwidth. To
evaluate interference between reconstruction and user I/Os,
we carry out a set of experiments, where the Web-2 trace
[18] is replayed on a (9,6) RS-coded storage cluster. The
hardware configuration is listed in Section 5.1. It is worth
noting that (9,6) RS codes are adopted by real-world sys-
tems like GFS II [19] and QFS [20], which are used to sup-
port web search and data analysis, respectively.
Fig. 3 shows the reconstruction time and user response
time under three options: off-line reconstruction,
single-node on-line reconstruction, and the degraded mode.
Under off-line reconstruction, a storage cluster devotes all
of its resources to reconstruction and serves no user
request until the failed node is recovered. In the degraded
mode, surviving nodes serve only user I/Os and no
reconstruction requests. We observe that (1) on-line
reconstruction duration grows by a factor of 1.55 compared
to that of off-line reconstruction, because user requests
and reconstruction requests compete for bandwidth during
on-line reconstruction; and (2) the user response time
increases by a factor of 1.60 during on-line reconstruction
compared to that in the degraded mode, because part of the
bandwidth is consumed by reconstruction requests. In short,
the performance problem of on-line reconstruction is
attributable to bandwidth competition, which increases both
reconstruction time and user response time.
To address this I/O interference problem, we adopt an
I/O redirection scheme called ‘RAM-RS’, which redirects user
accesses that target failed data blocks to an RS-coded RAM
region. The aim is to isolate reconstruction reads from
missed user I/Os and thereby minimize the I/O interference.
3.2 The Idea of RAM-RS
As mentioned in Section 2.3, read/write misses are served
at the cost of network bandwidth in the rebuilding node,
thereby degrading reconstruction performance. In addition,
the rebuilding node manages a long I/O queue for both user
and reconstruction requests, leading to large user response
time. To minimize the I/O interference occurring on the
rebuilding node, RAM-RS redirects missed read/writes to an
RS-coded RAM region, a durable and reliable DRAM-based space
formed by pre-allocated main memory on surviving nodes in an
RS-coding manner. With the RS-coded RAM region in place, the
rebuilding node can devote its bandwidth resources to
performing reconstruction.
Fig. 3. Interference between reconstruction and user I/Os. Off-line
reconstruction has better performance than on-line reconstruction; the
degraded mode offers smaller user response time than on-line
reconstruction.
HUANG ET AL.: AN EFFICIENT I/O-REDIRECTION-BASED RECONSTRUCTION SCHEME FOR ERASURE-CODED STORAGE CLUSTERS 3039