X. Pei et al. / Future Generation Computer Systems 69 (2017) 24–40 27
Table 1
Parameter notations.

  Symbol      Representation                                    Range
  (n, k, r)   Coding parameters                                 1 ≤ r ≤ n − k
  m           Number of modified data blocks                    1 ≤ m ≤ k
  g           Number of groups                                  g ∈ [1, k]
  D_i         The ith original data block                       i ∈ [0, k − 1]
  D′_i        The ith updated data block                        i ∈ [0, k − 1]
  P_i         The ith original parity block                     i ∈ [0, r − 1]
  P*_{i,j}    The jth partial parity block from the ith group   i ∈ [1, g], j ∈ [0, r − 1]
  P′_i        The ith updated parity block                      i ∈ [0, r − 1]
  α_{i,j,l}   Coefficient of the lth data block for the jth     i ∈ [1, g], j ∈ [0, r − 1]
              parity block in the ith group
reconstruction schemes proposed in [26] exploit the data locality
of multiple data blocks and organize the data transmission
along the reconstruction path in a pipelined way to reduce
the reconstruction time. However, these reconstruction schemes
are sensitive to slow nodes: the reconstruction time is
limited by the slowest node along the ‘‘line’’ structure.
Li et al. [27] propose RCTREE to minimize the reconstruction
traffic by combining the advantages of regenerating codes
with a tree-structured regeneration topology. However, the
tree is constructed according to the available bandwidth,
which is hard to measure in real distributed systems. To
deal with multiple failures, researchers [28,29] propose
tree-based reconstruction schemes that improve the reconstruction
performance for multiple failures through parallel or cooperative
reconstruction. However, the constructed trees are independent of
each other, which may consume more network traffic. Moreover,
researchers in [30,31] propose cooperative reconstruction schemes
that reduce the network traffic for multiple failures by
exchanging data between the new nodes. However, the nodes are
organized in a star structure, which offers lower transmission
efficiency than a tree structure.
In this paper, we focus on optimizing the update efficiency
of erasure codes and propose Group-U, a grouped and pipelined
update scheme based on erasure codes. Group-U partitions the
data nodes into groups and elects one data node in each group
as the relayer to organize the data flow from the data nodes
to the parity nodes. Furthermore, Group-U distributes the
computation among multiple nodes to improve the update efficiency.
With a comprehensive threshold and a data cache technique, Group-U
ensures data reliability and reconstructs failed data with
the least overhead.
4. Grouped and pipelined data transmission
4.1. Architecture of framework
In this section, we propose a general three-layer update
framework for Group-U that supports both single and multiple
updates, as illustrated in Fig. 1. The parameters used in this
section are shown in Table 1.
At a high level, the framework consists of three layers: the
data layer, the relay layer, and the parity layer. The data layer
is responsible for grouping the data nodes and transmitting
the delta data of each data block to the relay layer. There are
three steps at this layer. First, the m data nodes (denoted
as node_0, ..., node_{m−1}, with data blocks D_0, ..., D_{m−1}) are
divided into g groups, with m/g data blocks in each group, where
g is the number of groups. Thus, we obtain g groups of data blocks
(D_0, ..., D_{m/g−1}), ..., (D_{m(g−1)/g}, ..., D_{m−1}). In each group, we
elect one of the data nodes to be the relayer, which is responsible
for organizing the data flow from the other data nodes within the
same group to the parity nodes. In this way, the data transmission
and the data computation are restricted within each group,
which improves the update efficiency. Then, each data node node_i
calculates its delta data, represented by D′_i − D_i, and sends
the delta to the relayer of its group. Finally, each node_i
completes the update by replacing the original data D_i with D′_i.
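The three data-layer steps above can be sketched in a few lines of code. The helper names (`make_groups`, `delta`) are illustrative rather than from the paper, and the delta is computed as a bytewise XOR, which equals subtraction in the GF(2^w) arithmetic that erasure codes use; we also assume, as the paper does, that g divides m evenly.

```python
# Sketch of the data-layer steps: split m data nodes into g groups,
# elect the first node of each group as the relayer, and compute deltas.
# Helper names are illustrative; assumes g evenly divides m.

def make_groups(m: int, g: int) -> list[list[int]]:
    """Group node indices 0..m-1 into g contiguous groups of m // g nodes."""
    size = m // g
    return [list(range(j * size, (j + 1) * size)) for j in range(g)]

def delta(old: bytes, new: bytes) -> bytes:
    """Bytewise XOR: subtraction in GF(2^w), so D'_i - D_i == D'_i XOR D_i."""
    return bytes(a ^ b for a, b in zip(old, new))

m, g = 6, 2
groups = make_groups(m, g)             # [[0, 1, 2], [3, 4, 5]]
relayers = [grp[0] for grp in groups]  # first node of each group relays
```

Electing the first node of each group as the relayer matches the relayer indices node_0, node_{m/g}, ..., node_{m(g−1)/g} used in the framework.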
At the relay layer, there are g relayers (node_0, node_{m/g}, ...,
node_{m(g−1)/g}), with node_{m(j−1)/g} as the relayer of the jth group,
where 1 ≤ j ≤ g. There are two steps for each relayer at
this layer. First, each relayer node_i calculates its own delta
D′_i − D_i and receives the m/g − 1 deltas of the other m/g − 1
data blocks from the data nodes within its group. Thus, each
relayer node_i holds m/g deltas
((D′_i − D_i), ..., (D′_{i+m/g−1} − D_{i+m/g−1})).
Then, the relayer node_i encodes the m/g − 1 received deltas and
its own delta into r partial parity parts P*_{i,j} with Eq. (5), where
P*_{i,j} is the jth partial parity part of the ith group. Finally,
these encoded partial parity parts are sent to the parity layer,
with P*_{i,j} delivered to parity node parity_j by the relayer node_i.
P*_{i,j} = Σ_{l = m(i−1)/g}^{mi/g − 1} α_{i,j,l} · (D′_l − D_l),
1 ≤ i ≤ g, 0 ≤ j ≤ r − 1, 1 ≤ m ≤ k.  (5)
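Eq. (5) is a linear combination over a Galois field, evaluated bytewise. The sketch below computes one partial parity part in GF(2^8); the coefficients and block contents are made-up examples, and `gf_mul` uses the common 0x11B reduction polynomial, which the paper does not specify.

```python
# Sketch of Eq. (5): one partial parity part P*_{i,j} as a GF(2^8)
# linear combination of the group's deltas. Values are examples only.

def gf_mul(a: int, b: int) -> int:
    """Multiply two bytes in GF(2^8) modulo the polynomial 0x11B."""
    p = 0
    while b:
        if b & 1:
            p ^= a
        a <<= 1
        if a & 0x100:
            a ^= 0x11B
        b >>= 1
    return p

def partial_parity(alphas, old_blocks, new_blocks):
    """P*_{i,j} = sum_l alpha_{i,j,l} * (D'_l - D_l); sum and minus are XOR."""
    size = len(old_blocks[0])
    part = bytearray(size)
    for a, old, new in zip(alphas, old_blocks, new_blocks):
        for pos in range(size):
            part[pos] ^= gf_mul(a, old[pos] ^ new[pos])
    return bytes(part)
```

Since subtraction and addition in GF(2^8) are both XOR, an unmodified block contributes a zero delta and drops out of the sum, which is why only the modified blocks need to be transmitted.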
At the parity layer, there are r parity nodes (denoted as
parity_0, ..., parity_{r−1}) with r parity blocks (P_0, ..., P_{r−1}),
and the r updated parity blocks are represented as (P′_0, ..., P′_{r−1}).
Each parity node completes the update in two steps. First, each
parity node parity_i, 0 ≤ i ≤ r − 1, receives g partial parity parts
(P*_{1,i}, ..., P*_{g,i}), with one part P*_{j,i} from the jth group.
Then, each parity node parity_i combines the g received partial parity
parts with the stored parity data according to Eq. (6), where P*_{j,i}
is the partial parity part received from the jth group, P_i is the
original parity block, and P′_i is the updated parity block.
P′_i = Σ_{j=1}^{g} P*_{j,i} + P_i,  0 ≤ i ≤ r − 1.  (6)
The nodes in all three layers cooperatively complete the update
process. When multiple data nodes are to be updated (m > 1), they
are separated into g groups and the nodes in each group complete
the update operations independently. When only one data node is to
be updated (i.e., m = 1), the data node sends the delta to all the
parity nodes after updating its stored data. In both cases, the
parity nodes update the parity data with Eq. (6). In the following
sections, we specify how to improve the update efficiency by
dynamically adjusting the group size, distributing the data
computation, and pipelining the data transmission.
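The case split between a single update and a grouped multi-node update can be expressed as a small transmission plan; the node and function names below are illustrative, not from the paper, and g is assumed to divide m evenly.

```python
# Sketch of the case split: m == 1 sends the delta straight to every
# parity node, while m > 1 routes deltas through g group relayers.
# Names are illustrative, not from the paper.

def plan_update(m: int, g: int, r: int):
    """Return a list of (sender, receivers) transmission hops."""
    if m == 1:
        # The lone data node sends its delta to all r parity nodes.
        return [("node_0", [f"parity_{j}" for j in range(r)])]
    hops = []
    size = m // g
    for grp in range(g):
        relayer = f"node_{grp * size}"
        members = [f"node_{grp * size + t}" for t in range(1, size)]
        # Group members send their deltas to the group's relayer ...
        hops += [(node, [relayer]) for node in members]
        # ... and the relayer forwards r partial parity parts onward.
        hops.append((relayer, [f"parity_{j}" for j in range(r)]))
    return hops
```

The plan makes the bandwidth contrast visible: with grouping, each parity node receives g partial parts instead of m individual deltas.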
4.2. Grouping the data nodes
In this section, we discuss why the data nodes to be updated
should be grouped and how to group them.
When there are m data nodes to be updated, a bottleneck may
arise if all the data nodes connect to one relayer to complete