Z Codes: General Minimum-Storage Erasure Codes with Optimal Repair Bandwidth for Distributed Storage Systems


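"Z Codes: general systematic erasure codes with optimal repair bandwidth and minimum storage for distributed storage systems." Z Codes are systematic erasure codes designed for distributed storage systems. They target the excessive repair bandwidth that traditional erasure codes incur during data recovery. In a distributed storage system, data redundancy is the key strategy for preventing data loss. When a data block is lost and must be recovered, however, traditional erasure codes usually transfer far more data than the theoretical minimum; that is, their repair bandwidth is too high. Many new erasure codes have been proposed in recent years to reduce the repair bandwidth, but they either require extra storage capacity and computation overhead or apply only to special cases. Z Codes aim to overcome these shortcomings with a general family of codes that achieves the theoretically optimal repair bandwidth while meeting the minimum-storage requirement.

The core of Z Codes is their optimized repair mechanism, which repairs lost data blocks efficiently without adding much storage burden. This property matters most in large-scale distributed storage systems, which must handle frequent data-node failures and have strict requirements on both bandwidth efficiency and storage efficiency. The encoding process in Z Codes takes both the locality and the global structure of the data into account, so that the lost information can be rebuilt with as little data transfer as possible. Z Codes also keep encoding and decoding complexity in check to fit the limited computing resources of real deployments.

The construction of Z Codes builds on recent advances in coding theory and may include, but is not limited to, extensions and refinements of linear codes, convolutional codes, low-density parity-check (LDPC) codes, or high-density parity-check (HDPC) codes. Through their code structure and algorithm design, Z Codes can repair the loss of one or more data blocks with optimal bandwidth use during repair, which lowers the overall operating cost of the system.

The generality of Z Codes means that they can be applied to a wide range of distributed storage architectures regardless of system scale, which gives deployments flexibility and scalability. In summary, Z Codes are an erasure-coding scheme that resolves the tension between repair bandwidth and storage efficiency in distributed storage systems and offers a new technical route to reliable storage in the big-data era. By optimizing repair bandwidth and minimizing storage requirements, Z Codes may become an important reference for the design of future distributed storage systems.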

Z codes: General Systematic Erasure Codes with Optimal Repair Bandwidth and Storage for Distributed Storage Systems

Qing Liu∗, Dan Feng∗, Hong Jiang†, Yuchong Hu∗, Tianfeng Jiao∗

∗Wuhan National Laboratory for Optoelectronics (WNLO), School of Computer, Huazhong University of Science and Technology (HUST), China

†University of Nebraska-Lincoln, USA

Email: {qing, dfeng}@hust.edu.cn, jiang@cse.unl.edu, {yuchonghu, tfjiao}@hust.edu.cn

Abstract—Erasure codes are widely used in distributed storage systems to prevent data loss. Traditional erasure codes suffer from a typical repair-bandwidth problem in which the amount of data required to reconstruct the lost data, referred to as the repair bandwidth, is often far more than the theoretical minimum. While many novel erasure codes have been proposed in recent years to reduce the repair bandwidth, these codes either require extra storage capacity and computation overhead or are only applicable to some special cases.

To address the weaknesses of the existing solutions to the repair-bandwidth problem, we propose Z codes, a general family of codes capable of achieving the theoretical lower bound of repair bandwidth for a single data node failure. To the best of our knowledge, the Z codes are the first general systematic erasure codes that achieve optimal repair bandwidth under the minimum storage. Our in-memory performance evaluations of a 1-GB file indicate that Z codes have encoding and repairing speeds that are approximately equal to those of the Reed-Solomon (RS) codes, and their speed on the order of GB/s practically removes computation as a performance bottleneck.

Index Terms—Erasure Codes; Repair Bandwidth; Distributed Storage System; Failure Tolerance

I. INTRODUCTION

Erasure codes are widely used in distributed storage systems to recover from data loss in the event of server failures. These codes incorporate data redundancy in a space-efficient manner and tolerate data loss by reconstructing the lost data. A code is systematic if the original data is kept unchanged after encoding and can be accessed without decoding. Typical systematic codes include Reed-Solomon (RS) codes and Cauchy Reed-Solomon (CRS) codes.

However, such traditional erasure codes face a known repair-bandwidth problem [1] that becomes increasingly important in a distributed environment, where bandwidth is typically expensive in terms of both performance and power consumption. That is, in a storage system of data size $M$ with $k$ data nodes and $m$ parity (i.e., redundant) nodes that are interconnected by a network of limited bandwidth, each node stores data of size $M/k$, and the repair of a single node failure requires disk I/O or network bandwidth of size $M$, which is $k$ times the size of the lost data ($M/k$). In this paper, we define repair bandwidth as the amount of data accessed by disk I/O and transferred over the network.
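The $k$-fold read amplification described above can be observed on a toy code. The following is a minimal sketch, not an RS implementation: it assumes the simplest possible case of a single XOR parity node ($m = 1$), and the helper names `xor_blocks`, `encode_single_parity`, and `repair` are hypothetical, introduced only for illustration.

```python
import os
from functools import reduce


def xor_blocks(blocks):
    """Byte-wise XOR of equally sized blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def encode_single_parity(data, k):
    """Split data into k equal blocks and append one XOR parity block (m = 1)."""
    assert len(data) % k == 0, "pad the data so that k divides its length"
    size = len(data) // k
    blocks = [data[i * size:(i + 1) * size] for i in range(k)]
    return blocks + [xor_blocks(blocks)]


def repair(nodes, lost):
    """Rebuild node `lost` from all survivors and count the bytes read."""
    survivors = [blk for i, blk in enumerate(nodes) if i != lost]
    bytes_read = sum(len(blk) for blk in survivors)
    return xor_blocks(survivors), bytes_read


M, k = 1200, 4                       # data size in bytes, number of data nodes
data = os.urandom(M)
nodes = encode_single_parity(data, k)
rebuilt, bytes_read = repair(nodes, lost=2)
print(len(rebuilt), bytes_read)      # lost block size M/k versus bytes read
```

In this sketch the repair reads $k$ surviving blocks of size $M/k$ each, so the traffic equals $M$ even though only $M/k$ bytes were lost.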

The minimum storage per node for an $(m, k)$ code is $M/k$, so that $k$ nodes together retain the original data. However, Dimakis et al. pointed out that the theoretical minimum storage and minimum repair bandwidth cannot be achieved at the same time and that there is a lower-bound trade-off curve between the two [1], as plotted in Fig. 1. Although codes with the minimum storage cannot achieve the minimum repair bandwidth, their theoretical repair-bandwidth lower bound, which is called the optimal repair bandwidth [2], can be calculated as:

$$\frac{(m + k - 1)M}{mk} \tag{1}$$

The repair bandwidth mentioned above refers specifically to a single node failure, which is the most common case in practice.

Fig. 1: Theoretical lower-bound trade-off curve of storage overhead and repair bandwidth. [Axes: storage overhead, repair bandwidth. Marked points: minimum storage, minimum repair bandwidth, optimal repair. Labeled codes: Z codes, RS codes.]
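To get a feel for Equation (1), the sketch below evaluates it for a few configurations and compares it with the conventional repair cost of $M$. The function names and the sample $(m, k)$ pairs are illustrative assumptions, not values from the paper; $M$ is normalized to 1 so that results read as fractions of the data size.

```python
from fractions import Fraction


def rs_repair_bandwidth(M, k):
    """Conventional repair: read k surviving blocks of size M/k, i.e. M in total."""
    return Fraction(M)


def optimal_repair_bandwidth(M, m, k):
    """Equation (1): (m + k - 1) * M / (m * k), for integer M, m, k."""
    return Fraction((m + k - 1) * M, m * k)


M = 1  # normalized data size
for m, k in [(1, 4), (2, 4), (4, 10)]:
    opt = optimal_repair_bandwidth(M, m, k)
    conv = rs_repair_bandwidth(M, k)
    print(f"m={m}, k={k}: optimal = {opt} M, conventional = {conv} M")
```

For $m = 4$ and $k = 10$ the bound is $13M/40$, roughly a third of the conventional cost. For $m = 1$ the expression reduces to $M$, so under this bound a single parity node leaves no room for savings.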

Recently, many novel repair-bandwidth-efficient codes have been proposed to reduce the repair bandwidth, but at the expense of (1) extra storage capacity, (2) additional computation overhead, or (3) being applicable only to some special cases. The Simple Regenerating Codes (SRC) [3] and Local Reconstruction Codes (LRC) [4] need additional storage resources to store the extra parity information. The Functional Minimum Storage Regenerating (FMSR) codes are not systematic: they store only parity information after encoding, which results in a high computation cost [5]. The Rotated Reed-Solomon (RRS) codes [6] also require additional computation for repairing the failure of a single data node. Lacking a general construction mechanism, the Zigzag codes [7] are unsuitable for general storage systems. The Product-matrix-MSR (PMSR) codes [2] are only applicable when the code rate (the ratio of the original data size to the size of the data after encoding) is less than $1/2$, namely, $m > k$, which greatly limits their applicability.
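The equivalence between a code rate below $1/2$ and $m > k$ follows from the rate being $k/(k + m)$. A small sketch can check it by brute force; the function names are hypothetical, and the check covers only the rate condition as stated in the text, not the PMSR construction itself.

```python
from fractions import Fraction


def code_rate(m, k):
    """Ratio of original data size to encoded size for k data and m parity nodes."""
    return Fraction(k, k + m)


def low_rate(m, k):
    """True when the code rate is below 1/2, the regime stated for PMSR codes."""
    return code_rate(m, k) < Fraction(1, 2)


# The rate condition and m > k agree on every small configuration tried here.
assert all(low_rate(m, k) == (m > k) for m in range(1, 20) for k in range(1, 20))

print(code_rate(4, 10), low_rate(4, 10))
```

A high-rate configuration such as $m = 4$, $k = 10$ has rate $5/7$ and therefore falls outside this regime.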

To address the above weaknesses in the existing codes, we present in this paper a family of novel erasure codes, called the Z codes. The Z codes not only achieve the theoretical optimal repair bandwidth under the minimum storage for a single data node's failure, but also have the following desirable properties that make them suitable for distributed storage systems. (1) The minimum storage property: the Z codes consume exactly the same storage capacity as the RS and
