http://www.paper.edu.cn
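Research on Erasure Code Recovery Optimization Based on Pipeline Parallelization
XU Hui, YANG Tianxiao*
(School of Mechatronics and Information Engineering, China University of Mining and Technology (Beijing), Beijing 100083)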
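About the author: XU Hui, female, associate professor at China University of Mining and Technology and master's supervisor. She graduated from China University of Mining and Technology (Beijing) with a master's degree and serves as deputy director of the Department of Computer Science and Technology. Research interests: databases and data mining. E-mail: xuh@cumtb.edu.cn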
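Abstract: Distributed storage systems are built on large numbers of inexpensive nodes, which makes node failure the norm. To guarantee data reliability, such a system must include a data fault-tolerance scheme. Erasure-code redundancy can deliver the same reliability as replica redundancy at a lower storage overhead. In practice, however, when an erasure-coded storage system recovers data, the recovery node has to read data from multiple surviving nodes to local storage and then reconstruct the lost data with a decoding algorithm. This places heavy pressure on the recovery node and also occupies a large amount of network bandwidth, which hurts overall system performance. This paper therefore proposes a data recovery optimization method for erasure-coded storage systems. First, an analysis of the erasure code recovery algorithm proves that the recovery operation can be parallelized. Second, a pipeline-based parallel data recovery scheme is designed. Finally, based on an analysis of real-world network topologies, an algorithm is designed that minimizes the total length of data transmission during recovery, improving the utilization of high-level data links in the network. Experiments show that, compared with the existing star-shaped recovery method, the proposed pipelined parallel recovery method significantly reduces data recovery latency and improves recovery efficiency.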
Keywords: distributed storage; erasure code; pipeline parallelization; network topology
CLC number: TP391.4
Research on Recovery Optimization of Erasure Code Based
on Pipeline Parallelization
XU Hui, YANG Tianxiao
(School of Mechatronics and Information Engineering, China University of Mining and
Technology (Beijing), Beijing 100083)
Abstract: Distributed storage systems are built on large numbers of cheap nodes, so node failure becomes the normal state. To ensure data reliability, the system must have a data fault-tolerance scheme. Erasure code schemes can achieve the same reliability as replica redundancy while incurring lower storage overhead. In practical use, however, when a storage system based on erasure codes recovers data, the recovery node needs to read data from multiple surviving nodes to its local disk and then recover the data through the decoding algorithm. This not only puts heavy pressure on the recovery node but also occupies a large amount of network bandwidth, affecting overall system performance. This paper therefore presents a data recovery optimization method for storage systems based on erasure codes. First, through an analysis of the erasure code recovery algorithm, it is proved that the recovery operation of the erasure code can be parallelized. Then, a parallel data recovery scheme based on pipelining is designed. Finally, by analyzing real-world network topologies, an algorithm that minimizes the total length of data transmission in the recovery process is designed to improve the utilization of high-level data links in the network. Experiments show that, compared with the existing star recovery method, the proposed pipelined parallel recovery method can significantly reduce data recovery delay and improve recovery efficiency.
Key words: Distributed storage; Erasure code; Pipeline parallelization; Network topology
0 Introduction
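Cloud storage offers a convenient and flexible mode of use: users can upload, download, search, and manage massive amounts of data for Web sites or mobile applications as if they were operating on local files. It also provides strong safety guarantees. Fault-tolerance techniques ensure that user data is not lost, and deployment across multiple regions and multiple data centers enables remote disaster recovery [1]. Users do not need to purchase machines, which saves deployment and operations costs. They can concentrate on implementing their business and can expand capacity at any time [2].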