Survey of Dynamic Replication Strategies in Data Grids: Reducing Latency and Improving Availability

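Why dynamic replica placement strategies matter for scheduling in cloud computing

With the rise of cloud computing, the efficiency and performance of the data grid, a foundational architecture for distributed computing, are critical to service quality and user satisfaction. The paper summarized here, written by R. Kingsy Grace and R. Manimegalai and published in the Journal of Parallel and Distributed Computing (J. Parallel Distrib. Comput.), volume 74 (2014), surveys dynamic replica placement and selection strategies in data grids. It covers the following topics:

1. **Overview**: The article systematically reviews replica selection and placement strategies in data grids. These strategies aim to reduce data access latency, storage consumption, network bandwidth demand and the overall completion time of computing tasks (the makespan).
2. **Performance evaluation parameters**: The authors summarize the key parameters used to evaluate grid performance, such as data consistency, redundancy, replication cost, failure recovery time and response time. These parameters are the core measures of how effective a strategy is.
3. **Grid architectural models**: The article discusses different data grid architectural models, such as P2P grids, service-based grids and grids in cloud environments. These models help explain why dynamic replica placement strategies matter in different environments.
4. **Simulation tools**: The survey also covers the simulation tools in use. These tools let researchers model a strategy during design and evaluation in order to predict and optimize its performance in a real environment.
5. **Dynamic replication**: The core of a dynamic replica placement strategy is its flexibility: replica locations can be adjusted at run time according to workload, node availability and network conditions, so that the grid adapts to changing demands.
6. **Keywords**: The keywords are "replica selection", "replica placement", "dynamic replication", "data grid" and "computational grid", which mark the focus of the work.
7. **Publication history**: The manuscript was received on 15 May 2013, revised on 25 October 2013, accepted on 31 October 2013 and made available online on 7 November 2013, so it passed through peer review and revision.

The article offers an in-depth account of dynamic replica strategies in data grids and gives researchers and developers a practical framework for weighing the relevant factors and challenges when designing efficient, scalable cloud scheduling algorithms. It is a useful reference for readers interested in cloud computing and big-data management who want to understand and optimize resource allocation in data centers.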

J. Parallel Distrib. Comput. 74 (2014) 2099–2108

journal homepage: www.elsevier.com/locate/jpdc

Dynamic replica placement and selection strategies in data grids: A comprehensive survey

R. Kingsy Grace a,∗, R. Manimegalai b

a Computer Science and Engineering, Sri Ramakrishna Engineering College, India

b Computer Science and Engineering, Park College of Engineering and Technology, India

Highlights

• Survey on replica placement and selection strategies in data grids is presented.

• Parameters that are used to evaluate the grid performance are summarized.

• Grid architectural models and simulation tools used are discussed.

Article info

Article history:

Received 15 May 2013

Received in revised form 25 October 2013

Accepted 31 October 2013

Available online 7 November 2013

Keywords:

Replica selection

Replica placement

Dynamic replication

Data grid, Computational grid

Abstract

Data replication techniques are used in data grids to reduce makespan, storage consumption, access latency and network bandwidth. Data replication enhances data availability and thereby increases system reliability. Data replication involves two steps: replica placement and replica selection. Replica placement identifies the best possible node on which to duplicate data, based on network latency and user requests. Replica selection chooses the best replica location from which to access the data for job execution in the data grid. Various replica placement and selection algorithms are available in the literature. These algorithms measure and analyze different parameters such as bandwidth consumption, access cost, scalability, execution time, storage consumption and makespan. In this paper, various replica placement and selection strategies are discussed along with their merits and demerits. This paper also analyses the performance of these strategies with respect to the parameters mentioned above. In particular, this paper focuses on dynamic replica placement and selection strategies in the data grid environment.
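To make the two steps concrete, the following is a minimal Python sketch, not any specific algorithm from the survey. It assumes a deliberately simple cost model: placement picks the candidate node with the lowest request-weighted latency, and selection picks the replica with the lowest estimated fetch time (latency plus file size divided by bandwidth). All names (`Link`, `transfer_time`, `place_replica`, `select_replica`), the site labels and the numbers are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Link:
    """Assumed network characteristics between a client and a replica site."""
    latency_s: float        # network latency in seconds
    bandwidth_mbps: float   # available bandwidth in megabits per second


def transfer_time(link: Link, size_mb: float) -> float:
    """Estimated seconds to fetch a file of size_mb megabytes over the link."""
    return link.latency_s + (size_mb * 8.0) / link.bandwidth_mbps


def place_replica(requests: dict[str, int],
                  latency: dict[str, dict[str, float]],
                  candidates: list[str]) -> str:
    """Replica placement (assumed model): choose the candidate node that
    minimises the sum of request_count * latency(client -> candidate)."""
    def cost(node: str) -> float:
        return sum(count * latency[client][node]
                   for client, count in requests.items())
    return min(candidates, key=cost)


def select_replica(links: dict[str, Link], size_mb: float) -> str:
    """Replica selection (assumed model): choose the replica site with the
    lowest estimated fetch time for this client."""
    return min(links, key=lambda site: transfer_time(links[site], size_mb))


# Hypothetical request counts per client site and pairwise latencies (seconds).
requests = {"siteA": 10, "siteB": 2, "siteC": 1}
latency = {
    "siteA": {"siteA": 0.00, "siteB": 0.05, "siteC": 0.08},
    "siteB": {"siteA": 0.05, "siteB": 0.00, "siteC": 0.03},
    "siteC": {"siteA": 0.08, "siteB": 0.03, "siteC": 0.00},
}
print(place_replica(requests, latency, ["siteA", "siteB", "siteC"]))  # siteA

# Hypothetical links from one client to two existing replicas of a 100 MB file.
links = {"siteB": Link(0.05, 100.0), "siteC": Link(0.03, 10.0)}
print(select_replica(links, 100.0))  # siteB
```

In this example the replica is placed at siteA because most requests originate there, and siteB is selected for the fetch because its higher bandwidth outweighs its slightly higher latency. A dynamic strategy would re-evaluate both decisions as request counts and network conditions change.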

1. Introduction: replica placement and replica selection

A computational grid [22] is a combination of hardware and software that provides reliable and consistent resources to execute a job in a distributed environment. A data grid is a distributed collection of storage and computational resources located in different geographical locations. The grid is described in [1,18,23] as a flexible, secure and coordinated resource-sharing environment for individuals, institutions and resources. Computationally intensive applications need large amounts of data, but maintaining a local copy of the data at every node is very expensive and not practical. In general, huge volumes of data (terabytes or petabytes) are stored and managed in data grids. Scientific applications such as high energy physics [31], data mining, climate simulation and satellite image processing produce large amounts of data [1–3,19]. Managing and accessing such large amounts of data stored in geographically different locations is slow and tedious due to network constraints. As data grid architectures grow, it is necessary to increase the availability of data in the grid by using data replication techniques. Data replication strategies are used to increase data availability [7] for the execution of jobs in the grid. They also provide increased fault tolerance, improved scalability, and reduced response time and bandwidth consumption. Amjad et al. [7] have presented various dynamic replication strategies in data grids, whereas this paper focuses on various replica placement and selection strategies in data grids.

∗ Corresponding author.

E-mail addresses: kingsydhas@gmail.com (R. Kingsy Grace), mmegalai@yahoo.com (R. Manimegalai).

Two important challenges in data replication techniques are: (i) replica placement and (ii) replica selection. Replica placement is the problem of placing duplicate copies of data at the most appropriate node in the data grid. Replica placement, or replication, can be logically divided into three stages, namely, replication decision, replica selection and file replacement [37]. The replication decision stage decides when and where to create the replica. If the decision is not to replicate, the file will be read remotely. The second stage, replica selection, decides which file needs to be

http://dx.doi.org/10.1016/j.jpdc.2013.10.009
