J. Parallel Distrib. Comput. 74 (2014) 2099–2108
Dynamic replica placement and selection strategies in data grids—
A comprehensive survey
R. Kingsy Grace a,∗, R. Manimegalai b
a Computer Science and Engineering, Sri Ramakrishna Engineering College, India
b Computer Science and Engineering, Park College of Engineering and Technology, India
Highlights
• A survey of replica placement and selection strategies in data grids is presented.
• Parameters that are used to evaluate the grid performance are summarized.
• Grid architectural models and simulation tools used are discussed.
Article info
Article history:
Received 15 May 2013
Received in revised form 25 October 2013
Accepted 31 October 2013
Available online 7 November 2013
Keywords:
Replica selection
Replica placement
Dynamic replication
Data grid
Computational grid
Abstract
Data replication techniques are used in data grids to reduce makespan, storage consumption, access
latency and network bandwidth consumption. Data replication enhances data availability and thereby
increases system reliability. Data replication involves two steps, namely, replica placement and
replica selection. Replica placement involves identifying the best possible node on which to duplicate
data, based on network latency and user requests. Replica selection involves selecting the best replica
location from which to access the data for job execution in the data grid. Various replica placement
and selection algorithms are available in the literature. These algorithms measure and analyze
different parameters such as bandwidth consumption, access cost, scalability, execution time, storage
consumption and makespan. In this paper, various replica placement and selection strategies are
discussed along with their merits and demerits. This paper also analyses the performance of these
strategies with respect to the parameters mentioned above. In particular, it focuses on dynamic
replica placement and selection strategies in the data grid environment.
© 2013 Elsevier Inc. All rights reserved.
1. Introduction: replica placement and replica selection
A computational grid [22] is a combination of hardware and software that provides reliable and
consistent resources to execute a job in a distributed environment. A data grid is a distributed
collection of storage and computational resources located in different geographical locations.
The grid is described in [1,18,23] as a flexible, secure and co-ordinated resource sharing
environment for individuals, institutions and resources. Computationally intensive applications
need large amounts of data, but maintaining a local copy of the data at every node is expensive
and impractical. In general, huge volumes of data (terabytes or petabytes) are stored and managed
in data grids. Scientific applications such as high energy physics [31], data mining, climate
simulation and satellite image processing produce large amounts of data [1–3,19]. Managing and
accessing such large amounts of data stored in geographically different locations is slow and
tedious due to network constraints. As data grid architectures grow in size, it is necessary to
increase the availability of data in the grid by using data replication techniques. Data
replication strategies are used to increase data availability [7] for the execution of jobs in
the grid. They also provide increased fault tolerance, improved scalability, and reduced response
time and bandwidth consumption. Amjad et al. [7] have presented various dynamic replication
strategies in data grids, whereas this paper focuses on replica placement and selection
strategies in data grids.
∗ Corresponding author.
E-mail addresses: kingsydhas@gmail.com (R. Kingsy Grace), mmegalai@yahoo.com (R. Manimegalai).
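To make the terms replica selection and replica placement concrete, the following minimal Python sketch may help. It is not taken from any of the surveyed strategies. The cost model (latency plus file size divided by bandwidth), the placement rule (most frequent requester with enough free storage) and all names such as Link, select_replica and place_replica are assumptions introduced here purely for illustration.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class Link:
    """Hypothetical network estimate between the requesting site and a node."""
    latency_s: float       # latency in seconds (assumed to be known)
    bandwidth_mb_s: float  # available bandwidth in MB/s (assumed to be known)


def transfer_time(link: Link, size_mb: float) -> float:
    """Estimated seconds to fetch size_mb over link (illustrative cost model)."""
    return link.latency_s + size_mb / link.bandwidth_mb_s


def select_replica(links: Dict[str, Link], size_mb: float) -> str:
    """Replica selection: choose the replica holder with the lowest estimated fetch time."""
    return min(links, key=lambda node: transfer_time(links[node], size_mb))


def place_replica(requests: Dict[str, int],
                  free_mb: Dict[str, float],
                  size_mb: float) -> Optional[str]:
    """Replica placement: choose the node that requested the file most often
    among those with enough free storage, or None if no node can hold it."""
    candidates = [n for n in requests if free_mb.get(n, 0.0) >= size_mb]
    if not candidates:
        return None
    return max(candidates, key=lambda n: requests[n])
```

Under this assumed cost model, a large file tends to be fetched from the replica with the highest bandwidth, while a small file tends to be fetched from the replica with the lowest latency. Real strategies typically weigh further parameters such as access cost and storage consumption.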
Two important challenges in data replication are: (i) replica placement and (ii) replica selection.
Replica placement is the problem of placing duplicate copies of data on the most appropriate node
in the data grid. Replica placement, or replication, can be logically divided into three stages,
namely, replication decision, replica selection and file replacement [37]. The replication
decision stage decides when and where to create the replica. If the decision is not to replicate,
the file is read remotely. The second stage, replica selection, decides which file needs to be
0743-7315/$ – see front matter © 2013 Elsevier Inc. All rights reserved.
http://dx.doi.org/10.1016/j.jpdc.2013.10.009