Compared with other fields, CIA algorithms have not usually
been applied in DGRM [14,27–29], especially in replica creation
[14]. Among the above algorithms, QEA [30–32] is a novel CIA algo-
rithm developed in recent years, which combines the advantages
of both evolutionary and quantum computing. By adopting qubit
chromosome as a representation, the QEA can represent a linear
superposition of solutions due to its probability characteristics.
Compared with other CIA algorithms such as GA and SA, the QEA
has rapid convergence and good global search compatibility and
is a research hot spot in recent years [31–39]. However, there is
no related literature made for the application of QEA in repilca cre-
ation of data grid.
Based on the QEA, we proposed a method of applying QEA in
data grid replica creation. The optimization model expression is
provided and the procedure of replica creation is proposed.
The remainder of this paper is organized as follows. Section 2
reviews the related work. In Section 3, QEA is addressed. In Sec-
tion 4, a QEA-based replica creation strategy is proposed and dis-
cussed in details. In Section 5, simulation experiments are carried
out to demonstrate the performance of our proposed strategies.
Conclusions and future work are drawn in Section 6.
2. Related works
2.1. Principles of DGRM
Data grid mainly consists of grid nodes and network links,
which can be described as a 2-tuple (V,E), where V is a node set
and E is a link set. By expanding concepts of data grid, DGRM
can be abstracted as a 4-tuple (V, E,R,O), where V and E remain
the same meanings. R represents a replica set. It must be empha-
sized that master data are also treated as replica. O denotes an
operation set. These four elements of DGRM can be further defined
as follows.
V can be abstracted as a 4-tuple (V
c
,V
s
,V
j
,V
r
), where V
c
is the set
of its computing elements, V
s
is the set of storage elements, V
j
is
the set of jobs assigned to it, and V
r
is the set of replicas residing it.
E can be donated by a 3-tuple (V
i
,V
j
,C
i, j
), where V
i
and V
j
are
two endpoints of edges, and C
i,j
represents the transferring cost
per unit data between them, which affected by network band-
width, disk throughput and so on.
R can be represented by a 3-tuple (logName, phyName,size),
where logName and phyName denote DLN and DPN, respectively,
and size is the amount of data.
O can be described by a 5-tuple (O
creation
, O
location
O
selection
,
O
deletion
, O
consistency
), which defines all the functions of DGRM in
sequence of replica creation, location, selection, deletion and
consistency.
The objectives of DGRM are to manipulate replicas in R by
the operations defined in O to meet the data accessing require-
ments of jobs assigned to grid nodes in set V as well as reduce
the network traffic over E, shorten the job execution time, im-
prove the data availability and increase the resource utilization.
The practical operation process of DGRM can be depicted by
Fig. 1.
Firstly, the job scheduling module assigns jobs to nodes in V,
then the nodes analyze the requiring replicas of V
j
itself in R. Sec-
ondly, the nodes check whether those replicas storied at V
s
, if ex-
ists, local processes are made to finish jobs, otherwise the
operations in O are used to obtain the best replicas. If necessary,
some new replicas are created by O
selection
. Meanwhile, O
consistency
are called to assure that all the replicas of each data are same ex-
cept that data are read-only. O
location
maps DLN to DPN,whereas
O
deletion
aims to remove the existing replicas when the available
storage space of V
s
is inadequate for new replicas.
2.2. Replica creation strategies
Generally, replica creation strategies include two types: static
and dynamic strategies. The static strategies need to obtain some
correlative information in advance and place replicas before jobs
are executed, whereas the dynamic strategies replicate data
according to the information collected at the job execution time.
Compared to the static strategies, the dynamic strategies can flex-
ibly assign replicas during jobs executions and be well adapted to
different environments. However, it can also prolong the job re-
sponse time as replica creation could increase the job waiting time.
Both two strategies are involved in data grid replica creation. And
they are often used simultaneously in the process of present strat-
egies. For this reason, in our later discussion, we will not distin-
guish whether the strategies are static or dynamic. In this paper,
however, the existing replica creation strategies are classified into
traditional and CIA-based strategies depending on what optimiza-
tion technologies they use.
2.2.1. Traditional replica creation strategies
Many replica creation strategies were proposed at the initial
time when the DGRM was developed.
Early in 2001, Ranganathan and Foster [3] proposed six replica-
tion Strategies. Later in 2003, Bell et al. [4] introduced the eco-
nomic principles into data grid replica creation and put forward
a novel replica creation strategy based on Economic Model (EM)
to reduce the job execution time. Meanwhile, some EM-based rep-
lica creation strategies were implemented in OptorSim [5,6]
simulator.
In 2006, Rahman et al. [7] proposed a static replica placement
algorithm that placed replicas to nodes by optimizing average re-
sponse time and a dynamic replica maintenance algorithm that
reallocated replicas to new nodes if performance reduced over last
k time periods. Tang et al. [8] put forward two replica creation
strategies: Centralized Dynamic Strategy (CDR) and Distributed
Dynamic Strategy (DDR), which can minimize the data access time
and network load in combination with Shortest Turnaround Time
(STT) scheduling algorithms.
In 2008, Wu et al. [9] discussed the problem to choose the rep-
lica placement nodes. Lei et al. [10] analyzed the data availability
under the environment of file loss and bit loss when the storage
capacity for replicas was constrained. Considering the user re-
source priority and QoS, Lin et al. [11] proposed a novel data grid
replica creation strategy based on priority list which can effectively
balance the replica work load.
In 2010, Sashi and Thanamani [12] proposed a novel dynamic
replica creation strategy based on the popularity of files. Later in
2011, Mansouri and Dastghaibyfard [13] put forward an improved
layered replica creation strategy, the experiment results showed
the proposed strategy outperformed over current strategies about
14%.
2.2.2. CIA-based replica creation strategies
Compared to the traditional strategies, CIA based replica crea-
tion strategies usually adopt CIA as the main optimization algo-
rithm. Although current research on CIA-based replica creation
strategies is not so much, concerns on data grid replica creation
and other aspects of DGRM are increasing.
In 2009, Naseera and Murthy [14] put forward an agent-based
replica placement algorithm. Agents are deployed at each data
node to determine the candidate node for the placement of replica.
In 2010, Zhang et al. [15] presented a replication approach based
on swarm intelligence, which was an adaptive and decentralized
bottom-to-up method. Their simulation results have shown that
the method performs better than no replication. And it outper-
forms EM for big number of jobs.
86 T. Ma et al. / Knowledge-Based Systems 42 (2013) 85–96