Workflow Execution Plan Generation in the Cloud Computing Environment Based
on an Improved List Scheduling Algorithm
Xiaoying WANG, Chengshui NIU, Yu-an ZHANG
Department of Computer Science and Technology
Qinghai University
Xining, Qinghai, China, 810016
E-mail: {wxy_cta, csniu, yazhang}@qhu.edu.cn
Lei ZHANG
College of Computer Science
Sichuan University
Chengdu, China, 610064
E-mail: zhanglei@scu.edu.cn
Abstract—Focusing on the higher ratio of processor utilization
and lower execution cost of a scientific workflow in the cloud
environment, an improved list scheduling algorithm was
proposed in this paper. This algorithm combines the ideas of
both list scheduling and task duplication. According to the
priority of the tasks, choosing reasonable parent task to
replicate can help reduce the overhead between tasks. To
properly insert tasks during processor idling time can help to
increase the processor utilization. Based on these, we proposed
an improved strategy to generate the workflow execution plan,
called EPGILS. Experiment results show that the algorithm is
feasible and efficient in reducing the task completion time and
improving the utilization ratio of the processor.
Keywords-scientific workflow; cloud computing environment;
list scheduling; task scheduling; execution plan
I. INTRODUCTION
Scientific workflow (SWF) is a new type of application
developed rapidly in recent years. It can support scientists
and researchers to integrate, construct and cooperate various
distributed data services and software tools, providing a
management platform for complex workflow definition and
execution automation of scientific computations [1].
Compared to traditional Business Workflow (BWF), one of
the most important features of the scientific workflow is
data-oriented. Nowadays, SWF is becoming computation-
intensive and data-intensive [2], since the data are generated
continuously and fast and the relevant computation becomes
complex accordingly. Normal computing environment can
hardly meet the requirement of SWF. Hence, cloud
computing environment provides a new deployment and
execution paradigm for scientific applications, since its
infrastructure is usually comprised of high performance
computing resources and massive storage resources.
In the cloud computing environment, customers can rent
the resources on demand. However, changing the resource
allocation scheme will involve the creation of instances and
data movements, which incurs costs and might have impact
on the workflow execution efficiency and total costs [3].
Thus, it’s very important to design reasonable workflow
execution plans for both users and service providers [4]. In
other words, generating a plan means to map the tasks in the
workflow onto the computing resources appropriately.
Hence, in this paper, we propose a task scheduling
algorithm, named EPGILS (Execution Plan Generation based
on Improved List Scheduling), for workflow execution plan
generation based on the list scheduling algorithm. The
motivation is to utilize the idle time of processors efficiently,
thereby reducing the number of necessary processors
involved in task execution. By EPGILS, the advantages of
list scheduling and task duplication are combined to generate
the execution plan of scientific workflow tasks upon the
homogeneous cloud infrastructure. Experiment results show
that this strategy can effectively enhance the parallelism of
the workflow execution. As a result, not only the total spent
time could be reduced, but also the idle processor time slices
could be utilized sufficiently, and thus the total execution
cost for running these tasks could be lowered.
II. R
ELATED WORK
The key issue of workflow execution plan generation is
how to schedule a task onto a proper computing resource [5].
Most algorithms proposed in prior work are heuristic,
including clustering [6], task duplication [7], list scheduling
[8] and GA (Genetic Algorithm)-based algorithms [9].
Clustering-based algorithm tends to map a cluster to a virtual
machine (VM), but doesn’t consider the task duplication
problem among multiple clusters. Task duplication algorithm
will replicate tasks from one VM to another in order to
reduce the communication overhead of different tasks, but
the target server is usually hard to choose. Genetic algorithm
can lead to close-to-optimal convergence time by selection,
crossover and mutation, but the next task to-be-scheduled
can hardly be pre-determined due to the randomness during
the selection process. Also, the crossover and mutation
process usually consumes much computation time [10]. List
scheduling algorithm determines the priority of each task
before it’s executed so that the entire task queue could be
given before execution. Heuristic algorithms were widely-
used by many researchers since they are often robust and
cost-efficient.
This paper proposes an improved algorithm EPGILS
based on the list scheduling algorithm and the task
duplication based algorithm. In a homogenous cloud
2017 International Conference on Computing Intelligence and Information System
978-1-5386-3886-6/17 $31.00 © 2017 IEEE
DOI 10.1109/CIIS.2017.59
231