1942
Cost-Effective Fault-Tolerant Scheduling Algorithm for Real-Time Tasks in Cloud
Systems
Pengze Guo, Zhi Xue
Department of Electronic Engineering
Shanghai Jiao Tong University
Shanghai 200240, China
e-mail: {guopengze, zxue}@sjtu.edu.cn
Abstract—Scheduling tasks in distributed systems is a classic
optimization problem. Cloud computing brings new challenges
for traditional scheduling methods due to its characteristics of
elasticity. Although the task scheduling problem has been
widely studied, there are few heuristics suitable for the cloud
environment. In this paper, we propose a cost-effective fault-
tolerant scheduling algorithm (CEFT) for real-time tasks in
cloud systems. Particle swarm optimization (PSO) is tailored to
address the task assignment issue. Primary/backup (P/B)
approach is applied to provide fault tolerance for tasks in case
of permanent or transient hardware failure. In addition,
rescheduling mechanism is put forward to meet the deadline
constraints of real-time tasks. Simulation experiments are
conducted to evaluate the effectiveness of our algorithm. The
results show that CEFT makes good balance between low cost
and high deadline guarantee ratio.
Keywords-real-time; fault tolerance; primary/backup;
scheduling; particle swarm optimization
I. INTRODUCTION
Cloud computing offers a new paradigm to enable users
to choose computing resources they need [1]. Different from
traditional distributed systems, cloud systems offer elastic
computing and on-demand service. In cloud systems, the
hardware resources are virtualized to computing resource
pool. Users can acquire computing resources from the
resource pool on demand, and adjust the computing
capacities at any time. In this way, the resource utilization is
increased considerably. However, this brings challenges for
traditional scheduling algorithms because the virtual
infrastructure is nearly unlimited and can get scaled easily.
So, the task scheduling algorithm must be more flexible to
adapt to cloud computing.
Scheduling algorithms consider the problem of mapping
tasks into computing instances. Certain criteria should be
met during the scheduling. For real-time tasks, the
correctness of results depends not only on the logicality of
the results but also on the time at which the results are
produced [2]. Therefore, the finish time of each task should
not exceed its deadline. Besides, fault tolerance is an
important requirement to guarantee tasks finish before their
deadlines even in case of hardware failure. Primary/backup
(P/B) approach, which duplicates a task into two copies, is
widely used to provide fault tolerance.
In this paper, we consider the problem of scheduling real-
time tasks in cloud systems while providing fault tolerance.
Tasks are independent, which means no precedence
constraints exist between tasks. Our algorithm optimizes the
resource assignment scheme in an iterative way. In each
iteration, discrete PSO is applied to choose VM for each task.
The elasticity of cloud computing is sufficiently considered
and computing resources are properly selected. CEFT adapts
to cloud computing and adjusts the scheduling scheme
according to the resources selected. CEFT aims to minimize
the execution cost while meeting the deadline constraints.
The remaining part of the paper is organized as follows.
Section 2 reviews some related work about this topic.
Section 3 introduces the fault tolerance model and
scheduling objective. Section 4 gives a brief introduction to
discrete PSO and iteration mechanism. Finally, Section 5
presents the simulation results of the algorithm followed by
the conclusions in Section 6.
II. R
ELATED WORK
Since scheduling tasks in multiprocessor system has been
proved to be NP-hard [3], a large number of heuristic
algorithms for assigning tasks have been developed. Liu and
Layland [4] proposed Rate-Monotonic (RM) algorithm for
preemptively scheduling periodic tasks on a single processor.
Rate-Monotonic First-Fit (RMFF), which extended RM to
multiprocessor systems, was proposed by Dhall and Liu [5].
Rodriguez and Buyya [6] put forward a workflow scheduling
algorithm based on the particle swarm optimization (PSO) to
reduce the execution cost while meeting deadline constraints.
However, these algorithms do not consider fault
tolerance, and thus cannot guarantee reliability for real-time
tasks. Bertossi et al. [7] proposed the Fault-Tolerant Rate-
Monotonic First-Fit (FTRMFF) using primary/backup
scheme to extend the RMFF algorithm. Zhu et al. [8]
proposed a QoS-aware fault-tolerant scheduling algorithm
for tasks in heterogenous systems. But this algorithm belongs
to dynamic scheduling, while we consider the static
scheduling problem. Guo et al. [9] developed a novel real-
time fault-tolerant task allocation algorithm (FTAOA) for
wireless sensor networks. FTAOA optimize the allocation
scheme using binary PSO. However, FTAOA cannot be
applied to the cloud systems because FTAOA fails to
consider the elasticity of cloud computing. Besides, the
2017 17th IEEE International Conference on Communication Technology
978-1-5090-3944-9/17/$31.00 ©2017 IEEE