Hong-Liang Li et al.: Stream Processing Task Allocation with Recovery Latency Constraint 1127
• We conduct ex tensive simulations to verify the
correctnes s and effectiveness of our approach with diffe-
rent applications and setups.
This pap er further explores the RTAP problem
based on our earlier conferenc e version
[20]
. We propo se
an efficient approach to solve the problem and provide
extensive experimental results and analys is of differe nc e
approaches. The remaining of the paper is organized as
follows. In Section 2, we summarize related work. Sec-
tion 3 prese nts the problem model and analysis . We
propose our approach in Section 4 and Section 5 dis-
cusses the expe rimental results. Finally, Section 6 con-
cludes the paper.
2 Related Work
2.1 Task Allocation for Stream Topology
A stream topolo gy is us ually modeled as a directed
acyclic graph (DAG) G(V, A) of tasks (V ) and directed
connections (A). The task allocation problem is one
of the fundamental issues of stream processing systems
that alloca te resources for each task according to its
resource requirement, avoiding either performance bo t-
tleneck (under-provisioning) or the waste of resources
(over-provisioning). Earlier work focuses on the mod-
eling of task resource requirements and the relation-
ship between assigned resources and processing perfor-
mances (throughput and latency)
[2,12,28]
. The resource
requirement of each task, hereafter referred to as the
weight of a task, represents the share of resource (com-
putational, memory, and/o r bandwidth capacity) that
is required to ensure the proces sing performance ac-
cording to its input speed. E idenbenz and Locher
[9]
gave a theoretical analysis of this problem and proved
its NP-hardness. T he y pr op osed an approach to com-
pute optimal resource assignments for each task in a
given stream topology when the stream topology is a
series-pa rallel de comp osable graph.
Assuming the resource requirements of each task
are given as the input, other studies
[13,14,29]
investi-
gated the problem of allocating resources for tasks from
available resource pools. Chatzis tergiou and Viglas
[14]
presented a fast heuristic algorithm considering both
computational and bandwidth resource requirements
and used throughput as the performance metric. Re-
cent work focuses on enhancing the processing latency
for both static
[13]
and dynamic
[29]
task weights.
Most of these studies formulize the task allocation
problem based on the bin packing problem (BPP),
which is a well-studied combinatorial optimiza tion
problem. We discuss related models and approaches of
BPP in Subsection 2.4. Related work has been focus-
ing on task allocation problem in a failure-free sce nario
that does not take failures effects into account.
2.2 Reliable Stream Processing
Active replication and checkpoint/recovery are two
traditional FT mechanisms that have been w ide ly stu-
died in distributed systems
[30]
. They both have appli-
cations in distributed stream processing systems. Ac-
tive replication maintains at least one active replica in-
stance to enable instant switches from its primary in-
stance to its replication when failur e occurs
[31]
. This
ensures minimum response time but suffers from a
high overhead, at least doubling resource consump-
tions. It is applied in ear lie r stream processing sys-
tems or data engines
[1,26]
that are hosted by a cluster
of a small number of machines. With the application
scales increasing rapidly
[8]
, the active replication model
becomes inefficient or even impractica l to distributed
stream pr ocessing systems (DSPS)
[6,11]
, which is why
most recent researches explore FT approaches based on
checkpoint/recovery
[5,25,27,32]
.
Hwang et al.
[11]
introduced a n upstream backup
model that takes advantage of the close upstream-
downstream dependencies. Upstream tasks keep output
buffers as backups for downstream tasks. If a down-
stream task fails, the backup data is replayed to gene-
rate corre ct results. It is an efficient approach for the
stream processing model but only supports a pplications
that depend o n recent data rather than support those
that depend on the complete history of previous data.
Therefore, recent work improves the upstream ba ckup
with the c ombination of checkpoint/recovery to solve
this problem
[22,25,27,33]
, which becomes the most com-
monly used FT method for SPM.
2.3 Processing/Recovery Latency Modeling
Chain
[11]
is one of the earlies t researches that stu-
died the processing latency model and task allocation
strategies. It presents a s olution for minimizing the
makespan of a stream processing job in a single pro-
cessor. In recent years, the task allocation problem
for DSPS has been widely studied
[10,13,14]
. These stu-
dies use similar proces sing latency models and stream
topology mode ls, which provide the background for our
work. Eidenbenz and Locher
[10]
presented s trong the-
oretical results for a common type of stream topology