A Stream Processing Task Allocation Method with Guaranteed Recovery Latency

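This document discusses a task allocation method for real-time stream processing that focuses on the recovery latency constraint. The authors are a research team from Jilin University, Temple University, and West Chester University of Pennsylvania, including Hong-Liang Li (member of IEEE and CCF) and Jie Wu (IEEE Fellow). The paper was published in 2018 in the Journal of Computer Science and Technology, volume 33, issue 6, pages 1125-1139.

In today's data-driven world, real-time stream processing applications are essential for processing large-scale online data efficiently. These applications must maintain quality of service while recovering quickly from failures or data loss, which calls for strict control of recovery latency. The paper's core contribution is a task allocation strategy that optimizes task scheduling to reduce latency during recovery, improving overall system performance and stability. The method centers on the following points:

1. **Problem definition**: The authors first define how, in a real-time stream processing environment, task allocation and processing load can be balanced while still meeting the recovery time target. This involves dividing work sensibly among many concurrent tasks so that each task can respond quickly and recover from failures.

2. **Algorithm design**: The paper introduces a new task allocation algorithm. It may use dynamic programming, a greedy strategy, or an adaptive adjustment mechanism to adjust the allocation according to the characteristics of the live data stream. The algorithm may take into account task complexity, computational resource requirements, network latency, and similar factors in order to minimize recovery time.

3. **Performance evaluation**: Through theoretical analysis and experiments, the authors show the advantages of the new method over traditional strategies, such as lower average recovery time, higher system throughput, and better overall resource utilization. This may be measured by comparing recovery latency, task completion time, and system availability across different scenarios.

4. **Applicable scenarios**: The paper's application scenarios may include the Internet of Things, financial transaction monitoring, social media analytics, and other domains that need real-time processing and fast recovery. For these scenarios, a task allocation strategy with low recovery latency is essential.

5. **Research challenges and future directions**: Beyond the results achieved, the paper may also discuss open technical challenges, such as uncertainty in dynamic environments, scalability, and complexity, as well as possible future directions, such as integrating more recovery strategies or optimizing task allocation in distributed environments.

This paper offers a promising solution for task allocation in real-time stream processing systems. By guaranteeing recovery latency, it helps improve system availability and user experience, which is an important technical advance for the growing number of data-intensive applications.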

Hong-Liang Li et al.: Stream Processing Task Allocation with Recovery Latency Constraint 1127

• We conduct extensive simulations to verify the correctness and effectiveness of our approach with different applications and setups.

This paper further explores the RTAP problem based on our earlier conference version [20]. We propose an efficient approach to solve the problem and provide extensive experimental results and analysis of different approaches. The remainder of the paper is organized as follows. In Section 2, we summarize related work. Section 3 presents the problem model and analysis. We propose our approach in Section 4, and Section 5 discusses the experimental results. Finally, Section 6 concludes the paper.

2 Related Work

2.1 Task Allocation for Stream Topology

A stream topology is usually modeled as a directed acyclic graph (DAG) G(V, A) of tasks (V) and directed connections (A). The task allocation problem is one of the fundamental issues of stream processing systems: resources are allocated to each task according to its resource requirement, avoiding both performance bottlenecks (under-provisioning) and wasted resources (over-provisioning). Earlier work focuses on modeling task resource requirements and the relationship between assigned resources and processing performance (throughput and latency) [2,12,28]. The resource requirement of each task, hereafter referred to as the weight of a task, represents the share of resources (computational, memory, and/or bandwidth capacity) required to ensure the processing performance according to its input speed. Eidenbenz and Locher [9] gave a theoretical analysis of this problem and proved its NP-hardness. They proposed an approach to compute optimal resource assignments for each task in a given stream topology when the stream topology is a series-parallel decomposable graph.
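The topology model described above can be illustrated with a minimal sketch. The task names, weights, and edges below are hypothetical and do not come from the paper; the sketch only shows one way to hold G(V, A) with per-task weights and to check that the graph is acyclic using Kahn's algorithm.

```python
from collections import deque

# Hypothetical example topology: names, weights, and edges are illustrative only.
weights = {"source": 0.2, "parse": 0.5, "join": 0.8, "sink": 0.1}  # V, with task weights
edges = [("source", "parse"), ("source", "join"),
         ("parse", "join"), ("join", "sink")]                      # A


def topological_order(weights, edges):
    """Return tasks in topological order (Kahn's algorithm).

    Raises ValueError if the graph contains a cycle and is therefore not a DAG.
    """
    successors = {task: [] for task in weights}
    in_degree = {task: 0 for task in weights}
    for upstream, downstream in edges:
        successors[upstream].append(downstream)
        in_degree[downstream] += 1

    # Tasks with no incoming connection can be scheduled first.
    ready = deque(task for task in weights if in_degree[task] == 0)
    order = []
    while ready:
        task = ready.popleft()
        order.append(task)
        for nxt in successors[task]:
            in_degree[nxt] -= 1
            if in_degree[nxt] == 0:
                ready.append(nxt)

    if len(order) != len(weights):
        raise ValueError("topology contains a cycle, so it is not a DAG")
    return order


order = topological_order(weights, edges)
total_weight = sum(weights.values())  # total resource share requested by all tasks
```

The total weight gives a quick sanity check on how much capacity the whole topology requests before any allocation is attempted.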

Assuming the resource requirements of each task are given as the input, other studies [13,14,29] investigated the problem of allocating resources for tasks from available resource pools. Chatzistergiou and Viglas [14] presented a fast heuristic algorithm considering both computational and bandwidth resource requirements and used throughput as the performance metric. Recent work focuses on improving the processing latency for both static [13] and dynamic [29] task weights.

Most of these studies formulate the task allocation problem based on the bin packing problem (BPP), which is a well-studied combinatorial optimization problem. We discuss related models and approaches of BPP in Subsection 2.4. Related work has focused on the task allocation problem in a failure-free scenario and does not take the effects of failures into account.
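To make the bin packing view concrete, the following is a minimal sketch of the classic first-fit decreasing heuristic applied to task weights. It is a textbook BPP heuristic, not the algorithm proposed in this paper or in the cited studies, and it assumes a single resource dimension and identical node capacities. The task names and integer weights are hypothetical.

```python
def first_fit_decreasing(task_weights, capacity):
    """Assign tasks to nodes with the first-fit decreasing heuristic.

    task_weights: dict mapping task name -> weight (resource share).
    capacity: capacity of every node (assumed identical for all nodes).
    Returns a list of nodes, each a list of task names.
    """
    nodes = []  # tasks placed on each node
    free = []   # remaining capacity of each node
    # Place heavier tasks first; ties are broken by name for determinism.
    for task, weight in sorted(task_weights.items(), key=lambda kv: (-kv[1], kv[0])):
        if weight > capacity:
            raise ValueError(f"task {task!r} does not fit on any node")
        for i, remaining in enumerate(free):
            if weight <= remaining:
                nodes[i].append(task)
                free[i] -= weight
                break
        else:
            # No existing node has room, so open a new one.
            nodes.append([task])
            free.append(capacity - weight)
    return nodes


# Hypothetical weights, expressed as a percentage of one node's capacity.
allocation = first_fit_decreasing(
    {"t1": 50, "t2": 70, "t3": 30, "t4": 20, "t5": 40}, capacity=100
)
```

Because the weights sum to 210, at least three nodes of capacity 100 are needed, so the heuristic happens to match the lower bound on this input. A failure-aware formulation would add constraints that this sketch deliberately leaves out.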

2.2 Reliable Stream Processing

Active replication and checkpoint/recovery are two traditional FT mechanisms that have been widely studied in distributed systems [30]. They both have applications in distributed stream processing systems. Active replication maintains at least one active replica instance to enable instant switches from the primary instance to its replica when a failure occurs [31]. This ensures minimum response time but suffers from a high overhead, at least doubling resource consumption. It is applied in earlier stream processing systems or data engines [1,26] that are hosted by a cluster of a small number of machines. With application scales increasing rapidly [8], the active replication model becomes inefficient or even impractical for distributed stream processing systems (DSPS) [6,11], which is why most recent research explores FT approaches based on checkpoint/recovery [5,25,27,32].

Hwang et al. [11] introduced an upstream backup model that takes advantage of the close upstream-downstream dependencies. Upstream tasks keep output buffers as backups for downstream tasks. If a downstream task fails, the backup data is replayed to generate correct results. It is an efficient approach for the stream processing model but only supports applications that depend on recent data, not those that depend on the complete history of previous data. Therefore, recent work improves the upstream backup by combining it with checkpoint/recovery to solve this problem [22,25,27,33], which has become the most commonly used FT method for SPM.
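The combination of upstream backup and checkpoint/recovery can be sketched in a few lines. This is a simplified illustration under stated assumptions, not the mechanism of any cited system: the class names, the sequence-number scheme, and the running-sum state are all hypothetical, and the tasks run in a single process.

```python
class UpstreamTask:
    """Keeps emitted tuples in an output buffer until a downstream checkpoint covers them."""

    def __init__(self):
        self.buffer = []   # list of (sequence number, value)
        self.next_seq = 0

    def emit(self, value):
        record = (self.next_seq, value)
        self.buffer.append(record)
        self.next_seq += 1
        return record

    def trim(self, checkpointed_seq):
        # Tuples already reflected in a checkpoint never need to be replayed.
        self.buffer = [(s, v) for s, v in self.buffer if s > checkpointed_seq]


class DownstreamTask:
    """Stateful task whose state (a running sum) depends on the complete input history."""

    def __init__(self):
        self.total = 0
        self.last_seq = -1

    def process(self, record):
        seq, value = record
        if seq <= self.last_seq:  # skip tuples already applied to the state
            return
        self.total += value
        self.last_seq = seq

    def checkpoint(self):
        return {"total": self.total, "last_seq": self.last_seq}

    @classmethod
    def recover(cls, snapshot, upstream):
        task = cls()
        task.total = snapshot["total"]
        task.last_seq = snapshot["last_seq"]
        for record in upstream.buffer:  # replay tuples emitted after the checkpoint
            task.process(record)
        return task


upstream, downstream = UpstreamTask(), DownstreamTask()
for value in [1, 2, 3]:
    downstream.process(upstream.emit(value))
snapshot = downstream.checkpoint()        # state covers sequence numbers 0..2
upstream.trim(snapshot["last_seq"])
for value in [4, 5]:
    downstream.process(upstream.emit(value))
total_before_failure = downstream.total   # the original instance is assumed to fail here
recovered = DownstreamTask.recover(snapshot, upstream)
```

In this sketch the checkpoint preserves the history-dependent state, while the upstream buffer only has to hold tuples emitted since the last checkpoint. The replay loop in `recover` is the part of recovery that grows with the amount of buffered data.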

2.3 Processing/Recovery Latency Modeling

Chain [11] is one of the earliest studies of the processing latency model and task allocation strategies. It presents a solution for minimizing the makespan of a stream processing job on a single processor. In recent years, the task allocation problem for DSPS has been widely studied [10,13,14]. These studies use similar processing latency models and stream topology models, which provide the background for our work. Eidenbenz and Locher [10] presented strong theoretical results for a common type of stream topology
