4 D.J. Abadi et al.: Aurora: a new model and architecture for data stream management
The middle path in Fig. 2 represents a view. In this case,
a path is defined with no connected application. It is allowed
to have a QoS specification as an indication of the importance
of the view. Applications can connect to the end of this path
whenever there is a need. Before this happens, the system
can propagate some, all, or none of the values stored at the
connection point in order to reduce latency for applications
that connect later. Moreover, it can store these partial results at
any point along a view path. This is analogous to a materialized
or partially materialized view. View materialization is under
the control of the scheduler.
The bottom path represents an ad hoc query. An ad hoc
query can be attached to a connection point at any time. The
semantics of an ad hoc query is that the system will process
data items and deliver answers from the earliest time T (per-
sistence specification) stored in the connection point until the
query branch is explicitly disconnected. Thus, the semantics
for an Aurora ad hoc query is the same as a continuous query
that starts executing at t_now − T and continues until explicit
termination.
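The replay semantics of an ad hoc query can be sketched as follows. This is a minimal illustration, not Aurora's implementation; the class and method names are assumptions made for the sketch:

```python
import time

class ConnectionPoint:
    """Hypothetical sketch of a connection point that retains the last T
    seconds of tuples (the persistence specification) so that an ad hoc
    query attached later can replay history from t_now - T."""

    def __init__(self, persistence_seconds):
        self.persistence = persistence_seconds
        self.buffer = []  # list of (arrival_timestamp, tuple)

    def push(self, tup):
        now = time.time()
        self.buffer.append((now, tup))
        # discard tuples older than the persistence specification
        cutoff = now - self.persistence
        self.buffer = [(ts, t) for ts, t in self.buffer if ts >= cutoff]

    def attach_ad_hoc(self):
        """Return stored tuples from t_now - T onward; a real system would
        then keep feeding the query live tuples until it disconnects."""
        cutoff = time.time() - self.persistence
        return [t for ts, t in self.buffer if ts >= cutoff]
```

Under this sketch, attaching an ad hoc query is indistinguishable from having started a continuous query T seconds in the past, which is exactly the stated semantics.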
2.2 Graphical user interface
The Aurora user interface cannot be covered in detail because
of space limitations. Here, we mention only a few salient fea-
tures. To facilitate designing large networks, Aurora will sup-
port a hierarchical collection of groups of boxes. A designer
can begin near the top of the hierarchy where only a few super-
boxes are visible on the screen. A zoom capability is provided
to allow him to move into specific portions of the network,
by replacing a group with its constituent boxes and groups.
In this way, a browsing capability is provided for the Aurora
diagram.
Boxes and groups have a tag, an argument list, a description
of the functionality, and, ultimately, a manual page. Users can
teleport to specific places in an Aurora network by querying
these attributes. Additionally, a user can place bookmarks in a
network to allow him to return to places of interest.
These capabilities give an Aurora user a mechanism to
query the Aurora diagram. The user interface also allows monitors
for arcs in the network to facilitate debugging as well as
facilities for “single stepping” through a sequence of Aurora
boxes. We plan a graphical performance monitor as well as
more sophisticated query capabilities.
3 Aurora optimization
In traditional relational query optimization, one of the primary
objectives is to minimize the number of iterations over large
data sets. Stream-oriented operators that constitute the Aurora
network, on the other hand, are designed to operate in a data
flow mode in which data elements are processed as they appear
on the input. Although the amount of computation required by
an operator to process a new element is usually quite small,
we expect to have a large number of boxes. Furthermore, high
data rates add another dimension to the problem. Lastly, we
expect many changes to be made to an Aurora network over
time, and it seems unreasonable to take the network offline
to perform a compile-time optimization. We now present our
strategies to optimize an Aurora network.
3.1 Dynamic continuous query optimization
We begin execution of an unoptimized Aurora network, i.e., the
one that the user constructed. During execution we gather run-
time statistics such as the average cost of box execution and
box selectivity. Our goal is to perform run-time optimization
of a network, without having to quiesce it. Hence combining
all the boxes into a massive query and then applying conven-
tional query optimization is not a workable approach. Besides
being NP-complete [25], it would require quiescing the whole
network. Instead, the optimizer will select a portion of the net-
work for optimization. Then it will find all connection points
that surround the subnetwork to be optimized. It will hold all
input messages at upstream connection points and drain the
subnetwork of messages through all downstream connection
points. The optimizer will then apply the following local tac-
tics to the identified subnetwork.
• Inserting projections. It is unlikely that the application ad-
ministrator will have inserted map operators (see Sect. 5)
to project out all unneeded attributes. Examination of an
Aurora network allows us to insert or move such map oper-
ations to the earliest possible points in the network, thereby
shrinking the size of the tuples that must be subsequently
processed. Note that this kind of optimization requires that
the system be provided with operator signatures that de-
scribe the attributes that are used and produced by the
operators.
• Combining boxes. As a next step, Aurora diagrams will be
processed to combine boxes where possible. A pairwise
examination of the operators suggests that, in general, map
and filter can be combined with almost all of the operators,
whereas windowed or binary operators cannot.
It is desirable to combine two boxes into a single box when
this leads to some cost reduction. As an example, a map
operator that only projects out attributes can be combined
easily with any adjacent operator, thereby saving the box-
execution overhead for a very cheap operator. In addition,
two filtering operations can be combined into a single,
more complex filter that can be more efficiently executed
than the two boxes it replaces. Not only is the overhead of a
second box activation avoided, but also standard relational
optimization on one-table predicates can be applied in the
larger box. In general, combining boxes at least saves the
box-execution overhead and reduces the total number of
boxes, leading to a simpler diagram.
• Reordering boxes. Reordering the operations in a conven-
tional relational DBMS to an equivalent but more efficient
form is a common technique in query optimization. For
example, filter operations can sometimes be pushed down
the query tree through joins. In Aurora, we can apply the
same technique when two operations commute.
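The hold-and-drain protocol that precedes these tactics can be sketched roughly as follows. All names here are illustrative assumptions for the sketch, not Aurora's actual interfaces:

```python
class InputPoint:
    """Hypothetical upstream connection point that can hold messages
    while the subnetwork behind it is being optimized."""

    def __init__(self):
        self.holding = False
        self.held = []

    def deliver(self, msg, subnet):
        if self.holding:
            self.held.append(msg)  # queue while optimization runs
        else:
            subnet.process(msg)

def optimize_subnetwork(subnet, upstream_points, apply_tactics):
    # 1. hold new input at every upstream connection point
    for cp in upstream_points:
        cp.holding = True
    # 2. drain in-flight messages out through the downstream points
    subnet.drain()
    # 3. apply the local tactics (projection insertion, box combining,
    #    reordering) to the now-quiescent subnetwork
    apply_tactics(subnet)
    # 4. release held messages into the optimized subnetwork
    for cp in upstream_points:
        cp.holding = False
        pending, cp.held = cp.held, []
        for msg in pending:
            subnet.process(msg)
```

The point of the drain step is that the tactics below can then rewrite the subnetwork without losing or reordering in-flight tuples.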
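As a toy illustration of the projection tactic (the operator encodings below are assumptions made for the sketch, not Aurora's operator set), pushing an attribute-dropping map ahead of a filter is safe whenever the filter's predicate reads only the kept attributes:

```python
def filter_box(pred, stream):
    # filter: pass the tuples satisfying the predicate
    return [t for t in stream if pred(t)]

def map_box(keep, stream):
    # map used purely as a projection: keep only the listed attributes
    return [{k: t[k] for k in keep} for t in stream]

tuples = [{"id": 1, "temp": 40.0, "payload": "x" * 1000},
          {"id": 2, "temp": 10.0, "payload": "y" * 1000}]

keep = ("id", "temp")
pred = lambda t: t["temp"] > 30.0  # reads only kept attributes

late_projection = map_box(keep, filter_box(pred, tuples))
early_projection = filter_box(pred, map_box(keep, tuples))

# same answer, but the early projection shrinks every tuple before
# the filter (and every box downstream of it) touches it
assert late_projection == early_projection
```

This is why the operator signatures mentioned above matter: the optimizer can only move the projection earlier if it knows which attributes each downstream operator uses.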
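The filter-combining case can likewise be sketched as a hypothetical mini-implementation (again not Aurora's code):

```python
def make_filter(pred):
    # each call to the returned box models one box activation
    def box(stream):
        return [t for t in stream if pred(t)]
    return box

def combine_filters(p1, p2):
    # a single box evaluating the conjunction; a real optimizer could
    # also reorder p1 and p2 by cost and selectivity inside the box
    return make_filter(lambda t: p1(t) and p2(t))

readings = [1, 5, 12, 30]
p1 = lambda t: t > 3
p2 = lambda t: t < 20

two_boxes = make_filter(p2)(make_filter(p1)(readings))
one_box = combine_filters(p1, p2)(readings)
assert two_boxes == one_box
```

The combined box produces the same output while paying the per-tuple activation overhead once instead of twice.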
To decide when to interchange two commutative operators,
we make use of the following performance model. Each Au-
rora box, b, has a cost, c(b), defined as the expected execution
time for b to process one input tuple. Additionally, each box
has a selectivity, s(b), which is the expected number of output
tuples per input tuple. Consider two boxes, b_i and b_j, with b_j
following b_i. In this case, for each input tuple for b_i we can
compute the amount of processing as c(b_i) + c(b_j) × s(b_i).
Reversing the operators gives a like calculation. Hence we