A Survey on the Evolution of Stream Processing Systems 7
Operator
clock t=4
slack=1
(a) Slack
Input Manager
Heartbeat: t=4
(b) Heartbeat
Operator
clock (low-
watermark): t=4
(c) Low-watermark
Fig. 3: Mechanisms for managing disorder.
tion is itself a stream tuple, which consists of a set of patterns
each identifying an attribute of a stream data tuple. A punc-
tuation is a generic mechanism that communicates informa-
tion across the dataflow graph. Regarding progress tracking,
it provides a channel for communicating progress informa-
tion such as a tuple attribute’s low-watermark produced by
an operator [96], event time skew [121], or slack [9]. Thus,
punctuations can convey which data cease to appear in an
input stream; for instance the data tuples with smaller times-
tamp than a specific value. Punctuations are useful in other
functional areas of a streaming system as well, such as state
management, monitoring, and flow control.
Figure 3 showcases the differences between slack, heart-
beats, and low-watermarks. The figure depicts a simple ag-
gregation operator that counts tuples in 4-second event time
tumbling windows. The operator awaits for some indication
that event time has advanced past the end timestamp of a
window so that it computes and outputs an aggregate per
window. The indication varies according to the progress-
tracking mechanism. The input to this operator are seven tu-
ples containing only a timestamp from t=1 to t=7. The times-
tamp signifies the event time in seconds that the tuple was
produced in the input source. Each tuple contains a differ-
ent timestamp and all tuples are dispatched from a source in
ascending order of timestamp. Due to network latency, the
tuples may arrive to the streaming system out of order.
Figure 3a presents the slack mechanism. In order to ac-
commodate out-of-order tuples the operator admits out-of-
order tuples up to slack=1. Thus, the operator having admit-
ted tuples with t=1 and t=2 not depicted in the figure will
receive tuple with t=4. The timestamp of the tuple coincides
with the max timestamp of the first window for interval [0,
4). Normally, this tuple would cause the operator to close the
window and compute and output the aggregate, but because
of the slack value the operator will wait to receive one more
tuple. The next tuple t=3 belongs to the first window and is
included there. At this point, slack also expires and this event
finally triggers the window computation, which outputs C=3
for t=[1, 2, 3]. On the contrary, the operator will not accept
t=5 at the tail of input because it arrives two tuples after its
natural order and is not covered by the slack value.
Figure 3b depicts the heartbeat mechanism. An input
manager buffers and orders the incoming tuples by times-
tamp. The number of tuples buffered, two in this example
(t=5, t=6), is of no importance. The source periodically
sends a heartbeat to the input manager, i.e. a signal with a
timestamp. Then the input manager dispatches to the opera-
tor all tuples with timestamp less or equal to the timestamp
of the heartbeat in ascending order. For instance, when the
heartbeat with timestamp t=2 arrives in the input manager
(not shown in the figure), the input manager dispatches the
tuples with timestamp t=1 and t=2 to the operator. The input
manager then receives tuples with t=4, t=6, and t=5 in this
order and puts them in the right order. When the heartbeat
with timestamp t=4 arrives, the input manager dispatches
the tuple with timestamp t=4 to the operator. This tuple trig-
gers the computation of the first window for interval [0, 4).
The operator outputs C=2 counting two tuples with t=[1, 2]
not depicted in the figure. The input manager ignores the in-
coming tuple with timestamp t=3 as it is older than the latest
heartbeat with timestamp t=4.
Figure 3c presents the low-watermark mechanism, which
signifies the oldest pending work in the system. Here punc-
tuations carrying the low-watermark timestamp decide when
windows will be closed and computed. After receiving two
tuples with t=1 and t=2, the corresponding low-watermark
for t=2 (which is propagated downstream), and tuple t=3,
the operator receives tuple t=5. Since this tuple carries an
event time timestamp greater or equal to 4, which is the end
timestamp of the first window, it could be the one to cause
the window to fire or close. However, this approach would