4 J. Comput. Sci. & Technol., Jan. 2017, Vol.32, No.1
{98, 97}, 98 −98 6 2, and 97 −96 6 2, we call these re-
sults as acceptable approximate results. When the win-
dow slides to W
3
, i.e., the objects are updated to {88,
93, 85, 77, 60, 82, 73, 70, 48, 60, 71, 66}, our algorithm
outputs {93, 88}. They are also acceptable approximate
results. When the window slides to W
4
, the objects are
updated to {77, 60, 82, 73, 70, 48, 60, 71, 66, 65.5, 54,
70}, and our algorithm outputs {73, 71}. Since the ex-
act results are {82, 77}, in which 82−73 > 2, we regard
73 as an unacceptable approximate result.
Assuming the window slides from W
0
to W
n−1
, our
proposed TAHM algorithm outputs 2 × n results dur-
ing this process. For each approximate result o
ij
, given
Pr(|F (r
ij
) − F (o
ij
)| 6 2) > 0.99, the expected num-
ber of unacceptable approximate results among these
results is (1 − δ) × n × k = 0.02n. Here, F (r
ij
) is the
j-th highest score in the approximate result set of W
i
,
and F (o
ij
) is the j-th highest score in W
i
.
3 Framework Overview
As has been discussed, the main challenges of an-
swering approximate top-k query are: 1) for each newly
arrived object o, the algorithm should efficiently prune
it if o is impossible to become a query result; 2) if o is
not pruned, it should be inserted into candidate set with
low computation cost. In this paper, we tackle the chal-
lenges by designing an efficient framework, named PABF
(Probabilistic Approximation Based Framework) for
supporting approximate continuous top-k query over
sliding window.
As shown in Algorithm 1, PABF mainly consists of
the following four modules: Filter, Local-Merge, Global-
Merge, and TA-Heap. Filter here is used for pruning newly
arrived objects (lines 3∼7). To be more specific, for
each object o ∈ s
n
, we determine whether it is a query
result (or may become a result for a future query win-
dow with a probability of at least 1 − δ). If so, o is
selected as a candidate, and inserted into a temporary
buffer M; otherwise, o is ignored. Compared with other
one-pass algorithms, one advantage of our algorithm is
that it can directly prune O(s − k) objects in s
n
with
the help of a suitable pruning value.
After scanning s
n
, we merge candidates in M with
candidate set B. To speed up candidate maintenance,
we partition B into a group of buckets. Formally,
given the candidate set B, a partition P(B, m) =
{b
1
, b
2
, . . . , b
m
} is to partition the elements in B into
m buckets {b
1
, b
2
, . . . , b
m
} such that 1) 0 < |b
i
| <
(φ)
m−i+1
k; 2) ∀o ∈ b
i
, o
′
∈ b
i−1
, in which T (o) should
be larger than T (o
′
); 3) objects in each bucket are
sorted. T (o) here refers to the arrival order of o, and φ
is a coefficient whose optimal value will be studied in
Section 5.
Algorithm 1:1. PABF Framework
Input: query window W , current result set R and
candidate set B
Output: updated candidate set B, and updated
result set R
1 Window maintenance: W ← W − s
0
, W ← W ∪ s
n
;
2 for i from 0 to s − 1 do
3 Bo ol bP runed ← Filter (s[i]);
4 if bP runed = false then
5 M ← M ∪ s[i];
6 else
7 Ignore s[i];
8 for i from 0 to |M| − 1 do
9 Lo cal-Merge (s[i], b
m
);
10 if |b
m
| > φk then
11 Int i ← m − 1, Bool bM erge ← true;
12 while bMerge do
13 Golbal-Merge (b
i
, b
i+1
);
14 if |b
i
| > φ
m−i+1
k then
15 bMerge ← false;
16 R ← TA-Heap (B, M);
17 Return;
Based on the partition result, the merge operation
can be divided into two phases: Local-Merge and Global-
Merge. In the Local-Merge phase, we sort candidates ac-
cording to their scores via merge sort. We then try to
combine candidates in M with the ones in b
m
together,
if their scores are roughly the same. Besides, we also
maintain the dominate number for each element in b
m
,
and delete meaningless ones. Through the above ope-
rations, the size of b
m
can be effectively reduced. Since
the overhead of merge cost mainly depends on the size
of b
m
, a smaller size of b
m
helps us further reduce the
cost of merge. After Local-Merge, if |b
m
| > φk, our algo-
rithm enters the Global-Merge phase that further merges
b
m
with b
m−1
(lines 10∼15). The following operations
are repeated until the condition |b
i
| < φ
m−i+1
k is sat-
isfied. Note that in this phase, we could further reduce
the cost of candidates maintenance by computing the
optimal partition.
Now we demonstrate how PABF supports the query.
We maintain k
′
(k < k
′
< 2k) objects with the high-
est scores in set R. When a query result leaves the
window, we retrieve the new results from R if k < k
′
.
Otherwise, we directly retrieve new answers from B,