Algorithm 2-phase track join: process
T
− R to S
T
R|S
← {}
while any process
R
or any process
S
n
R|S
sends do
for all <key
R|S
k> from n
R|S
do
T
R|S
← T
R|S
+ <k, n
R|S
>
end for
end while
barrier
for all distinct key
R|S
k in T
R|S
do
for all <k, process
R
n
R
> in T
R|S
do
for all <k, process
S
n
S
> in T
R|S
do
send <k, n
S
> to n
R
end for
end for
end for
In the second phase, we only transfer tuples from one ta-
ble. Assuming we transfer R tuples, node T (process
T
) sends
messages to each location with matching R tuples, including
the key and the set of S tuples’ locations. Finally, R tuples
are selectively broadcast to the tracked S locations, instead
of all nodes in the network, and are joined locally.
Choosing whether to send R tuples to S tuple locations,
or S tuples to R locations, has to be decided by the query
optimizer before the query starts executing, similar to the
traditional inner–outer relation distinction of hash join.
2-phase track join transfers payloads from one table only.
If the input tables have mostly unique keys and the selectiv-
ity is high, the cost comprises of tracking plus min(|R|, |S|).
Algorithm 3-phase track join: process
T
barrier
T
R|S
← {}
while any process
R
or any process
S
n
R|S
sends do
for all <key
R|S
k, count c> from n
R|S
do
T
R|S
← T
R|S
+ <k, n
R|S
, c>
end for
end while
barrier
for all distinct key
R|S
k in T
R|S
do
R, S ← {}, {}
for all k, process
R
n
R
, count c> in T
R|S
do
R ← R + <n
R
, c · width
R
>
end for
for all k, process
S
n
S
, count c> in T
R|S
do
S ← S + <n
S
, c · width
S
>
end for
RS
cost
← broadcast R to S
SR
cost
← broadcast S to R
if RS
cost
< SR
cost
then
for all <k, process
R
n
R
> in T
R|S
do
for all <k, process
S
n
S
> in T
R|S
do
send <k, n
S
> to n
R
end for
end for
else
for all <k, process
S
n
S
> in T
R|S
do
for all <k, process
R
n
R
> in T
R|S
do
send <k, n
R
> to n
S
end for
end for
end if
end for
Algorithm 3-phase track join: process
S
... symmetric with process
R
of 3-phase track join ...
Algorithm 3-phase track join: process
R
T
R
← {}
for all <key
R|S
k, payload
R
p
R
> in table R do
T
R
← T
R
+ <k, p
R
>
end for
barrier
for all distinct key
R|S
k in T
R
do
c ← |k in T
R
|
send <k, c> to process
T
(hash(k) mod N)
end for
barrier
while any process
S
or any process
T
sends do
if source is process
T
n
T
then
for all <key
R|S
k, process
S
n
S
> from n
T
do
for all <k, payload
R
p
R
> in T
R
do
send <k, p
R
> to n
S
end for
end for
else if source is process
S
n
S
then
for all <key
R|S
k, payload
S
p
S
> from n
S
do
for all <k, payload
R
p
R
> in T
R
do
commit <k, p
R
, p
S
>
end for
end for
end if
end while
2.2 3-Phase Track Join
In the 3-phase (or double broadcast) track join, we can
decide whether to broadcast R tuples to the locations of S
tuples, or vice versa. The decision is taken per distinct join
key. In order to decide which selective broadcast direction
is cheaper, we need to know how many tuples will be trans-
ferred in each case. Thus, instead of only tracking nodes
with at least one matching tuple, we also track the number
of matches. To generalize for variable lengths, we transfer
the sum of matching tuple widths, rather than a count.
Bi-directional selective broadcast can distinguish cases in
which moving S tuples would transfer fewer bytes than mov-
ing R tuples. The decision whether to selectively broadcast
R → S or vice versa is taken for each distinct key indepen-
dently and we do not rely on the optimizer to pick the least
expensive direction overall for the entire join.
The cost estimation for one selective broadcast direction
is shown below. In practice, we compute both directions and
pick the cheapest. The complexity is O(n), where n is the
number of nodes with at least one matching tuple for the
given join key. The total number of steps required is less
than the number of tuples. Thus, the theoretical complex-
ity is linear. The messages that carry location information,
logically seen as key and node pairs, have size equal to M.
Algorithm track join: broadcast R to S
R
all
←
P
i
|R
i
|
R
local
←
P
i
|R
i
|, where |S
i
| > 0
R
nodes
← |i, where |R
i
| > 0 ∧ i 6= sel f |
S
nodes
← |i, where |S
i
| > 0|
RS
cost
← R
all
· S
nodes
− R
local
+ R
nodes
· S
nodes
· M
return RS
cost