Establishing Claim 1 for
¯
T . One can inspect that the algorithm walk behaves the same as sim until a collision
occurs at Line
8 (that is, there is a collision in {a
x
1
, a
x
2
, . . . , a
x
j+1
}). That is, sim(s, ℓ) and walk(s, ℓ, 0)
behave the same until reaching a collision a
w
j
= a
w
k
for j 6= k. This implies that (
9) holds.
To show Claim
1 holds for
¯
w and
¯
T , we still have to argue that g
i
(ExtFind(next(µ), i − 1)) is inde-
pendent from the event F
~
k
∧ r
<i
∧ g
<i
. Formally proving this requires a delicate induction, but the intu-
ition is that F
~
k
depends on at most k
i
values in g
i
and r
i
, and the procedure walk carefully ensures that
g
i
(ExtFind(next(µ), i − 1)) is never one of them. Hence, since k
i
≤ τ /4 and g
i
is τ-wise independent, we
have the desired independence.
Handling E
bad
and the two-vertex case. We have just established Condition (
12) which gives a lower
bound for E
total
; now we briefly discuss how to obtain an upper bound on E
bad
sufficient for proving the
desired lower bound on Pr[u ∈ f
∗
a,h
(s)] using (14). One can first observe that (13) cannot hold for all
possible
~
k,
~
k
1
,
~
k
2
, as there could be a collision between these three paths. In fact, let K be the total number
of nodes in the union of the paths corresponding to
~
k,
~
k
1
,
~
k
2
. Then a revised estimate for Pr[F
~
k
∧next(µ
~
k
) =
u ∧F
~
k
1
∧F
~
k
2
∧a
next(µ
~
k
1
)
= a
next(µ
~
k
2
)
] should be
2
K
· n
2
−1
. B y a careful calculation, one can show that
this revised estimate is still enough to show E
bad
is upper bounded by O(2
3ℓ
/n
2
), which is good enough for
our purposes.
However, even establishing this revised estimate is quite challenging. Recall that F
~
k
∧ F
~
k
1
∧ F
~
k
2
is
equivalent to the condition that, for ever y level-i node β on the paths from root to µ
~
k
, µ
~
k
1
or µ
~
k
2
, it holds that
g
i
(a
w
β
) = 1. This amounts to K events and we hope to show they are all independent. However, this is not
true in general, as there can be a collision of a
w
β
between two different paths among these three paths. We
overcome this issue by showing that for each “bad node” µ
~
k
, there must exist a “bad” collision pair
~
k
1
and
~
k
2
on the extended walk without this issue. In such case one can establish a revised estimate; subtracting all
these revised estimates from E
good
would still yield a good lower bound on Pr[u ∈ f
∗
a,h
(s)].
Our proof for lower-bounding Pr[u, v ∈ f
∗
a,h
(s)] follows the same template above, while using a more
involved analysis to handle the dependency issues across the paths (we have to consider four paths now: two
corresponding to u and v, and the other two corresponding to the “bad” collision pair).
3 Preliminaries
Let [n] denote {1, 2, . . . , n}. We use
N
to denote the set of non-negative integers. We use
e
O(f) to denote
O(f ·poly log f ) in the usual way;
e
Ω,
e
Θ are defined similarly.
We measure the space complexity of an algorithm by the maximum number of bits in its working memory:
the read-only input is not counted. We measure the time complexity by the number of word operations (with
word length Θ(log n)) in the word RAM model.
For Element Distinctness and List Disjointness, we always assume the input arrays of length n consist
of positive integers bounded from above by m = n
c
+ c, where c is a fixed constant independent of n. (We
often abbrievate this by saying m = poly(n).) For an array a ∈ [m]
n
, define the second frequency moment
F
2
(a) =
P
n
i=1
P
n
j=1
1[a
i
= a
j
] as the number of colliding pairs (i, j) (including the case where i = j).
Note that n ≤ F
2
(a) ≤ n
2
.
We will use the following standard pseudorandomness construction.
Theorem 3.1 (Explicit k-wise independent hash family, [
CW79]; see also [Vad12, Corollary 3.34]). For
n, m, k, there is a family of k-wise independent functions H ⊆ {h | h: {0, 1}
n
→ {0, 1}
m
} such that every
15