Faster 64-bit universal hashing using carry-less multiplications 3
Proof For any integer constant c ∈ [0, 2
L
), consider the
equation h(x) = (h(x
0
) ⊕ c) mod 2
L
0
for x 6= x
0
with h
picked from H. Pick any positive integer L
0
< L. We have
P (h(x) = (h(x
0
) ⊕ c mod 2
L
0
))
=
X
z | z mod 2
L
0
=0
P (h(x) = h(x
0
) ⊕ c ⊕ z)
where the sum is over 2
L−L
0
distinct z values. Because H
is -almost XOR-universal, we have that P(h(x) = h(x
0
) ⊕
c⊕z) ≤ for any c and any z. Thus, we have that P (h(x) =
h(x
0
) ⊕ c mod 2
L
0
) ≤ 2
L−L
0
, showing the result.
It follows from Lemma 1 that if a family is XOR-universal,
then its modular reductions are XOR-universal as well.
As a straightforward extension of this lemma, we could
show that when picking any L
0
bits (not only the least sig-
nificant), the result is 2
L−L
0
× -almost XOR-universal.
2.2 Composition
It can be useful to combine different hash families to create
new ones. For example, it is common to compose hash fam-
ilies. When composing hash functions (h = g ◦ f), the uni-
versality degrades linearly: if g is picked from an
g
-almost
universal family and f is picked (independently) from an
f
-almost universal family, the result is
g
+
f
-almost uni-
versal [36].
We sketch the proof. For x 6= x
0
, we have that g(f(x)) =
g(f(x
0
)) collides if f (x) = f
0
(x). This occurs with proba-
bility at most
f
since f is picked from an
f
-almost uni-
versal family. If not, they collide if g(y) = g(y
0
) where
y = f (x) and y
0
= f (x
0
), with probability bounded by
g
. Thus, we have bounded the collision probability by
f
+
(1 −
f
)
g
≤
f
+
g
, establishing the result.
By extension, we can show that if g is picked from an
g
-almost XOR-universal family, then the composed result
(h = g ◦ f) is going to be
g
+
f
-almost XOR-universal. It
is not required for f to be almost XOR-universal.
2.3 Hashing Tuples
If we have universal hash functions from X to [0, 2
L
), then
we can construct hash functions from X
m
to [0, 2
L
)
m
while
preserving universality. The construction is straightforward:
h
0
(x
1
, x
2
, . . . , x
m
) = (h(x
1
), h(x
2
), . . . , h(x
m
)). If h is
picked from an -almost universal family, then the result is
-almost universal. This is true even though a single h is
picked and reused m times.
Lemma 2 Consider an -almost universal family H from
X to [0, 2
L
). Then consider the family of functions H
0
of
the form h
0
(x
1
, x
2
, . . . , x
m
) = (h(x
1
), h(x
2
), . . . , h(x
m
))
from X
m
to [0, 2
L
)
m
, where h is in H. Family H
0
is -almost
universal.
The proof is not difficult. Consider two distinct values from
X
m
, x
1
, x
2
, . . . , x
m
and x
0
1
, x
0
2
, . . . , x
0
m
. Because the tuples
are distinct, they must differ in at least one component: x
i
6=
x
0
i
. It follows that h
0
(x
1
, x
2
, . . . , x
m
) and h
0
(x
0
1
, x
0
2
, . . . , x
0
m
)
collide with probability at most P (h(x
i
) = h(x
0
i
)) ≤ ,
showing the result.
2.4 Variable-Length Hashing From Fixed-Length Hashing
Suppose that we are given a family H of hash functions that
is XOR universal over fixed-length strings. That is, we have
that P (h(s) = h(s
0
) ⊕ c) ≤ 1/2
L
if the length of s is the
same as the length of s
0
(|s| = |s
0
|). We can create a new
family that is XOR universal over variable-length strings
by introducing a hash family on string lengths. Let G be a
family of XOR universal hash functions g over length val-
ues. Consider the new family of hash functions of the form
h(s) ⊕ g(|s|) where h ∈ H and g ∈ G. Let us consider two
distinct strings s and s
0
. There are two cases to consider.
– If s and s
0
have the same length so that g(|s|) = g(|s
0
|)
then we have XOR universality since
P (h(s) ⊕ g(|s|) = h(s
0
) ⊕ g(|s
0
|) ⊕ c)
= P (h(s) = h(s
0
) ⊕ c)
≤ 1/2
L
where the last inequality follows because h ∈ H, an
XOR universal family over fixed-length strings.
– If the strings have different lengths (|s| 6= |s
0
|), then we
again have XOR universality because
P (h(s) ⊕ g(|s|) = h(s
0
) ⊕ g(|s
0
|) ⊕ c)
= P (g(|s|) = g(|s
0
|) ⊕ (c ⊕ h(s) ⊕ h(s
0
)))
= P (g(|s|) = g(|s
0
|) ⊕ c
0
)
≤ 1/2
L
where we set c
0
= c ⊕ h(s) ⊕ h(s
0
), a value independent
from |s| and |s
0
|. The last inequality follows because g
is taken from a family G that is XOR universal.
Thus the result (h(s)⊕g(|s|)) is XOR universal. We can also
generalize the analysis. Indeed, if H and G are -almost uni-
versal, we could show that the result is -almost universal.
We have the following lemma.
Lemma 3 Let H be an XOR universal family of hash func-
tions over fixed-length strings. Let G be an XOR universal
family of hash functions over integer values. We have that
the family of hash functions of the form s → h(s) ⊕ g(|s|)
where h ∈ H and g ∈ G is XOR universal over all strings.
Moreover, if H and G are merely -almost universal, then
the family of hash functions of the form s → h(s) ⊕ g(|s|)
is also -almost universal.