Accelerating 64-bit universal hashing with CLMUL: CLHASH performance compared with VHASH and CityHash


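This document examines how the Carry-less Multiplication (CLMUL) instruction set, supported by Intel and AMD on x64 processors, can be used to implement a faster almost universal 64-bit hash family, CLHASH. A carry-less multiplication multiplies two binary polynomials over GF(2): partial products are combined with XOR, so no carries propagate. The authors, Daniel Lemire and Owen Kaser, first introduce the background and then compare CLHASH with VHASH, which might be the fastest almost universal family on x64 processors. They find that CLHASH is at least 60% faster. They also compare CLHASH with Google's CityHash, a popular non-cryptographic hash function designed for speed: CLHASH is 40% faster than CityHash on inputs larger than 64 bytes and just as fast on shorter inputs, which favors CLHASH in high-throughput settings with large inputs. The key concepts are almost universal 64-bit hashing (a function picked at random from the family maps two distinct inputs to the same value only with a small, bounded probability), carry-less multiplication, and finite-field arithmetic. The paper gives both a construction of CLHASH and measurements against existing functions, which makes it a useful reference for developers and researchers who depend on fast hashing.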

Faster 64-bit universal hashing using carry-less multiplications

Proof. For any integer constant $c \in [0, 2^{L})$, consider the equation $h(x) = (h(x') \oplus c) \bmod 2^{L'}$ for $x \neq x'$ with $h$ picked from $H$. Pick any positive integer $L' < L$. We have

$$P\big(h(x) = (h(x') \oplus c) \bmod 2^{L'}\big) = \sum_{z \,\mid\, z \bmod 2^{L'} = 0} P\big(h(x) = h(x') \oplus c \oplus z\big)$$

where the sum is over $2^{L-L'}$ distinct $z$ values. Because $H$ is $\varepsilon$-almost XOR-universal, we have that $P(h(x) = h(x') \oplus c \oplus z) \leq \varepsilon$ for any $c$ and any $z$. Thus, we have that $P\big(h(x) = (h(x') \oplus c) \bmod 2^{L'}\big) \leq 2^{L-L'} \varepsilon$, showing the result.

It follows from Lemma 1 that if a family is XOR-universal, then its modular reductions are XOR-universal as well.

As a straightforward extension of this lemma, we could show that when picking any $L'$ bits (not only the least significant), the result is $2^{L-L'} \times \varepsilon$-almost XOR-universal.
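The bound can be checked exhaustively on a toy family. The sketch below assumes the family $h_a(x) = a \cdot x$ in GF($2^4$), which is XOR-universal with $\varepsilon = 1/16$; this family, the helper names `gf_mul` and `max_xor_probability`, and the chosen bit masks are illustrative assumptions and are not taken from the paper.

```python
from fractions import Fraction
from itertools import product

L = 4            # output bits of the toy family (assumption)
POLY = 0b10011   # x^4 + x + 1, irreducible over GF(2)


def gf_mul(a, b):
    """Multiply a and b in GF(2^4): carry-less product, then reduction."""
    prod = 0
    for i in range(L):
        if (b >> i) & 1:
            prod ^= a << i              # XOR instead of add: no carries
    for i in range(2 * L - 2, L - 1, -1):
        if (prod >> i) & 1:
            prod ^= POLY << (i - L)     # cancel bit i using the modulus
    return prod


def max_xor_probability(mask):
    """Largest P(h(x) & mask == (h(x') ^ c) & mask) over x != x' and c."""
    worst = Fraction(0)
    for x, xp in product(range(1 << L), repeat=2):
        if x == xp:
            continue
        for c in range(1 << L):
            hits = sum(
                1 for a in range(1 << L)
                if (gf_mul(a, x) & mask) == ((gf_mul(a, xp) ^ c) & mask)
            )
            worst = max(worst, Fraction(hits, 1 << L))
    return worst


eps = max_xor_probability(0b1111)        # full 4-bit family
low_bits = max_xor_probability(0b0011)   # h mod 2^2, so L' = 2
any_bits = max_xor_probability(0b1010)   # two arbitrary bits, L' = 2
print(eps, low_bits, any_bits)
```

In this toy setting both reductions should stay within $2^{L-L'} \varepsilon = 4 \times 1/16$, which is what the lemma and its extension predict.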

2.2 Composition

It can be useful to combine different hash families to create new ones. For example, it is common to compose hash families. When composing hash functions ($h = g \circ f$), the universality degrades linearly: if $g$ is picked from an $\varepsilon_g$-almost universal family and $f$ is picked (independently) from an $\varepsilon_f$-almost universal family, the result is $\varepsilon_g + \varepsilon_f$-almost universal [36].

We sketch the proof. For $x \neq x'$, we have that $g(f(x)) = g(f(x'))$ collides if $f(x) = f(x')$. This occurs with probability at most $\varepsilon_f$ since $f$ is picked from an $\varepsilon_f$-almost universal family. If not, they collide if $g(y) = g(y')$ where $y = f(x)$ and $y' = f(x')$, with probability bounded by $\varepsilon_g$. Thus, we have bounded the collision probability by $\varepsilon_f + (1 - \varepsilon_f)\,\varepsilon_g \leq \varepsilon_f + \varepsilon_g$, establishing the result.

By extension, we can show that if $g$ is picked from an $\varepsilon_g$-almost XOR-universal family, then the composed result ($h = g \circ f$) is going to be $\varepsilon_f + \varepsilon_g$-almost XOR-universal. It is not required for $f$ to be almost XOR-universal.
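The composition bound can be illustrated by exact enumeration. The sketch below assumes two toy families built from multiplication in GF($2^4$) followed by truncation; the families, the names `f`, `g`, `collision_bound`, and the key ranges are illustrative assumptions, not the constructions used in the paper.

```python
from fractions import Fraction
from itertools import combinations, product

POLY = 0b10011   # x^4 + x + 1, irreducible over GF(2)
KEYS = range(16)


def gf16_mul(a, b):
    """Multiply in GF(2^4): carry-less product, then reduction by POLY."""
    prod = 0
    for i in range(4):
        if (b >> i) & 1:
            prod ^= a << i
    for i in range(6, 3, -1):
        if (prod >> i) & 1:
            prod ^= POLY << (i - 4)
    return prod


def f(a, x):
    """Toy family f_a: [0, 16) -> [0, 8), low 3 bits of a*x."""
    return gf16_mul(a, x) & 0b111


def g(b, y):
    """Toy family g_b: [0, 8) -> [0, 4), low 2 bits of b*y."""
    return gf16_mul(b, y) & 0b11


def collision_bound(hash_fn, keys, domain):
    """Exact maximum over distinct inputs of P(hash(x) == hash(x'))."""
    keys = list(keys)
    worst = Fraction(0)
    for x, xp in combinations(domain, 2):
        hits = sum(1 for k in keys if hash_fn(k, x) == hash_fn(k, xp))
        worst = max(worst, Fraction(hits, len(keys)))
    return worst


eps_f = collision_bound(f, KEYS, range(16))
eps_g = collision_bound(g, KEYS, range(8))
# Composition h = g o f, with the keys (b, a) chosen independently.
eps_h = collision_bound(
    lambda key, x: g(key[0], f(key[1], x)),
    product(KEYS, KEYS),
    range(16),
)
print(eps_f, eps_g, eps_h)
assert eps_h <= eps_f + eps_g
```

Enumerating every key pair is only feasible because the families are tiny; the point is to see the measured collision probability of the composition sit under the sum of the two bounds.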

2.3 Hashing Tuples

If we have universal hash functions from $X$ to $[0, 2^{L})$, then we can construct hash functions from $X^m$ to $[0, 2^{L})^m$ while preserving universality. The construction is straightforward: $h'(x_1, x_2, \ldots, x_m) = (h(x_1), h(x_2), \ldots, h(x_m))$. If $h$ is picked from an $\varepsilon$-almost universal family, then the result is $\varepsilon$-almost universal. This is true even though a single $h$ is picked and reused $m$ times.

Lemma 2. Consider an $\varepsilon$-almost universal family $H$ from $X$ to $[0, 2^{L})$. Then consider the family of functions $H'$ of the form $h'(x_1, x_2, \ldots, x_m) = (h(x_1), h(x_2), \ldots, h(x_m))$ from $X^m$ to $[0, 2^{L})^m$, where $h$ is in $H$. Family $H'$ is $\varepsilon$-almost universal.

The proof is not difficult. Consider two distinct values from $X^m$, $x_1, x_2, \ldots, x_m$ and $x'_1, x'_2, \ldots, x'_m$. Because the tuples are distinct, they must differ in at least one component: $x_i \neq x'_i$. It follows that $h'(x_1, x_2, \ldots, x_m)$ and $h'(x'_1, x'_2, \ldots, x'_m)$ collide with probability at most $P(h(x_i) = h(x'_i)) \leq \varepsilon$, showing the result.
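The tuple construction can be sketched as follows, assuming a toy $\varepsilon$-almost universal family with $\varepsilon = 1/8$ (the low 3 bits of a product in GF($2^4$)) and $m = 2$. The family and the names `h`, `h_tuple`, `tables` are illustrative assumptions, not part of the paper.

```python
from fractions import Fraction
from itertools import combinations, product

POLY = 0b10011   # x^4 + x + 1, irreducible over GF(2)
M = 2            # tuple length m (assumption for the demo)


def gf16_mul(a, b):
    """Multiply in GF(2^4): carry-less product, then reduction by POLY."""
    prod = 0
    for i in range(4):
        if (b >> i) & 1:
            prod ^= a << i
    for i in range(6, 3, -1):
        if (prod >> i) & 1:
            prod ^= POLY << (i - 4)
    return prod


def h(a, x):
    """Toy almost universal family: low 3 bits of a*x, eps = 1/8."""
    return gf16_mul(a, x) & 0b111


def h_tuple(a, xs):
    """h'(x_1, ..., x_m): the same h_a is applied to every component."""
    return tuple(h(a, x) for x in xs)


tuples = list(product(range(16), repeat=M))
# Precompute h' once per key so the pairwise scan stays cheap.
tables = [{t: h_tuple(a, t) for t in tuples} for a in range(16)]

worst = Fraction(0)
for t, tp in combinations(tuples, 2):
    hits = sum(1 for table in tables if table[t] == table[tp])
    worst = max(worst, Fraction(hits, len(tables)))
print(worst)
```

Even though one key is reused for both components, the worst-case collision probability over distinct tuples should not exceed the single-component bound, because two tuples can only collide if they collide on a component where they differ.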

2.4 Variable-Length Hashing From Fixed-Length Hashing

Suppose that we are given a family $H$ of hash functions that is XOR universal over fixed-length strings. That is, we have that $P(h(s) = h(s') \oplus c) \leq 1/2^{L}$ if the length of $s$ is the same as the length of $s'$ ($|s| = |s'|$). We can create a new family that is XOR universal over variable-length strings by introducing a hash family on string lengths. Let $G$ be a family of XOR universal hash functions $g$ over length values. Consider the new family of hash functions of the form $h(s) \oplus g(|s|)$ where $h \in H$ and $g \in G$. Let us consider two distinct strings $s$ and $s'$. There are two cases to consider.

– If $s$ and $s'$ have the same length so that $g(|s|) = g(|s'|)$, then we have XOR universality since

$$P(h(s) \oplus g(|s|) = h(s') \oplus g(|s'|) \oplus c) = P(h(s) = h(s') \oplus c) \leq 1/2^{L}$$

where the last inequality follows because $h \in H$, an XOR universal family over fixed-length strings.

– If the strings have different lengths ($|s| \neq |s'|$), then we again have XOR universality because

$$P(h(s) \oplus g(|s|) = h(s') \oplus g(|s'|) \oplus c) = P(g(|s|) = g(|s'|) \oplus (c \oplus h(s) \oplus h(s'))) = P(g(|s|) = g(|s'|) \oplus c') \leq 1/2^{L}$$

where we set $c' = c \oplus h(s) \oplus h(s')$, a value independent from $|s|$ and $|s'|$. The last inequality follows because $g$ is taken from a family $G$ that is XOR universal.

Thus the result ($h(s) \oplus g(|s|)$) is XOR universal. We can also generalize the analysis. Indeed, if $H$ and $G$ are $\varepsilon$-almost universal, we could show that the result is $\varepsilon$-almost universal. We have the following lemma.

Lemma 3. Let $H$ be an XOR universal family of hash functions over fixed-length strings. Let $G$ be an XOR universal family of hash functions over integer values. We have that the family of hash functions of the form $s \to h(s) \oplus g(|s|)$ where $h \in H$ and $g \in G$ is XOR universal over all strings. Moreover, if $H$ and $G$ are merely $\varepsilon$-almost universal, then the family of hash functions of the form $s \to h(s) \oplus g(|s|)$ is also $\varepsilon$-almost universal.
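A small enumeration can make the construction $s \to h(s) \oplus g(|s|)$ concrete. The sketch below assumes a deliberately tiny setting: symbols and hash values in GF($2^2$) (so $L = 2$), strings of at most two symbols, a fixed-length family $h(s) = \bigoplus_i k_i \cdot s_i$ with one key per position, and a length family $g_b(n) = b \cdot n$. All of these choices and the names `gf4_mul`, `var_hash`, `MAX_LEN` are illustrative assumptions rather than the families used by CLHASH.

```python
from fractions import Fraction
from itertools import combinations, product

POLY = 0b111   # x^2 + x + 1, irreducible over GF(2)
MAX_LEN = 2    # strings of 0, 1 or 2 symbols, each symbol in [0, 4)


def gf4_mul(a, b):
    """Multiply in GF(2^2): carry-less product, then reduction by POLY."""
    prod = 0
    for i in range(2):
        if (b >> i) & 1:
            prod ^= a << i
    if prod & 0b100:       # only bit 2 can overflow for 2-bit operands
        prod ^= POLY
    return prod


def h(pos_keys, s):
    """Fixed-length family: XOR of k_i * s_i, one key per position."""
    out = 0
    for k, sym in zip(pos_keys, s):
        out ^= gf4_mul(k, sym)
    return out


def g(b, length):
    """Length family: b * |s| in GF(4); requires |s| < 4."""
    return gf4_mul(b, length)


def var_hash(key, s):
    """s -> h(s) XOR g(|s|), with key = (k_1, ..., k_MAX_LEN, b)."""
    *pos_keys, b = key
    return h(pos_keys, s) ^ g(b, len(s))


strings = [s for n in range(MAX_LEN + 1) for s in product(range(4), repeat=n)]
all_keys = list(product(range(4), repeat=MAX_LEN + 1))

worst = Fraction(0)
for s, sp in combinations(strings, 2):
    for c in range(4):
        hits = sum(
            1 for key in all_keys
            if var_hash(key, s) == (var_hash(key, sp) ^ c)
        )
        worst = max(worst, Fraction(hits, len(all_keys)))
print(worst)
```

The scan covers both cases of the argument: pairs of equal length exercise the fixed-length family, and pairs of different lengths exercise the length family. In this toy setting the largest probability found should not exceed $1/2^{L} = 1/4$.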
