Regular and Almost Universal Hashing: An Efficient Implementation

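"Regular and Almost Universal Hashing - An Efficient Implementation - 2016 (1609.09840) - Computer Science"

This paper examines how to implement data structures such as hash tables efficiently. Random hashing can provide performance guarantees even in an adversarial setting, and "Regular and Almost Universal Hashing" combines theory and practice to improve the behavior of hash functions.

"Universal hashing" is a common notion: given two distinct data objects, the probability that they are mapped to the same hash value is low when the hash function is picked at random. This property matters because collisions slow down hash-table lookups. However, universality does not ensure that every function in the family behaves well. The paper therefore adds "regularity": for any fixed hash function, data objects picked at random should have a low probability of sharing a hash value. This spreads hash values more evenly and reduces collisions.

The authors, D. Ivanchykhin, S. Ignatchenko and D. Lemire, present "PM+", a family of non-cryptographic hash functions offering good running times, good memory usage, and distinguishing theoretical guarantees: almost universality and component-wise regularity. "Almost universality" means that the collision probability stays low even for a worst-case pair of inputs, while "component-wise regularity" means that the hash function behaves uniformly on each component of its input.

Tests on a variety of platforms show that the PM+ implementations are comparable in performance to state-of-the-art hash functions, and PM+ does particularly well on recent Intel processors. The paper shows how to design hash functions that keep theoretical guarantees while remaining fast in practice, which is relevant to large-scale data processing, database indexing, and machine learning.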


3. REGULARITY

Though we can require families of hash functions to have desirable properties such as uniformity or universality, we also want individual hash functions to have reasonably good properties. For example, what if a family contains the hash function h(x) = c for some constant c? This particular hash function is certainly not desirable! In fact, it is the worst possible hash function for a hash table. Yet we can find many such hash functions in a family that is otherwise strongly universal. Indeed, Dietzfelbinger [9] proposed a strongly universal family made of the hash functions

$$h_{A,B}(x) = \left( Ax + B \bmod 2^{K} \right) \div 2^{n-1}$$

with integers $A, B \in [0, 2^{K})$. It is strongly universal over the domain of integers $x \in [0, 2^{n})$. However, one out of $2^{K}$ hash functions has $A = 0$. That is, if you pick a hash function at random, the probability that you have a constant function ($h_{0,B}(x) = B \div 2^{n-1}$) is $1/2^{K}$. Though this probability might be vanishingly small, many of the other hash functions also have poor distributions of hash values. For example, if one picks $A = 2^{K-1}$, then any two hash values ($h_{A,B}(x)$ and $h_{A,B}(x')$) may differ by at most one bit. Letting $A$ be odd does not solve the problem either: e.g., $A = 1$, $B = 0$ gives the hash function $x \div 2^{n-1}$, which is either 1 or 0.
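
The three weak cases above can be checked on a toy instance. The sketch below is only an illustration: the helper name `multiply_shift`, the sizes K = 8 and n = 4, and the constant B = 37 are assumptions made for this example and do not come from the paper.

```python
def multiply_shift(a: int, b: int, k: int, n: int):
    """Build h_{A,B}(x) = ((A*x + B) mod 2^K) div 2^(n-1)."""
    def h(x: int) -> int:
        return ((a * x + b) % (1 << k)) >> (n - 1)
    return h


K, N = 8, 4                # toy sizes, chosen only for illustration
DOMAIN = range(1 << N)     # x in [0, 2^n)

constant = multiply_shift(0, 37, K, N)              # A = 0: ignores x
one_bit = multiply_shift(1 << (K - 1), 37, K, N)    # A = 2^(K-1): only the parity of x matters
shift_only = multiply_shift(1, 0, K, N)             # A = 1, B = 0: x div 2^(n-1)

constant_values = {constant(x) for x in DOMAIN}
one_bit_values = {one_bit(x) for x in DOMAIN}
shift_only_values = {shift_only(x) for x in DOMAIN}

print(sorted(constant_values))    # [4]
print(sorted(one_bit_values))     # [4, 20]  (00100 and 10100 differ in one bit)
print(sorted(shift_only_values))  # [0, 1]
```

With these sizes each function could produce 32 different 5-bit values, yet the three choices of A and B reach one or two of them.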

Such weak hash functions are a security risk [5, 6]. Thus, we require as much as possible that hash functions be regular [24, 25].

Definition 1

A hash function $h : X \to Y$ is regular if for every $y \in Y$, we have that $|\{x \in X : h(x) = y\}| \leq \lceil |X|/|Y| \rceil$. Further, a family $H$ of hash functions is regular if every $h \in H$ is regular.
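
For small finite sets, the definition can be tested by counting preimages. The following is a minimal sketch; the function name `is_regular` and the toy sets are assumptions introduced for this example.

```python
from collections import Counter


def is_regular(h, X, Y) -> bool:
    """Check Definition 1: no y in Y has more than ceil(|X|/|Y|) preimages."""
    X, Y = list(X), list(Y)
    bound = -(-len(X) // len(Y))          # integer ceiling of |X| / |Y|
    preimage_sizes = Counter(h(x) for x in X)
    return all(preimage_sizes[y] <= bound for y in Y)


X, Y = range(8), range(4)                       # bound is ceil(8/4) = 2
low_bits = is_regular(lambda x: x % 4, X, Y)    # 2 preimages per value
high_bit = is_regular(lambda x: x // 4, X, Y)   # 4 preimages for 0 and for 1
constant = is_regular(lambda x: 0, X, Y)        # 8 preimages for 0
print(low_bits, high_bit, constant)             # True False False
```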

We stress that this regularity property applies to individual hash functions.‡ However, we can still give a probabilistic interpretation to regularity: if we pick any two values $x_1$ and $x_2$ at random, the probability that they collide ($h(x_1) = h(x_2)$) should be minimal if $h$ is regular: it is $1/|Y|$ for independent uniform picks when $|Y|$ divides $|X|$.
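
One way to make this interpretation concrete: if $n_y$ denotes the number of preimages of $y$, two independent uniform picks collide with probability $\sum_y (n_y/|X|)^2$, which is smallest when all $n_y$ are equal. The sketch below computes this sum exactly; the function name and the toy sets are assumptions for illustration only.

```python
from collections import Counter
from fractions import Fraction


def collision_probability(h, X) -> Fraction:
    """P(h(x1) = h(x2)) for x1, x2 drawn independently and uniformly from X."""
    X = list(X)
    sizes = Counter(h(x) for x in X).values()
    return sum(Fraction(s, len(X)) ** 2 for s in sizes)


X = range(8)                                           # |X| = 8, hashed into |Y| = 4 values
p_regular = collision_probability(lambda x: x % 4, X)  # regular: 2 preimages each
p_skewed = collision_probability(lambda x: x // 4, X)  # only two values used
p_constant = collision_probability(lambda x: 0, X)     # constant function
print(p_regular, p_skewed, p_constant)                 # 1/4 1/2 1
```

Only the regular function reaches 1/|Y| = 1/4 here; the less balanced the preimage sizes, the higher the collision probability.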

As an example, consider the case where $X = Y = \{0, 1\}$. There are only two regular hash functions $h : X \to Y$. The first one is the identity function ($h_1(0) = 0$, $h_1(1) = 1$) and the second one is the negation function ($h_2(0) = 1$, $h_2(1) = 0$). The family $\{h_1, h_2\}$ is uniform and universal: the collision probability between distinct values is zero.

More generally, whenever $X = Y$, a function $h : X \to Y$ is regular if and only if it is a permutation. This observation suffices to show that it is not possible to have strong universality and regularity in general. Indeed, suppose that $X = Y$; then all hash functions $h$ in a regular family must be permutations. Meanwhile, strong universality means that given that we know the hash value $y$ of the element $x$ (i.e., $h(x) = y$), we still know nothing about the hash value of $x'$ for $x' \neq x$. But if $h$ is a permutation, we know that the hash values differ ($h(x') \neq h(x)$), contradicting strong universality. More formally, if $h$ is a permutation, we have that $h(x) \neq h(x')$ for $x \neq x'$, which implies that $P(h(x) = h(x')) = 0$, whereas $P(h(x) = h(x')) = 1/|Y|$ is required by strong universality. Thus, while we can have both universality and regularity, we cannot have both strong universality and regularity.
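
The argument can be confirmed by brute force on a small set. The sketch below assumes the toy case X = Y = {0, 1, 2} and represents each function as a tuple of outputs; both choices are made for this example only.

```python
from fractions import Fraction
from itertools import permutations, product

X = (0, 1, 2)                                  # toy case with X = Y

# Represent h as a tuple of outputs: h[x] is the hash value of x.
all_functions = list(product(X, repeat=len(X)))

# With |X| = |Y| the bound ceil(|X|/|Y|) is 1: no value may be hit twice.
regular = {h for h in all_functions if all(h.count(y) <= 1 for y in X)}
regular_are_permutations = regular == set(permutations(X))

# Pick h uniformly from the regular family and compare two distinct keys.
x, x_prime = 0, 1
collisions = sum(1 for h in regular if h[x] == h[x_prime])
observed = Fraction(collisions, len(regular))
required = Fraction(1, len(X))                 # strong universality needs 1/|Y|
print(regular_are_permutations, len(regular), observed, required)  # True 6 0 1/3
```

Of the 27 functions on this set, the 6 regular ones are exactly the permutations, and no choice among them lets two distinct keys collide.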

The next two lemmas state that regularity is preserved under composition and concatenation.

Lemma 4

(Composition) Assume that $|Y|$ divides $|X|$ and $|Z|$ divides $|Y|$. Let $f : X \to Y$ and $g : Y \to Z$ be regular hash functions; then $g \circ f : X \to Z$ is also regular.
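
A small check of the composition lemma, written as a sketch under stated assumptions: the helpers `is_regular` and `compose` and all the toy sets are introduced here for illustration. The first case has $|Y|$ dividing $|X|$ and $|Z|$ dividing $|Y|$; the second keeps both functions regular but uses a $|Z|$ that does not divide $|Y|$.

```python
from collections import Counter


def is_regular(h, X, Y) -> bool:
    """No y in Y has more than ceil(|X|/|Y|) preimages under h."""
    bound = -(-len(X) // len(Y))
    sizes = Counter(h(x) for x in X)
    return all(sizes[y] <= bound for y in Y)


def compose(f, g):
    """Apply f first, then g."""
    return lambda x: g(f(x))


# Divisibility holds: |X| = 12, |Y| = 6, |Z| = 3.
X, Y, Z = range(12), range(6), range(3)
f, g = (lambda x: x % 6), (lambda y: y % 3)
chain_ok = is_regular(f, X, Y) and is_regular(g, Y, Z)
composed_ok = is_regular(compose(f, g), X, Z)

# |Z| = 2 does not divide |Y| = 3, although it divides |X| = 6.
X2, Y2, Z2 = range(6), range(3), range(2)
f2, g2 = (lambda x: x % 3), (lambda y: y // 2)
chain2_ok = is_regular(f2, X2, Y2) and is_regular(g2, Y2, Z2)
composed2_ok = is_regular(compose(f2, g2), X2, Z2)
print(chain_ok, composed_ok, chain2_ok, composed2_ok)   # True True True False
```

In the second case the value 0 receives four preimages while the bound is ceil(6/2) = 3, so the divisibility of $|Y|$ by $|Z|$ is doing real work.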

Lemma 5

(Concatenation) Let $f : X_1 \to Y_1$ and $g : X_2 \to Y_2$ be regular hash functions; then the function $h : X_1 \times X_2 \to Y_1 \times Y_2$ defined by $h(x_1, x_2) = (f(x_1), g(x_2))$ is also regular.
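
The concatenation lemma can be illustrated in the same way. This sketch assumes toy sets in which each $|Y_i|$ divides $|X_i|$; that choice, like the helper name `is_regular`, is an assumption of the example rather than something taken from the paper.

```python
from collections import Counter
from itertools import product


def is_regular(h, X, Y) -> bool:
    """No y in Y has more than ceil(|X|/|Y|) preimages under h."""
    bound = -(-len(X) // len(Y))
    sizes = Counter(h(x) for x in X)
    return all(sizes[y] <= bound for y in Y)


X1, Y1 = range(4), range(2)
X2, Y2 = range(6), range(3)
f = lambda x1: x1 % 2                # regular: 2 preimages per value
g = lambda x2: x2 % 3                # regular: 2 preimages per value


def h(pair):
    """Concatenation: hash each component with its own function."""
    x1, x2 = pair
    return (f(x1), g(x2))


pairs_in = list(product(X1, X2))     # X1 x X2, 24 elements
pairs_out = list(product(Y1, Y2))    # Y1 x Y2, 6 elements
concatenation_ok = is_regular(h, pairs_in, pairs_out)
preimage_sizes = sorted(Counter(h(p) for p in pairs_in).values())
print(concatenation_ok, preimage_sizes)   # True [4, 4, 4, 4, 4, 4]
```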

‡ In contrast, Fleischmann et al. [26] used the term $\epsilon$-almost regular to indicate that a family is almost uniform: $P(h(x) = y) \leq \epsilon$ for all $x$ and $y$ given that $h$ is picked in $H$.
