Short version appeared at WADS 1999, LNCS 1663, p. 49–54. Springer Verlag.
Available on-line at http://www.brics.dk/~pagh/papers/
Hash and Displace:
Efficient Evaluation of Minimal Perfect Hash Functions
Rasmus Pagh
1
BRICS
2
, Department of Computer Science, University of Aarhus,
8000 Aarhus C, Denmark
Email: pagh@brics.dk
Abstract
A new way of constructing (minimal) perfect hash functions is described. The technique con-
siderably reduces the overhead associated with resolving buckets in two-level hashing schemes.
Evaluating a hash function requires just one multiplication and a few additions apart from
primitive bit operations. The number of accesses to memory is two, one of which is to a fixed
location. This improves the probe performance of previous minimal perfect hashing schemes,
and is shown to be optimal. The hash function description (“program”) for a set of size n
occupies O(n) words, and can be constructed in expected O(n) time.
1 Introduction
This paper deals with classes of hash functions which are perfect for the n-subsets of the finite
universe U = {0, . . . , u − 1}. For any S ∈
U
n
– the subsets of U of size n – a perfect class contains
a function which is 1-1 on S (“perfect” for S). We consider perfect classes with range {0, . . . , a−1}.
A perfect class of hash functions can be used to construct static dictionaries (data structures
storing sets S ∈
U
n
and supporting membership queries, “u ∈ S?”): Store a function h which
is 1-1 on the set, S, and for each element s ∈ S, store s in entry h(s) of an a-element table. A
membership query on s is processed by comparing s to entry h(s) in the table. The attractiveness
of using perfect hash functions for this purpose depends on several characteristics of the class.
1. The efficiency of evaluation, in terms of computation and the number of probes into the
description.
2. How hard is it to find a perfect function in the class?
3. How close to n can we choose a, the range of the functions?
4. How much space is required to store a function?
It turns out that, for suitable perfect classes of hash functions, the answers to all of these
questions are satisfactory in the sense that it is possible to do (more or less) as well as one could
hope for. Nevertheless, theoretically optimal schemes have seen limited use in practice, being
substituted by heuristics which typically work well. It is thus still of interest to find classes with
good properties from a theoretical as well as from a practical point of view.
1
Supported in part by the ESPRIT Long Term Research Programme of the EU under project number 20244
(ALCOM-IT)
2
Basic Research in Computer Science,
Centre of the Danish National Research Foundation.
1