”Better Than Nothing” Privacy with Bloom Filters: To What Extent? 351
Several scenarios employ plain Bloom filters to exchange private information.In
several cases, privacy is not quantified, but is generically mandated to the Bloom fil-
ter one-way hashing with limited further argumentation [18–20] (although in this latter
work a large universe set is mentioned). In other cases, such as [21], privacy is accom-
plished by not admitting repeated queries (i.e., enumeration) and by setting the Bloom
filter parameters so as to achieve enough false positives. Still with reference to plain
Bloom filters, the work [22], dealing with the sharing of payload information across
Distributed Intrusion Detection systems, is among the few which somewhat quantify
privacy, although this is done indirectly, through the quantification of the large universe
set involved in the specifically considered application, and the large number of possible
n-grams per filter bit.
Some works use Bloom filters but introduce supplementary non cryptographic
procedures to improve their privacy level. This is the case of [23], which proposes
to index documents from multiple providers organized into disjoint privacy groups,
and devises an iterative procedure which uses randomized Bloom filters to produce a
privacy preserving index. Another approach, proposed in [24], consists in splitting a
Bloom filter into segments distributed across multiple participants; it is suggested that
the higher false probability in a segment improves privacy.
An attack devised to revert privacy has been described in [25]. This work analyzes
an earlier proposed system [26] based on Bloom filters for string comparison in pri-
vate record linkage, and shows how to extract significant amount of private information
through a Constraint Satisfaction Cryptanalysis. Note that in some scenarios, the ability
to invert a Bloom filter comes as a functional advantage: indeed [27] proposes a Bloom
filter enhancement where extra information permits to list the complete filter’s content
with high probability.
Concerning privacy metrics specifically devised for Bloom filters we are not aware
of any prior specific work. Of course, a huge amount of papers have tackled privacy and
anonymity definitions in the much more general database setting, such as K-anonymity
[16] and probabilistic K-anonymity [17], L-diversity [28], T-closeness [29], Differential
Privacy [30], etc. In this work we cast K-anonymity, in its probabilistic version, to
the Bloom filter setting (with special attention to the practically more interesting case
K =2which we conveniently define as deniability), and we provide explicit formulae
to measure it.
2 Preliminaries
This section reviews background information on Bloom filters, formally introduces the
notion of hiding set, and derives a combinatorial balls and bins result used in the re-
mainder of the work. For the reader’s convenience, Table 1 summarizes notation.
2.1 Bloom Filters
A Bloom filter [1, 6] is a probabilistic data structure used to represent set membership.
A Bloom filter is implemented as an array B[i],i ∈ (1,m),ofm bits accessed via k
independent hash functions H
1
(x)...H
k
(x), each of which maps a string x ∈{0, 1}
∗
to one of the m bits within the bit array.