Zhiyao Hu et al.: Comparing Set Reconciliation Methods Based on Bloom Filters and Their Variants 159
by an SBF is proportional to the set cardinality, not the
set difference.
2.2 IBF-based set reconciliation method
An IBF
[11]
employs an array of cells, SŒ1; : : : ; S Œm. A
cell consists of three fields, called idSum, hashSum, and
count, which denoted by SŒi:idSum, S Œi :hashSum,
and SŒi:count.
The idSum field records the XOR results of all
objects that are hashed into that cell.
The hashSum field stores the sum of hash values
of all objects.
The count field records the number of objects
mapped into the cell.
The IBF allows the following operations:
The operation Insert (x; y) adds a key-value to the
IBF, i.e., SŒh
i
.x/:idSum D SŒh
i
.x/:idSum ˚ x,
SŒh
i
.x/:hashSum D S Œh
i
.x/:hashSum ˚ y, and
SŒh
i
.x/:count D S Œh
i
.x/:count+1.
The operation Delete(x, y) removes a key-value
pair (x, y) from the IBF. It subtracts x and y from
the idSum and hashSum fields and decrements the
count field.
The operation Get(x) retrieves the value associated
with a key x from the IBF. If S Œh
i
.x/:count D 1,
then it returns SŒh
i
.x/:hashSum. Otherwise, it
returns “not found”.
The operation ListEntries() recovers all the objects
stored in the IBF by sequentially removing the
objects whose counter value equals one. It faces
the listing failure issue in this process. See Ref.
[15] for more details.
Each object in any set is mapped to at most k
cells in the IBF via a set of hash functions. It is
the idSum field in each cell that encodes all objects
in it via the XOR operation. After representing two
sets with the IBF and exchanging the two IBFs, the
proposed subtract operation between IBFs can eliminate
common elements of sets S
A
and S
B
. Furthermore,
the different objects can be recovered by using the
extraction operation iteratively with a given probability.
As shown in Fig. 1b, nodes A and B generate
IBF.S
A
/ and IBF.S
B
/. The two nodes then exchange
their local IBFs with each other, and subtract the
received remote IBF from each local IBF. Then the
two nodes try to recover the objects that are different
between the two related sets.
2.3 CBF-based set reconciliation method
Recall that in the SBF-based set reconciliation method,
the deletion of any object enables all cells in
SBFŒh
j
.s
i
/ for 1 6 j 6 k to be reset to 0. Other
objects in S that hash to one or more cells at
SBFŒh
j
.s
i
/ for 1 6 j 6 k will no longer be correctly
identified. To address this problem, Fan et al.
[16]
proposed the Counting Bloom Filter (CBF). A CBF
provides a way to implement a delete operation on an
SBF without regenerating the filter. In a CBF, each
item of its bit vector is extended from being a single
bit to being an m l-bit counter, and l D 4 usually
suffices
[17, 18]
.
The CBF-based set reconciliation method requires
each node to represent its objects using a CBF, which
is exchanged with the other node. A receiving node
subtracts the received CBF from its local CBF so as to
query the objects not common to the two sets. However,
in this paper, we focus on two mainstream methods
for set reconciliation, i.e., SBF-based and IBF-based
methods, to understand the set reconciliation problem.
In this paper, a single capital letter like S denotes a
set with cardinality n. Other notations and symbols are
summarized in Table 1.
3 Modeling and Analysis
A metric is derived to evaluate the communication
overhead of the SBF-based and the IBF-based set
reconciliation methods. For the SBF-based method, due
to false positives, some objects that only appear in one
set may not be identified. The IBF-based method avoids
Table 1 Symbols and notations.
Term
Definition
n
Set cardinality
d
Total size of the set differences
u
Universe set
c
k
C
Coefficient of d in IBF
f
False positive probability of an SBF
Ratio of d to n
M
Storage space size of an object in the set
R
Number of cells in each IBF partition
L
Number of strata tried in difference estimation
C.ES/
Communication overhead of exchanging SBFs
C.ED/
Communication overhead of exchanging
unique objects from another node for SBF
C.EP/
Communication overhead of
estimating the number of different objects for IBF
C.EI/
Communication overhead of exchanging IBFs
between the two nodes
C.SBF/
Communication overhead of the SBF-based method
C.IBF/
Communication overhead of the IBF-based method