Reducing the Number of Bloom Filters
Qingge Gong
1
, Tong Yang
2
, Hongwei Tong
3
, Kai Shi
3
, Jinghui Li
4
, Xianyan Wu
1
1. Engineering University of CAPF, Xi’an, China. 2. Information and Navigation College, Air Force Engineering University,
Xi’an, China. 3. Military representative office of certain company, Chengdu, China. 4. PLA military 95880, China.
Email: {gongqingge@sina.com, yangtongemail@gmail.com, tongweihong69@163.com, prideshikai@gmail.com,
lijinghui1020@sina.com, wuxinyan518@163.com}
Abstract
—Bloom Filters have been applied in many fields, in-
cluding data base, network management, computer network, and
computer communication etc., owing to its fast membership que-
ry and memory efficiency. In network field, Bloom Filters were
used to lookup the forwarding tables in backbone routers and
Data Centre switches. These solutions all build many Bloom Fil-
ters to achieve fast lookup. The main shortcoming of these solu-
tion is: too many Bloom Filters require too many hash computa-
tions and memory accesses, and the variety of Bloom Filter sizes
poses challenges to system design and probably degrades the
system performance. To address this issues, we proposed Set
absorption algorithm to reduce the number of Bloom Filters,
while balancing the size of Bloom Filters. Experimental results
showed that after using our algorithm, better performance of the
Bloom Filter-based solutions was obtained.
Keywords—Bloom Filter; set absorption; LPM; IP lookup
I. INTRODUCTION
Bloom Filter was first proposed in 1970 [3], but applied to
computer network field 33 years later [2], then a lot of applica-
tions of Bloom Filter came to the fore. One important applica-
tion is to accelerate lookup, including Longest Prefix Matching
(LPM) and exact matching.
Routing lookup belonging to LMP is a classical issue, vari-
ous solutions have been proposed. Generally speaking, the so-
lutions can be divided into two categories: software-based solu-
tions and hardware-based solutions. Most software-based solu-
tions [1] [17] build a trie
to find the longest matched prefix,
but need multiple memory accesses for one lookup, thus the
lookup speed is relatively slow. Hardware-based solutions use
TCAM [19] [20] or GPU [21] to perform parallel lookup, thus
can achieve high speed, but suffer from high power consump-
tion and high hardware cost.
LMP is usually simplified into Exact Matching [2], and
Bloom Filter is also applied to exact matching. In enterprise
and data center networks, the forwarding tables of switches
could have tens or hundreds of thousands of end-host MAC
addresses. The performance of the Data Centre is determined to
a great extent by the lookup speed of switches. Existing solu-
tions adopts hash function to store the MAC address and its
outgoing link
(next-hop) to a large table. However, 1) large
hash table can hardly be held in fast memory; 2) the worst case
of hash collision cannot be bounded except that the forwarding
table is static
. Unfortunately, the forwarding tables (MAC
This work is supported by NSFC (61202489, 61272486) and CAPF certain
database project.
Trie is a tree-like data structure allowing the organization of prefixes on a
digital basis by using the bits of prefixes to direct the branching [1].
In this paper, outgoing link, port and the next-hop are exchangeable.
If the forwarding table is static, Perfect Hash Function can achieve no hash
collision [22].
tables) changes frequently due to the immigration of virtual
machines [18] and network fault.
To perform fast MAC lookup, Minlan Yu et al. [24] pro-
posed a Bloom Filter-based solution: BUFFALO. Dan Li et al.
[11] proposed multi-class Bloom Filter to improve BUFFALO
in multicast situation. The basic approach of these two algo-
rithms is same: splitting the forwarding table into many small
sets according to the port (outgoing link), then building one
Bloom Filter for each set. When false positive doesn't occur,
only one Bloom Filter will report true for one lookup, then the
next-hop is figured out. The membership query of one Bloom
Filter is fast, but the query time grows linearly with the in-
crease of the number of Bloom Filters. Specifically, suppose k
is the number of hash functions of one Bloom Filter, then one
Bloom Filter query needs k hash computations and k memory
accesses. Both hash functions and memory accesses will be-
come rk for r Bloom Filters. Therefore, the lookup speed will
go down quickly along with the increase of the port number.
This problem occurs in the situation with many Bloom Filters
(including the above three solutions). Unfortunately, one
switch could have a large number of next-hops, thus there will
be many Bloom Filters with different sizes.
In addition, to make the membership query as fast as possi-
ble, the Bloom Filters should be stored in fast memory, such as
SRAM (BUFFALO) and on-chip memory (PBF). When
Bloom Filter is stored in on-chip memory of FPGA [15], fixed
memory size must be assigned for each Bloom Filter. Unfortu-
nately, the size of Bloom Filters varies a lot, thus each Bloom
Filter will be assigned the maximum size in the implementation
for scalability, which is a waste of on-chip memory. To achieve
memory efficiency, the size of Bloom Filters should be bal-
anced. To address these issues, we propose Set absorption al-
gorithm to reduce the number of Bloom Filters, while balanc-
ing the sizes of Bloom Filters. The main idea of is that one
small set will be absorbed by two larger sets. The details are
illustrated in Section IV.
The rest of the paper is organized as follows. Section II in-
troduces the background knowledge, including IP lookup, for-
warding table, and the principle of Bloom Filters. Section III
details the Set absorption algorithm. Performance evaluation is
provided in Section IV, and finally we conclude our paper in
Section V.
II. BACKGROUND
A. Longest Prefix Matching
Bloom Filter which was proposed in [3], has been applied
to many fields. In this paper, we focus on its application in the
lookup of forwarding tables of routers and switches. Note that
the lookup of the forwarding tables of routers (we usually say
'routing lookup' or 'IP lookup') must satisfies Longest Prefix
Matching (LMP) rule, while that of switches only performs