data:image/s3,"s3://crabby-images/e0ce7/e0ce70adbb6779a5d8e62acc2f8fbd1c0d3cb16b" alt=""
Clustering Boundary Cutting for Packet Classification based on Distribution
Density
Xia-an Bi*, Yanwen Zhou, Jianping Yu
College of Mathematics and Computer Science
Hunan Normal University
Changsha 410081, P.R. China
bixiaan@hnu.edu.cn
Abstract
—
In this paper, we present the clustering boundary
cutting trie algorithm in order to solve the problem of huge
time consumption in existing trie based algorithms. In the
proposed solution, there are two stages. The first stage is the
density-based rule clustering process. The rules are
represented as a range between 0 and 1 according to the
prefixes of the packet fields. When the number of the rules in a
range reaches to a certain density, the corresponding rules are
formed in a cluster. The second stage is the trie construction
process based on these clusters. Compared with traditional
packet classification algorithms, the searching time of our
algorithm increases by 47.05% -73.76% and keep the high
accuracy of 69.83%-93.17%. The experiment demonstrates
that our algorithm can effectively keep high accuracy as well as
keeping stable high-throughput, and it is suitable for actual
deployment.
Keywords
:
Packet Classification; Density-Based Clustering;
Trie
I. INTRODUCTION
In the field of computer communication, the packet
classification which is deployed in Internet routers is an
essential technology
[1]
. Packet classification has been widely
used in Internet services such as the quality of service,
security and differentiated services
[2]
and exhibits great
impact on these Internet services
[3]
. The role of the packet
classification is to compare multiple header fields of the
incoming packets with a series of predefined rules, and
return the identity which possesses the matching rule
[4]
. For a
certain rule, it includes the following parts: source IP
address, destination IP address, source port number,
destination port number, and protocol type
[5]
. Due to the
increase of the network traffic and size of classifiers, the
quality and the speed of packet classification significant
affect the network’s performance
[6]
.
In order to support the explosion of cloud services and
convergence of data centers based on virtualization
technologies, service and product providers are driving a
revolution in network, which is known as Software-Defined
Networking(SDN)
[7]
. The SDN introduces programmability
to the network with the opportunity to dynamically route
traffic based on flow descriptions. The most commonly
deployed Software Defined Networking (SDN) technology is
the OpenFlow. The OpenFlow has been developed to make
the network programmable and flexible through using
flow-based switching and centralized management
[8]
.
Flexible Traffic Engineering (F-TE) is proposed to achieve
F-TE in a network that consists of multiple IPv4- and
IPv6-islands with online and adaptive IP-forwarding
interchanging which is enabled by OpenFlow
[9]
.
Currently, the existing packet classification algorithms
are mostly based on the realization of the software. In the
software-based packet classification technology, there is an
important research stream of the algorithms based on the trie,
such as HiCuts
[10]
, HyperCuts
[11]
, HyperSplit
[6]
and
EffiCuts
[12]
. But most of them suffer from memory explosion
problems due to uncontrolled rule replications during the
process of trie construction. EffiCuts
[12]
was proposed to
address the memory explosion problem through two
techniques: Separable tries and Equi-dense cuts. HD-Cuts
[13]
first partitions the classifier into subsets with insights on
their characteristics, then builds tries for each subset by
exploiting the characteristics of each individual subset. By
this way, HD-Cuts is capable of improving storage and
searching performance simultaneously. However, these
algorithms could not effectively solve the problem of rule
replication especially in high-speed network, which
negatively impacts the memory and searching performance.
Many studies have been carried out to improve the
performance of the algorithms, but there has been no attempt
to combine the aggregate characteristic of rules with the trie
construction. This paper proposes an algorithm called
Clustering Boundary Cutting (CBC) trie algorithm with the
aim of avoiding generating any duplicated rule. Our solution
has two stages. The first stage is the density-based rule
clustering process. The rules are represented as a range
between 0 and 1 according to the prefixes of the packet
fields. When the number of the rules in a range reaches to a
certain density, the corresponding rules are formed in a
cluster. The second stage is the trie construction process. In
the trie which is constructed based on the above clusters,
each node does not include a duplicated rule. By combining
density-based clustering method and trie construction, this
paper makes the following contributions. In theory, we
propose the formalization of packet classification based on
geometric space. This method uses the mathematical model
to map data packets and rules into the rectangular area in
two-dimensional space. Then we use the theoretical analysis
to prove the mathematical model established by this method,
and it is proved that the packets and rules still keep the
original features and the mapping rectangular area also meets
the packet matching process. In terms of algorithm, this
paper designs a novel branch trie structure, which not only