Application of a Distribution-Density-Based Clustering Boundary Cutting Algorithm to Packet Classification


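This article discusses a clustering boundary cutting algorithm for packet classification based on distribution density. The algorithm targets the heavy time consumption of existing trie-based algorithms and works in two stages: a density-based rule clustering process and a trie construction process.

In the clustering stage, each rule is represented as a range between 0 and 1 according to the prefixes of the packet fields. When the number of rules within a range reaches a certain density, those rules are grouped into a cluster. According to the summary, this reduces the number of rules that have to be handled and makes the subsequent trie construction more efficient. In the trie construction stage, the clustered rules are used to build the trie, which is then used to look up the classification result for a packet.

The reported experiments show a search-time improvement of 47.05%-73.76% while keeping accuracy at 69.83%-93.17%. Suggested deployment targets are network devices and systems such as routers, switches and firewalls; the summary also suggests that the approach could be applied to classification and clustering of large data sets in data mining and machine learning.

Key points:

1. Distribution density: the density of data points within a given range. The concept is widely used in data mining and machine learning.
2. Rule-based clustering: rules are grouped into clusters according to their similarity, which reduces the number of rules to handle and speeds up classification.
3. Trie data structure: a tree structure used for fast lookup and classification, widely used in packet classification and routing.
4. Evaluation metrics for classification algorithms: accuracy, recall, F1 score and similar metrics are used to assess the performance of a classifier.
5. Applications in data mining and machine learning: classification, clustering and association rule mining are common tasks in these fields, and the algorithm may be applicable to them.
6. Applications in network devices and systems: the algorithm can be deployed in routers, switches, firewalls and similar equipment.
7. Applications of distribution density: the concept can be applied in data mining, machine learning and network security for classification, clustering and association rule mining.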
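The clustering stage described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names `prefix_to_range` and `cluster_by_density`, the fixed `window` width and the `min_rules` threshold are all assumptions made for illustration, since the text only states that prefixes are mapped to ranges in [0, 1] and that rules are clustered once a range is dense enough.

```python
from fractions import Fraction


def prefix_to_range(prefix: str) -> tuple[Fraction, Fraction]:
    """Map a binary prefix such as '101*' to a half-open range [low, high) in [0, 1)."""
    bits = prefix.rstrip("*")
    width = Fraction(1, 2 ** len(bits))          # a k-bit prefix covers 2^-k of the space
    low = Fraction(int(bits, 2), 2 ** len(bits)) if bits else Fraction(0)
    return low, low + width


def cluster_by_density(prefixes, window=Fraction(1, 4), min_rules=2):
    """Assumed density criterion: rules whose range starts fall into the same
    fixed-width window form a cluster if the window holds at least min_rules rules."""
    buckets = {}
    for rule_id, prefix in enumerate(prefixes):
        low, _ = prefix_to_range(prefix)
        buckets.setdefault(low // window, []).append(rule_id)
    ordered = [ids for _, ids in sorted(buckets.items())]
    clusters = [ids for ids in ordered if len(ids) >= min_rules]
    sparse = [rule_id for ids in ordered if len(ids) < min_rules for rule_id in ids]
    return clusters, sparse


# Hypothetical single-field rule set, one prefix per rule.
prefixes = ["000*", "001*", "01*", "11*", "1010*"]
clusters, sparse = cluster_by_density(prefixes)
print(clusters)  # [[0, 1]]
print(sparse)    # [2, 4, 3]
```

Exact fractions are used instead of floats so that range boundaries compare reliably; a real classifier would apply the mapping to each prefix field of a multi-field rule.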

Clustering Boundary Cutting for Packet Classification Based on Distribution Density

Xia-an Bi*, Yanwen Zhou, Jianping Yu

College of Mathematics and Computer Science

Hunan Normal University

Changsha 410081, P.R. China

bixiaan@hnu.edu.cn

Abstract

In this paper, we present the clustering boundary cutting trie algorithm in order to solve the problem of huge time consumption in existing trie-based algorithms. The proposed solution has two stages. The first stage is the density-based rule clustering process. The rules are represented as ranges between 0 and 1 according to the prefixes of the packet fields. When the number of rules in a range reaches a certain density, the corresponding rules are formed into a cluster. The second stage is the trie construction process based on these clusters. Compared with traditional packet classification algorithms, the searching time of our algorithm improves by 47.05%-73.76% while keeping a high accuracy of 69.83%-93.17%. The experiments demonstrate that our algorithm can maintain high accuracy together with stable high throughput, and that it is suitable for actual deployment.

Keywords: Packet Classification; Density-Based Clustering; Trie

I. INTRODUCTION

In the field of computer communication, packet classification, which is deployed in Internet routers, is an essential technology [1]. Packet classification has been widely used in Internet services such as quality of service, security and differentiated services [2], and has a great impact on these services [3]. The role of packet classification is to compare multiple header fields of each incoming packet with a series of predefined rules and to return the identity of the matching rule [4]. A rule consists of the following parts: source IP address, destination IP address, source port number, destination port number, and protocol type [5]. With the increase in network traffic and in the size of classifiers, the quality and speed of packet classification significantly affect the network's performance [6].
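The matching task described above can be sketched as a naive linear search over five-field rules. This is a baseline for illustration only, not the paper's algorithm; the `Rule` class, the `matches` and `classify` helpers, the first-match priority order and the sample rules are all assumptions.

```python
from dataclasses import dataclass
from ipaddress import ip_address, ip_network
from typing import Optional


@dataclass(frozen=True)
class Rule:
    rule_id: int
    src: str                  # source prefix, e.g. "10.0.0.0/8"
    dst: str                  # destination prefix
    sport: tuple[int, int]    # inclusive source port range
    dport: tuple[int, int]    # inclusive destination port range
    proto: Optional[int]      # IP protocol number, None means wildcard


def matches(rule: Rule, pkt: tuple) -> bool:
    """Check all five header fields of a packet against one rule."""
    src, dst, sport, dport, proto = pkt
    return (ip_address(src) in ip_network(rule.src)
            and ip_address(dst) in ip_network(rule.dst)
            and rule.sport[0] <= sport <= rule.sport[1]
            and rule.dport[0] <= dport <= rule.dport[1]
            and (rule.proto is None or rule.proto == proto))


def classify(rules: list, pkt: tuple) -> Optional[int]:
    """Return the id of the first matching rule (list order = assumed priority)."""
    for rule in rules:
        if matches(rule, pkt):
            return rule.rule_id
    return None


# Hypothetical rule set: one specific rule followed by a catch-all.
rules = [
    Rule(1, "10.0.0.0/8", "192.168.1.0/24", (0, 65535), (80, 80), 6),
    Rule(2, "0.0.0.0/0", "0.0.0.0/0", (0, 65535), (0, 65535), None),
]
print(classify(rules, ("10.1.2.3", "192.168.1.9", 40000, 80, 6)))   # 1
print(classify(rules, ("172.16.0.1", "8.8.8.8", 1234, 53, 17)))     # 2
```

Linear search costs time proportional to the number of rules per packet, which is the cost that trie-based schemes try to avoid.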

In order to support the explosion of cloud services and the convergence of data centers based on virtualization technologies, service and product providers are driving a revolution in networking known as Software-Defined Networking (SDN) [7]. SDN introduces programmability to the network, with the opportunity to dynamically route traffic based on flow descriptions. The most commonly deployed SDN technology is OpenFlow, which was developed to make the network programmable and flexible through flow-based switching and centralized management [8]. Flexible Traffic Engineering (F-TE) has been proposed for networks that consist of multiple IPv4 and IPv6 islands, using online and adaptive IP-forwarding interchanging enabled by OpenFlow [9].

Currently, most existing packet classification algorithms are implemented in software. Within software-based packet classification, an important research stream is trie-based algorithms such as HiCuts [10], HyperCuts [11], HyperSplit [6] and EffiCuts [12]. Most of them, however, suffer from memory explosion due to uncontrolled rule replication during trie construction. EffiCuts [12] was proposed to address the memory explosion problem through two techniques: separable tries and equi-dense cuts. HD-Cuts [13] first partitions the classifier into subsets based on insights into their characteristics, then builds a trie for each subset by exploiting the characteristics of that subset. In this way, HD-Cuts is capable of improving storage and searching performance simultaneously. However, these algorithms cannot effectively solve the problem of rule replication, especially in high-speed networks, which negatively impacts memory and searching performance.
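The rule replication problem mentioned above can be made concrete with a small one-dimensional sketch. It is a simplified illustration of equal-size cutting in general, not an implementation of HiCuts or any other cited algorithm; the function `equal_cuts` and the sample rule ranges are assumptions.

```python
def equal_cuts(rules: dict, lo: int, hi: int, num_cuts: int) -> list:
    """Cut [lo, hi) into num_cuts equal intervals and copy each rule into
    every interval it overlaps. Assumes (hi - lo) is divisible by num_cuts
    and that rule ranges are half-open [r_lo, r_hi)."""
    step = (hi - lo) // num_cuts
    children = [[] for _ in range(num_cuts)]
    for rule_id, (r_lo, r_hi) in rules.items():
        for i in range(num_cuts):
            c_lo, c_hi = lo + i * step, lo + (i + 1) * step
            if r_lo < c_hi and c_lo < r_hi:     # the two intervals overlap
                children[i].append(rule_id)
    return children


# Hypothetical rules over a 16-value field; rule B spans three cut intervals.
rules = {"A": (0, 4), "B": (2, 10), "C": (12, 16)}
children = equal_cuts(rules, 0, 16, 4)
stored = sum(len(child) for child in children)
print(children)  # [['A', 'B'], ['B'], ['B'], ['C']]
print(stored)    # 5 stored entries for 3 rules
```

Rule B crosses two cut boundaries and is therefore stored three times; across several dimensions and trie levels, this kind of duplication is what drives the memory growth described above.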

Many studies have been carried out to improve the performance of these algorithms, but there has been no attempt to combine the aggregation characteristics of rules with trie construction. This paper proposes the Clustering Boundary Cutting (CBC) trie algorithm, which aims to avoid generating any duplicated rules. Our solution has two stages. The first stage is the density-based rule clustering process. The rules are represented as ranges between 0 and 1 according to the prefixes of the packet fields. When the number of rules in a range reaches a certain density, the corresponding rules are formed into a cluster. The second stage is the trie construction process. In the trie constructed from these clusters, no node contains a duplicated rule. By combining the density-based clustering method with trie construction, this paper makes the following contributions. In theory, we propose a formalization of packet classification based on geometric space. This method uses a mathematical model to map data packets and rules into rectangular areas in two-dimensional space. We then analyze the model theoretically and prove that the packets and rules retain their original features and that the mapped rectangular areas are consistent with the packet matching process. In terms of the algorithm, this paper designs a novel branch trie structure, which not only
