2014 年 10 月 Journal on Communications October 2014
第 35 卷第 Z1 期 通 信 学 报 Vol.35
No. Z1
基于统计的高效决策树分组分类算法
陈立南
1,2
,刘阳
3
,马严
1,4
,黄小红
1
,赵庆聪
2
,魏伟
1
(1. 北京邮电大学 信息网络中心,北京 100876;2. 北京信息科技大学 信息管理学院,北京 100192;
3. 国家计算机网络应急技术处理协调中心,北京 100029;4. 移动互联网安全技术国家工程实验室,北京 100876)
摘 要:基于决策树的分组分类算法因易于实现和高效性,在快速分组分类中广泛使用。决策树算法的基本目标
是构造一棵存储高效且查找时间复杂度低的决策树。设计了一种基于规则集统计特性和评价指标的决策树算法——
HyperEC 算法。HyperEC 算法避免了在构建决策树过程中决策树高度过高和存储空间膨胀的问题。HyperEC 算法
对 IP 地址长度不敏感,同样适用于 IPv6 的多维分组分类。实验证明,HyperEC 算法当规则数量较少时,与 HyperCuts
基本相同,但随着规则数量的增加,该算法在决策树高度、存储空间占用和查找性能方面都明显优于经典的决策
树算法。
关键词:决策树;分组分类;统计;规则
中图分类号:TP393.03 文献标识码:A 文章编号:1000-436X(2014)Z1-0058-07
Efficent-cutting packet classification algorithm
b
ased on the statistical decision tree
CHEN Li-nan
1,2
, LIU Yang
3
, MA Yan
1,4
, HUANG Xiao-hong
1
, ZHAO Qing-cong
2
, WEI Wei
1
(1. Network Information Center, Institute of Network Technology, Beijing University of Posts and Communications, Beijing 100876, China;
2. School of Information Management, Beijing Information Science & Technology University, Beijing 100192, China;
3. CNCERT, Beijing 100029, China; 4. National Engineering Laboratory for Mobile Network Security, Beijing 100876, China)
Abstract: Packet classification algorithms based on decision tree are easy to implement and widely employed in high-speed
packet classification. The primary objective of constructing a decision tree is minimal storage and searching time complexity.
An improved decision-tree algorithm is proposed based on statistics and evaluation on filter sets. HyperEC algorithm is a
multiple dimensional packet classification algorithm. The proposed algorithm allows the tradeoff between storage and
throughput during constructing decision tree. For it is not sensitive to IP address length, it is suitable for IPv6 packet classifi-
cation as well as IPv4. The algorithm applies a natural and performance-guided decision-making process. The storage budget
is preseted and then the best throughput is achieved. The results show that the HyperEC algorithm outperforms the HiCuts
and HyperCuts algorithm, improving the storage and throughput performance and scalable to large filter sets.
Key words: decision tree; packet classification; statistic; filters
1 引
言
当前,互联网的飞速发展导致网络带宽和流量
日益增加,使互联网的安全问题已经成为事关国家
稳定、社会安定和经济繁荣的全局性问题。分组分
类是实现网络安全、网络虚拟化和服务质量的基础
收稿日期:2014-10-15
基金项目:国家高技术研究发展计划(“863”计划)基金资助项目(2013AA014702);北京市教委科研计划:基于数据流回
放和标注的数据集生成技术研究基金资助项目(KM201311232017);中央高校基本科研业务费基金资助项目(2014PTB-00-04
2014ZD03-03);中国下一代互联网基金资助项目(CNGI-12-02-027);DNSLAB 基金资助项目
Foundation Items:
The National High Technology Research and Development Program of China(863 Program)( 2013AA014702);
Beijing Municipal Commission of Education Research Project: Research on Dataset Generation Technique Based on Data Flow Re-
playing and Marking (KM201311232017); Fundamental Research Funds for the Central Universities (2014PTB-00-04
2014ZD03-03); China Next Generation Internet Project (CNGI-12-02-027); DNSLAB
doi:10.3969/j.issn.1000-436x.2014.z1.012