Micro-Cluster-Driven P2P Traffic Classification and Data Stream Clustering

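Summary: This research paper examines a micro-cluster-based data stream clustering method for P2P traffic classification. In a P2P network, new communities of peers keep joining and old ones frequently leave, so traffic characteristics change over time; that is, concept drift occurs. The paper therefore proposes McStream, a concept-adapting algorithm that uses stream data mining techniques to identify P2P applications in Internet traffic. McStream uses two micro-cluster structures, potential micro-clusters and outlier micro-clusters, to adapt to concept drift and update the model incrementally. This lets it classify P2P traffic and detect concept drift with limited memory.

P2P (Peer-to-Peer) traffic classification is a key task in network management and monitoring: it helps optimize network resource allocation, prevent illegal activity, and ensure quality of service. Traditional machine learning methods have had some success at identifying P2P traffic, but they often struggle in a dynamically changing network environment, that is, under concept drift. Concept drift means that the distribution of a data stream changes over time or as conditions change, which is a challenge for static models.

The McStream algorithm proposed in the paper targets this problem. It builds on the micro-cluster, a concept widely used in data stream clustering. A micro-cluster is a small, highly cohesive cluster that captures a local pattern in the data stream, which lets McStream respond quickly to small changes in traffic characteristics. The two structures named in the paper, potential micro-clusters and outlier micro-clusters, capture possible new trends and identify anomalous behavior, respectively. Both matter in P2P traffic analysis because P2P networks contain large numbers of short-lived, rapidly changing connections.

The potential micro-cluster structure lets the algorithm anticipate emerging traffic patterns, while the outlier micro-cluster structure helps detect anomalous traffic, which may stem from a new P2P application or from malicious activity. By combining the two structures, McStream can update its model dynamically and is intended to keep classification accuracy high even when traffic patterns change significantly.

McStream is also designed around limited memory, a practical constraint in real-time traffic analysis. It keeps only key information so that it can follow changing traffic characteristics without storage demands growing large enough to hurt computational efficiency.

In summary, the micro-cluster-based data stream clustering method for P2P traffic classification adds concept drift adaptation and memory efficiency to P2P traffic identification. The method can follow the changing traffic characteristics of P2P networks and flag new applications and anomalous behavior in time, which is valuable for network management and security monitoring. Future work could explore applying McStream to a wider range of network traffic scenarios and combining it with other machine learning and deep learning methods to improve classification and prediction.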

A Micro-Cluster-Based Data Stream Clustering Method For P2P Traffic Classification

Guanghui Yan (1,a), Minghao Ai (2,b)

School of Electrical and Computer Engineering, Lanzhou Jiaotong University

yangh9805@qq.com, aiminghao@sina.com

Keywords: Peer-to-Peer Traffic Identification, Concept Drift, Micro-cluster Based Clustering

Abstract. Many machine learning techniques have been proposed to classify P2P traffic, each with reasonable success. In a real P2P network environment, however, new communities of peers often join and old communities of peers often leave. Identification methods must therefore be able to cope with concept drift and update the model incrementally. In this paper, we present McStream, a concept-adapting algorithm based on streaming data mining techniques that identifies P2P applications in Internet traffic. McStream uses two micro-cluster structures, potential micro-cluster structures and outlier micro-cluster structures, to classify P2P traffic and detect concept drift with limited memory. Our performance study on real data captured at a main gateway router demonstrates the effectiveness and efficiency of our method.
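The excerpt does not give McStream's update rules, so the following is only a minimal sketch of how potential and outlier micro-clusters can be maintained incrementally. It borrows the decayed cluster-feature bookkeeping used by density-based stream clustering methods such as DenStream; the class names, the parameters `eps`, `min_weight`, `lam`, and `min_keep`, and the promotion and pruning rules are all assumptions made for illustration, not the authors' procedure.

```python
import math


def dist(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


class MicroCluster:
    """Decayed summary of nearby flows: weight, linear sum, last update time."""

    def __init__(self, point, t, lam):
        self.lam = lam          # decay rate (assumed parameter)
        self.t = t              # time of the last update
        self.w = 1.0            # decayed count of absorbed points
        self.ls = list(point)   # decayed per-feature linear sum

    def decay(self, t):
        # Older evidence fades by a factor of 2^(-lam * elapsed time).
        f = 2.0 ** (-self.lam * (t - self.t))
        self.w *= f
        self.ls = [v * f for v in self.ls]
        self.t = t

    def insert(self, point, t):
        self.decay(t)
        self.w += 1.0
        self.ls = [v + x for v, x in zip(self.ls, point)]

    def center(self):
        return [v / self.w for v in self.ls]


class StreamModel:
    """Hypothetical two-buffer model: potential and outlier micro-clusters."""

    def __init__(self, eps=1.0, min_weight=3.0, lam=0.01):
        self.eps = eps                  # max distance to absorb a point
        self.min_weight = min_weight    # weight needed to become "potential"
        self.lam = lam
        self.potential = []
        self.outlier = []

    def learn(self, point, t):
        # Try established (potential) clusters first, then outlier clusters.
        for group in (self.potential, self.outlier):
            if not group:
                continue
            mc = min(group, key=lambda c: dist(c.center(), point))
            if dist(mc.center(), point) <= self.eps:
                mc.insert(point, t)
                # An outlier cluster that gathers enough weight is promoted.
                if group is self.outlier and mc.w >= self.min_weight:
                    self.outlier.remove(mc)
                    self.potential.append(mc)
                return
        # Nothing close enough: start a new outlier micro-cluster.
        self.outlier.append(MicroCluster(point, t, self.lam))

    def prune(self, t, min_keep=0.5):
        # Bound memory by dropping clusters whose decayed weight is too low.
        for group in (self.potential, self.outlier):
            for mc in group:
                mc.decay(t)
            group[:] = [mc for mc in group if mc.w >= min_keep]
```

In this sketch a group of similar flows first appears as an outlier micro-cluster and is promoted once it has absorbed enough recent points, which is one way a new peer community could enter the model; clusters that stop receiving points decay and are pruned, which is one way a departed community could leave it while memory stays bounded.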

Introduction

With the development of Internet technology, accurate classification and identification of Internet applications plays an important role in many network tasks, such as fault monitoring, network planning, flow prioritization, and Internet security. Over the last few years, the use of peer-to-peer (P2P) applications such as file sharing, VoIP, VoD, and media streaming has grown dramatically and now makes up a significant portion of all Internet traffic. P2P technology allows data to be transmitted between any two peers in a P2P network without limits on flow or bandwidth. This transfer model exacerbates network congestion, degrades the performance of traditional client-server applications, and accelerates the spread of Internet viruses and pirated files. Identifying P2P flows has therefore become an urgent problem.

Many papers on P2P identification have been published at international conferences and in academic journals since 2000. The initial approach identified P2P applications by mapping them to well-known port numbers, as discussed in Ref. [1]. However, [2, 3] confirmed that this approach is no longer effective, because P2P file-sharing applications now use dynamic ports for communication.
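Port-based identification amounts to a table lookup, as the minimal sketch below suggests. The port table and the function name `classify_by_port` are illustrative assumptions; the three entries are conventional default ports for these applications, not a list taken from the cited work.

```python
# Conventional default ports, used here only as an illustrative table.
WELL_KNOWN_P2P_PORTS = {
    6881: "BitTorrent",
    4662: "eDonkey",
    6346: "Gnutella",
}


def classify_by_port(src_port, dst_port):
    """Label a flow by the first endpoint port found in the table."""
    for port in (dst_port, src_port):
        app = WELL_KNOWN_P2P_PORTS.get(port)
        if app is not None:
            return app
    return "unknown"


# A client that keeps the default port is recognized...
print(classify_by_port(51000, 6881))
# ...but the same application on a dynamically chosen port is missed.
print(classify_by_port(51000, 49152))
```

The second call shows the weakness described above: once the application picks its port dynamically, the lookup has nothing to match.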

The second approach is payload-based analysis, described in [4, 5]. It faces several technical problems. First, its signature list must be updated frequently to keep up with changes in P2P applications. Second, it fails to detect encrypted traffic, and many P2P applications now use encryption.
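Payload-based analysis can be pictured as matching the first bytes of a flow against known protocol signatures. The sketch below is an assumption-laden illustration rather than the method of [4, 5]: the function name `classify_by_payload` is hypothetical, and the signature list contains only two plaintext handshake prefixes (the BitTorrent handshake and the Gnutella connect string).

```python
# Plaintext handshake prefixes; a real signature list would be far larger
# and would need frequent updates as applications change.
SIGNATURES = [
    (b"\x13BitTorrent protocol", "BitTorrent"),
    (b"GNUTELLA CONNECT/", "Gnutella"),
]


def classify_by_payload(payload):
    """Return the application whose signature prefixes the payload."""
    for prefix, app in SIGNATURES:
        if payload.startswith(prefix):
            return app
    return "unknown"


plain = b"\x13BitTorrent protocol" + bytes(8)
# Stand-in for an encrypted handshake: no recognizable prefix remains.
encrypted = bytes.fromhex("9f3c51e07ab2446d18c5f09e2b7763aa")

print(classify_by_payload(plain))
print(classify_by_payload(encrypted))
```

The encrypted stand-in carries no recognizable prefix, so the matcher falls back to "unknown", which is the failure mode the paragraph above describes.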

To overcome the above-mentioned limitations, machine learning algorithms were introduced to identify Internet traffic, and the methods in [7-9] each achieve satisfactory results. However, machine-learning-based identification of P2P applications confronts several challenges:

1. Labeled samples are very scarce compared with unlabeled ones. Classifiers that traditional supervised learning methods build from a few labeled samples do not perform well when they are used to classify unknown samples.

2. Concept drift cannot be neglected in a real P2P flow environment. In a P2P environment, new communities of peers often join and old communities of peers often leave, which makes the distribution of samples change dynamically. So, the optimal classifiers built on old samples may not

Applied Mechanics and Materials Vols. 263-266 (2013) pp 1121-1126

doi:10.4028/www.scientific.net/AMM.263-266.1121


