A Survey of P2P Traffic Detection and Classification Techniques

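"This PDF, 'Detection and Classification of Peer-to-Peer Traffic - A Survey', is a survey of research on P2P traffic detection and classification, published in June 2013."

In the Internet field, and particularly with the growth of the Internet of Things, the common properties of network data have changed significantly. With the emergence of new Internet paradigms, bandwidth consumption has increased and traffic has become more balanced between the two directions. These changes pose major challenges and make effective network traffic management essential. Traditional traffic management methods are inefficient and easily bypassed, so they no longer meet current needs. Researchers have therefore paid particular attention to new traffic classification techniques, especially for detecting and classifying P2P (peer-to-peer) traffic.

A P2P network is a distributed architecture in which every participant is both a consumer and a provider of services. This model is popular in file sharing, streaming media, and other applications that transfer large amounts of data. Because P2P traffic is anonymous and dynamic, identifying and classifying it is considerably complex. The paper gives a thorough review of the literature on P2P traffic detection and classification, covering the relevant concepts, strategies, and techniques.

The article first introduces basic characteristics of P2P traffic, such as many-to-many communication patterns, persistent connections, and unpredictability. It then discusses detection methods, including signature-based detection (port numbers, protocol flags, and packet patterns), behavior-based detection (analysis of traffic patterns and time series), and advanced methods that incorporate machine learning, such as support vector machines and neural networks. It also analyzes approaches to classifying P2P traffic: classification based on statistical features, classification based on deep packet inspection, and classification using clustering algorithms. These techniques aim to distinguish P2P applications such as BitTorrent, eDonkey, and Skype, in order to better understand how network resources are used and to optimize network management.

The article further discusses open challenges, such as the continuing evolution of P2P technology, privacy protection, and the real-time requirements of detection and classification. It also outlines possible future research directions, including smarter classification algorithms, adversarial detection techniques, and adaptable P2P traffic management frameworks. The paper offers a comprehensive perspective on P2P traffic detection and classification and is a valuable reference for IoT researchers, network administrators, and practitioners in related industries.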

Detection and Classification of Peer-to-Peer Traffic: A Survey 30:7

in a packet-by-packet manner, as the most obvious way to capture traffic is simply to catch each individual data unit traveling in the network. Some of the existing network management tools include means to display, process, statistically analyze, or even make decisions on each packet individually. This per-packet approach is especially interesting for applications like NIDSs (e.g., Snort [2010] or Bro [2010]), which need to process and decide upon each packet. Also, sniffers or protocol analyzers especially designed for offline analysis, like Wireshark [2010] or Ettercap [2010], usually inspect each packet deeply, gathering information from all the layers of the protocol stack.

Although packets are individual data units when traveling through the network, a relation exists between many of them [Jain and Routhier 1986]. Usually, they are generated by the same request or application, contain acknowledgement messages from reliability mechanisms (as with Transmission Control Protocol (TCP) traffic), or are simply carrying an amount of data that is too large to fit in a single Ethernet frame. Therefore, the relation between the packets constitutes relatively hidden knowledge about the network and the traffic behavior, which can be assessed by analyzing the traffic in terms of data flows.

A flow is, most of the time, defined as a set of packets that share a common key: source and destination IP addresses and transport port numbers [Claffy and McCreary 1999; Duffield 2004; Duffield et al. 2005; IETF 2008]. It is considered active while the time interval between consecutive packets belonging to the flow is lower than a certain threshold. The timeout value may depend on the purpose of the analysis. Although a few studies propose distinct timeouts, Claffy et al. [1995] explored different values and identified 64 seconds as a good compromise between the size of the flow and the effort to initialize and terminate the flows. Furthermore, a flow may also be defined as unidirectional or bidirectional, depending on whether one wants to consider the packets traveling between two address-port pairs in each direction as two independent flows, or the packets in both directions as a single flow [Apisdorf et al. 1996; Claffy et al. 1995]. Because of the usual asymmetry of the traffic exchanged between two addresses in typical client-server connections and also due to the asymmetric routes in the core Internet, unidirectional flows are mostly used in studies on network performance and bandwidth management, for which it is useful to measure the differences in the traffic in both directions [Claffy et al. 1995]. On the other hand, bidirectional flows are a natural option to represent TCP sessions, and for the purpose of traffic classification, they are a more logical approach to follow, as the traffic exchanged between two address-port pairs, in both directions, belongs to the same traffic class and was generated by the same application. Nonetheless, Smith et al. [2001] were able to successfully use unidirectional packet header traces to analyze TCP transactions.
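
The flow definition above can be illustrated with a minimal Python sketch. It is only one possible reading of the text: the `Packet` record, the function names, and the choice to order endpoints so that both directions share a key are assumptions made for illustration, and the key uses only addresses and ports, as in the definition given here. The 64-second default is the value attributed to Claffy et al. [1995].

```python
from collections import namedtuple

# Hypothetical packet record: timestamp in seconds plus addresses and ports.
Packet = namedtuple("Packet", "ts src_ip src_port dst_ip dst_port size")

FLOW_TIMEOUT = 64.0  # seconds; the compromise value reported by Claffy et al. [1995]


def flow_key(pkt, bidirectional=True):
    """Return the key shared by all packets of the same flow."""
    a = (pkt.src_ip, pkt.src_port)
    b = (pkt.dst_ip, pkt.dst_port)
    if bidirectional:
        # Order the endpoints so both directions map to the same key.
        return (a, b) if a <= b else (b, a)
    return (a, b)


def group_into_flows(packets, timeout=FLOW_TIMEOUT, bidirectional=True):
    """Split a time-ordered packet sequence into flows.

    A flow stays active while the gap between consecutive packets with the
    same key is below `timeout`; a longer gap starts a new flow.
    """
    active = {}    # key -> packets of the currently active flow
    finished = []  # flows closed by the timeout
    for pkt in packets:
        key = flow_key(pkt, bidirectional)
        current = active.get(key)
        if current is not None and pkt.ts - current[-1].ts >= timeout:
            finished.append(current)
            current = None
        if current is None:
            current = []
            active[key] = current
        current.append(pkt)
    return finished + list(active.values())
```

With `bidirectional=True`, a request and its reply fall into one flow, which matches the argument that both directions belong to the same traffic class; with `bidirectional=False`, each direction is accounted for separately, as in the performance studies mentioned above.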

In order to analyze the traffic from a flow perspective, a monitoring tool can still capture the packets individually, but it has to organize them in a table of flows, based on the source and destination information (address and port). Several tools (e.g., CoralReef [Moore et al. 2001]) were developed to perform flow-based analyses of traffic from network adapters or from offline packet traces. However, it is also possible to receive the flow information directly from routers or other network elements (e.g., using a flow export protocol, like Cisco NetFlow [2010], or the Internet Protocol Flow Information eXport (IPFIX) [IETF 2008], a standard for exporting flow data currently under development). NetFlow data can be read and analyzed by a few existing applications, like Flow-tools [Romig et al. 2000] or FlowScan [Plonka 2000].

3.3. Collecting Traffic Data

The access to the network data for traffic measuring, as mentioned in a few studies [Duffield 2004; McGregor 2002], may be performed by copying the transmission signal (e.g., through the use of a splitter) and analyzing it on a dedicated network monitor, by using a router or a switch to copy all the traffic to an output interface, or by directly tapping a shared link. Nevertheless, there are also a few global infrastructures for the active measurement of the Internet that collect data from worldwide links [Murray and Claffy 2001]. The datasets containing traffic from computer networks should be carefully handled in order to protect the privacy of the users, as well as other sensitive data. Several considerations and good practices regarding this subject are discussed in Allman and Paxson [2007].

ACM Computing Surveys, Vol. 45, No. 3, Article 30, Publication date: June 2013.

30:8 J. V. Gomes et al.

As seen in previous sections, passive data collection can be performed by polling routers to obtain flow data or by packet capturing. While in the former approach data is usually acquired through the use of protocols like IPFIX, in the latter the trace files are collected using commercial or public domain network traffic capturing software, like tcpdump [2011] and its Windows version, WinDump [2011], or other available tools built on the libpcap [tcpdump 2011] or WinPcap [2011] libraries.
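
To make the notion of a trace file concrete, the following Python sketch reads the classic pcap savefile layout written by tools such as tcpdump: a 24-byte global header followed by records, each with a 16-byte header (seconds, microseconds, captured length, original length). It is a simplified illustration that assumes microsecond timestamps and does not handle the nanosecond variant or the pcapng format; the function name `read_pcap` is hypothetical.

```python
import struct

PCAP_MAGIC = 0xA1B2C3D4  # classic pcap with microsecond timestamps


def read_pcap(data):
    """Yield (timestamp, original_length, captured_bytes) from pcap bytes."""
    if len(data) < 24:
        raise ValueError("too short for a pcap global header")
    # The magic number reveals the byte order used by the writer.
    if struct.unpack("<I", data[:4])[0] == PCAP_MAGIC:
        endian = "<"
    elif struct.unpack(">I", data[:4])[0] == PCAP_MAGIC:
        endian = ">"
    else:
        raise ValueError("not a classic microsecond-resolution pcap file")
    offset = 24  # skip the global header
    while offset + 16 <= len(data):
        ts_sec, ts_usec, incl_len, orig_len = struct.unpack(
            endian + "IIII", data[offset:offset + 16])
        offset += 16
        payload = data[offset:offset + incl_len]
        offset += incl_len
        yield ts_sec + ts_usec / 1e6, orig_len, payload
```

A caller would typically pass the contents of a capture file opened in binary mode. The gap between `orig_len` and `len(payload)` shows how much of each packet was dropped when the capture was taken with a limited snapshot length.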

Although the most natural means is to capture the complete packet, such a technique generates large trace files, which require greater storage capacity and processing power to handle the traffic in high-speed links. Moreover, the increasing integration of measurement techniques into routers, switches, and other network elements that do not possess high processing power [Duffield 2004; Jurga and Hulbój 2007] motivates the development of solutions that can reduce the amount of data collected, as described in the next section.

3.4. Trace Reduction

The most common approaches for trace reduction resort to packet filtering or to the minimization of the data that is kept for future analysis [Duffield 2004; Arlitt and Williamson 2007]. It is possible, depending on the specific goals of each study, to monitor exclusively the packets from a given application. However, such selection is usually made using the transport layer port numbers, which is widely considered a naive approach. Alternatively, one may select only the packets that establish or finalize a connection or a request, or use any other selection criterion that is more coherent with the objective of a particular analysis and decreases the number of packets to be captured.
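
As one possible illustration of selecting only the packets that establish or finalize a connection, the sketch below keeps TCP segments carrying the SYN, FIN, or RST flag. It assumes each packet is given as raw bytes starting at the IPv4 header (no link-layer header), ignores IPv6, and skips non-initial IP fragments; the function names are hypothetical.

```python
TCP_FIN, TCP_SYN, TCP_RST = 0x01, 0x02, 0x04


def tcp_flags(ip_packet):
    """Return the TCP flags byte of a raw IPv4 packet, or None if unavailable."""
    if len(ip_packet) < 20 or ip_packet[0] >> 4 != 4:
        return None                      # not IPv4
    if ip_packet[9] != 6:
        return None                      # protocol field: 6 means TCP
    frag_offset = ((ip_packet[6] & 0x1F) << 8) | ip_packet[7]
    if frag_offset != 0:
        return None                      # later fragments carry no TCP header
    ihl = (ip_packet[0] & 0x0F) * 4      # IPv4 header length in bytes
    if len(ip_packet) < ihl + 14:
        return None                      # TCP header is truncated
    return ip_packet[ihl + 13]           # flags are byte 13 of the TCP header


def is_connection_boundary(ip_packet):
    """True for segments that open (SYN) or close (FIN, RST) a connection."""
    flags = tcp_flags(ip_packet)
    return flags is not None and bool(flags & (TCP_SYN | TCP_FIN | TCP_RST))
```

Applied as a filter over a capture, this keeps a small fraction of the packets while still recording when each TCP connection started and ended.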

The amount of data stored can be reduced by saving a summary of each application protocol-specific request; by capturing a limited portion of the packet or even only the headers of the first layers of the TCP/IP protocol stack; or by keeping information about a flow instead of storing each packet that belongs to it.
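
Two of these reductions can be sketched in a few lines of Python. The 64-byte snapshot length and the fields of the flow summary are arbitrary choices made for illustration, not values taken from the survey.

```python
def truncate(raw, snaplen=64):
    """Keep only the first `snaplen` bytes of a captured packet."""
    return raw[:snaplen]


def summarize_flow(packets):
    """Collapse a flow's packets into one fixed-size record.

    `packets` is a non-empty list of (timestamp, size_in_bytes) pairs.
    """
    timestamps = [ts for ts, _ in packets]
    return {
        "packets": len(packets),
        "bytes": sum(size for _, size in packets),
        "first_seen": min(timestamps),
        "last_seen": max(timestamps),
    }
```

Truncation bounds the storage cost per packet, whereas a flow summary bounds it per flow, at the price of losing all per-packet detail.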

A particular case of packet filtering is the use of packet sampling methods [Amer and Cassel 1989], whose objective is to randomly (or pseudorandomly) choose a small set of the packets observed at the measuring point. It is intended that the set of packets obtained be as representative as possible of the traffic one plans to measure. There are different packet sampling techniques, which may be more useful in distinct cases, depending on factors like the goal of the study, the network state, the traffic characteristics, or the resource constraints. Jurga and Hulbój [2007] elaborated on the existing methods for packet sampling and their application in network measurements. Duffield [2004] addressed the subject of Internet traffic sampling as well, providing a long and soundly structured discussion of several important topics on passive traffic measurement.
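
For illustration, two simple sampling schemes can be written as follows: count-based systematic sampling, which keeps every n-th packet, and independent random sampling, which keeps each packet with a fixed probability. These are generic textbook schemes chosen as examples, not a reproduction of any specific method from the cited works; the function names and the `seed` parameter are assumptions.

```python
import random


def systematic_sample(packets, n):
    """Count-based systematic sampling: keep every n-th packet."""
    return [pkt for i, pkt in enumerate(packets) if i % n == 0]


def random_sample(packets, probability, seed=None):
    """Keep each packet independently with the given probability."""
    rng = random.Random(seed)  # a fixed seed makes the selection reproducible
    return [pkt for pkt in packets if rng.random() < probability]
```

Systematic sampling is cheap and deterministic but can interact badly with periodic traffic, while random sampling avoids that at the cost of a variable sample size.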

4. TRAFFIC ANALYSIS AND CLASSIFICATION APPROACHES

In the early times of the Internet, traffic classification was a straightforward task that was easily accomplished by matching the port numbers of the transport protocols
