China Communications • November 2013
TRUSTED COMPUTING AND INFORMATION SECURITY
Accurate Classification of P2P Traffic by
Clustering Flows
HE Jie¹, YANG Yuexiang², QIAO Yong¹, TANG Chuan²
¹ College of Computer, National University of Defense Technology, Changsha 410073, China
² Information Center, National University of Defense Technology, Changsha 410073, China
Abstract: P2P traffic has been a dominant portion of Internet traffic since its emergence in the late 1990s. Accurately classifying P2P traffic remains a key problem for Internet Service Providers (ISPs) and network managers. This paper proposes a novel approach to the accurate classification of P2P traffic at a fine-grained level, which depends solely on the number of special flows observed during small time intervals. These special flows, named Clustering Flows (CFs), are defined as the most frequent and steady flows generated by P2P applications. Hence we are able to classify P2P applications by detecting the appearance of the corresponding CFs. Compared to existing approaches, our classifier can realise high classification accuracy by exploiting only several generic properties of flows, instead of extracting sophisticated features from host behaviours or transport layer data. We validate our framework on a large set of P2P traffic traces using a Support Vector Machine (SVM). Experimental results show that our approach correctly classifies P2P applications with an average true positive rate of above 98% and a negligible false positive rate of about 0.01%.
Key words: traffic classification; P2P; fine-grained; support vector machine
I. INTRODUCTION
The continual emergence of P2P applications enriches resource sharing over the network, but it also raises many challenges for network management. Monitoring P2P applications is therefore very important, and P2P traffic classification is the key to it. Unfortunately, classifying P2P traffic is difficult because of both the large number of newly emerging P2P applications and the intentional use of random port numbers and encryption for network traffic. Currently, there are roughly three state-of-the-art approaches to classifying traffic according to application protocols [1].
First, traditional port-based classification [2] is a simple approach based on the assumption that applications use the standard port numbers assigned by IANA. However, this approach has become unreliable because applications now often use random ports.
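As a minimal sketch of the port-based idea, the following Python snippet looks up a flow's ports in a small table. The table `WELL_KNOWN_PORTS` and the function `classify_by_port` are hypothetical names introduced here for illustration, and the table holds only a handful of well-known assignments rather than the full IANA registry.

```python
# Hypothetical, abbreviated table of well-known port assignments.
# A real classifier would load the full IANA registry instead.
WELL_KNOWN_PORTS = {
    ("tcp", 21): "ftp",
    ("tcp", 25): "smtp",
    ("udp", 53): "dns",
    ("tcp", 80): "http",
    ("tcp", 443): "https",
}


def classify_by_port(protocol, src_port, dst_port):
    """Label a flow from its port numbers alone, or return "unknown"."""
    proto = protocol.lower()
    # Check the destination port first, then the source port.
    for port in (dst_port, src_port):
        label = WELL_KNOWN_PORTS.get((proto, port))
        if label is not None:
            return label
    return "unknown"


# A web request to port 80 is labelled as expected.
web_label = classify_by_port("tcp", 51234, 80)

# A P2P flow between two randomly chosen ports cannot be labelled.
p2p_label = classify_by_port("tcp", 51413, 38211)
```

The sketch also shows why the approach is fragile: a flow on random ports comes back as "unknown", and any application that deliberately uses port 80 would be labelled "http" regardless of what it actually carries.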
Second, payload-based techniques, also called Deep Packet Inspection (DPI), are based on the inspection of packet payloads [3-7]. Traditional DPI methods inspect the content of packets, looking for distinctive signatures that allow a given application to be recognised. These techniques can only identify traffic generated by applications whose signatures are known, and they become ineffective when the traffic is encrypted.
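Signature matching of this kind can be sketched in a few lines of Python. The names `SIGNATURES` and `classify_by_payload` are assumptions made for this illustration, the two signatures are simple prefix patterns rather than a production rule set, and the XOR step is only a toy stand-in for real encryption.

```python
# Illustrative prefix signatures. The BitTorrent handshake starts with
# the byte 19 (0x13) followed by the string "BitTorrent protocol".
SIGNATURES = {
    "bittorrent": b"\x13BitTorrent protocol",
    "http": b"GET ",
}


def classify_by_payload(payload):
    """Return the first label whose signature prefixes the payload."""
    for label, signature in SIGNATURES.items():
        if payload.startswith(signature):
            return label
    return "unknown"


# A plaintext handshake matches its signature.
handshake = b"\x13BitTorrent protocol" + bytes(8)
plain_label = classify_by_payload(handshake)

# Toy stand-in for encryption: XOR every byte with a fixed key.
# The signature bytes no longer appear, so the match fails.
obscured = bytes(b ^ 0x5A for b in handshake)
obscured_label = classify_by_payload(obscured)
```

Both limitations named in the text appear here: only applications listed in the signature table can be recognised, and once the payload bytes are transformed the signature is no longer visible.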
To overcome these drawbacks, new DPI approaches that use the payload data from different perspectives have recently emerged. For example, Dhamankar et al. [5] used entropy to reveal the randomness of the encrypted payloads of Skype traffic; Hullar et al. [6] addressed the classification of P2P applications using the first 16 bytes of payload of the first
Received: 2013-06-13
Revised: 2013-08-27
Editor: ZHANG Huanguo