Detection and Classification of Peer-to-Peer Traffic: A Survey 30:7
in a packet-by-packet manner, as the most obvious method of accomplishing the task of
capturing traffic is to simply catch each individual data unit traveling in the network.
Some of the existing tools for network management include means to display, process,
statistically analyze, or even make decisions on each packet individually. This per-
packet approach is especially interesting for applications like NIDSs (e.g., Snort [2010]
or Bro [2010]), which need to process and decide upon each packet. Also, sniffers or
protocol analyzers especially designed for offline analysis, like Wireshark [2010] or
Ettercap [2010], usually inspect each packet deeply, gathering information from all the
layers of the protocol stack.
Although packets are individual data units when traveling through the network,
a relation exists between many of them [Jain and Routhier 1986]. Usually, they are
generated by the same request or application, contain acknowledgement messages from
reliability mechanisms (as in Transmission Control Protocol (TCP) traffic), or are
simply carrying an amount of data that is too large to fit in a single Ethernet frame.
Therefore, the relations between packets carry relatively hidden knowledge
about the network and the traffic behavior, which can be assessed by analyzing the
traffic in terms of data flows.
A flow is, most of the time, defined as a set of packets that share a common key:
source and destination IP addresses and transport port numbers [Claffy and McCreary
1999; Duffield 2004; Duffield et al. 2005; IETF 2008]. A flow is considered active while
the time interval between consecutive packets belonging to it is lower than a certain
threshold (timeout). The timeout value may depend on the purpose of the analysis. Although
a few studies propose distinct timeouts, Claffy et al. [1995] explored different values
and identified 64 seconds as a good compromise between the size of the flow and the
effort to initialize and terminate the flows. Furthermore, a flow may also be defined
as unidirectional or bidirectional, depending on whether one wants to consider the
packets traveling between two address-port pairs in each direction as two independent
flows, or the packets in both directions as a single flow [Apisdorf et al. 1996; Claffy et al.
1995]. Because of the usual asymmetry of the traffic exchanged between two addresses
in typical client-server connections and also due to the asymmetric routes in the core
Internet, unidirectional flows are mostly used in studies on network performance and
bandwidth management, for which it is useful to measure the differences in the traffic
in both directions [Claffy et al. 1995]. On the other hand, bidirectional flows are a
natural option to represent TCP sessions, and for the purpose of traffic classification,
they are a more logical approach to follow, as the traffic exchanged between two address-
port pairs, in both directions, belongs to the same traffic class and was generated by
the same application. Nonetheless, Smith et al. [2001] were able to successfully use
unidirectional packet header traces to analyze TCP transactions.
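
The difference between the two flow definitions can be illustrated with a minimal Python sketch. It is not taken from any of the cited tools: the Packet record and both key functions are hypothetical names introduced only for this example, and the key uses exactly the fields named above (IP addresses and transport port numbers).

```python
from typing import NamedTuple, Tuple


class Packet(NamedTuple):
    """Hypothetical, simplified packet record (illustration only)."""
    ts: float        # capture timestamp, in seconds
    src_ip: str
    src_port: int
    dst_ip: str
    dst_port: int
    size: int        # packet length, in bytes


Endpoint = Tuple[str, int]
FlowKey = Tuple[Endpoint, Endpoint]


def unidirectional_key(pkt: Packet) -> FlowKey:
    # Direction matters: A->B and B->A map to two different flows.
    return ((pkt.src_ip, pkt.src_port), (pkt.dst_ip, pkt.dst_port))


def bidirectional_key(pkt: Packet) -> FlowKey:
    # Sort the two endpoints so that both directions yield the same key.
    a = (pkt.src_ip, pkt.src_port)
    b = (pkt.dst_ip, pkt.dst_port)
    return (a, b) if a <= b else (b, a)
```

With this sketch, a request and its reply produce two distinct unidirectional keys but a single bidirectional key, which is the property that makes bidirectional flows convenient for representing TCP sessions.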
In order to analyze the traffic from a flow perspective, a monitoring tool can still
capture the packets individually, but it has to organize them in a table of flows, based
on the source and destination information (address and port). Several tools (e.g., CoralReef
[Moore et al. 2001]) were developed to perform flow-based analyses of traffic from
network adapters or from offline packet traces. Alternatively, it is possible to receive the
flow information directly from routers or other network elements (e.g., using a flow
export protocol, like Cisco NetFlow [2010], or the Internet Protocol Flow Information
eXport (IPFIX) [IETF 2008], a standard for exporting flow data currently under development).
NetFlow data can be read and analyzed by a few existing applications, like
Flow-tools [Romig et al. 2000] or FlowScan [Plonka 2000].
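
As a rough illustration of how individually captured packets might be organized in a table of flows, the following Python sketch groups packets by a unidirectional address-port key and starts a new flow whenever the gap since the previous packet of the same key reaches the timeout. It is only a sketch under stated assumptions, not the implementation of any cited tool: the Flow record and build_flows function are hypothetical, packets are assumed to arrive in timestamp order as plain tuples, and the 64-second default follows the value reported by Claffy et al. [1995].

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Tuple

FLOW_TIMEOUT = 64.0  # seconds; value reported by Claffy et al. [1995]

# Unidirectional key: (src_ip, src_port, dst_ip, dst_port)
Key = Tuple[str, int, str, int]
# Assumed packet layout: (timestamp, src_ip, src_port, dst_ip, dst_port, size)
PacketTuple = Tuple[float, str, int, str, int, int]


@dataclass
class Flow:
    """Hypothetical per-flow record kept in the flow table."""
    key: Key
    first_ts: float
    last_ts: float
    packets: int = 0
    bytes: int = 0


def build_flows(packets: Iterable[PacketTuple],
                timeout: float = FLOW_TIMEOUT) -> List[Flow]:
    active: Dict[Key, Flow] = {}   # the table of currently active flows
    finished: List[Flow] = []      # flows closed by the inactivity timeout
    for ts, src_ip, src_port, dst_ip, dst_port, size in packets:
        key = (src_ip, src_port, dst_ip, dst_port)
        flow = active.get(key)
        # A flow stays active only while the inter-packet gap is below the
        # timeout; otherwise it is closed and a new flow starts.
        if flow is not None and ts - flow.last_ts >= timeout:
            finished.append(flow)
            flow = None
        if flow is None:
            flow = Flow(key, ts, ts)
            active[key] = flow
        flow.last_ts = ts
        flow.packets += 1
        flow.bytes += size
    # At the end of the trace, flows that are still active are returned too.
    finished.extend(active.values())
    return finished
```

This sketch checks the timeout only when a new packet with the same key arrives, which is sufficient for offline traces. A live monitor would presumably also need to sweep the table periodically to expire idle flows and bound memory use.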
3.3. Collecting Traffic Data
Access to network data for traffic measurement, as mentioned in a few studies
[Duffield 2004; McGregor 2002], may be performed by copying the transmission
ACM Computing Surveys, Vol. 45, No. 3, Article 30, Publication date: June 2013.