FlowFormers: Transformer-based Models for Real-time Network Flow Classification
Rushi Babaria§, Computer Science, BITS Pilani, India
Sharat Chandra Madanapalli§, Electrical Engineering & Telecomms, UNSW Sydney, Australia
Himal Kumar, Canopus Networks, Sydney, Australia
Vijay Sivaraman, Electrical Engineering & Telecomms, UNSW Sydney, Australia
§ Equal contribution.
Abstract—Internet Service Providers (ISPs) often perform network traffic classification (NTC) to dimension network bandwidth, forecast future demand, assure the quality of experience to users, and protect against network attacks. With the rapid growth in data rates and traffic encryption, classification has to increasingly rely on stochastic behavioural patterns inferred using deep learning (DL) techniques. The two key challenges arising pertain to (a) high-speed and fine-grained feature extraction, and (b) efficient learning of behavioural traffic patterns by DL models. To overcome these challenges, we propose a novel network behaviour representation called FlowPrint that extracts per-flow time-series byte and packet-length patterns, agnostic to packet content. FlowPrint extraction is real-time, fine-grained, and amenable to implementation at Terabit speeds in modern P4-programmable switches. We then develop FlowFormers, which use attention-based Transformer encoders to enhance the FlowPrint representation and thereby outperform conventional DL models on NTC tasks such as application type and provider classification. Lastly, we implement and evaluate FlowPrint and FlowFormers on live university network traffic, and show that they achieve a 95% f1-score in classifying popular application types within the first 10 seconds of a flow, rising to 97% within the first 30 seconds, and a 95+% f1-score in identifying providers within video and conferencing streams.
1. Introduction
Network traffic classification (NTC) is widely used by network operators for tasks including network dimensioning, capacity planning and forecasting, Quality of Experience (QoE) assurance, and network security monitoring. However, traditional classification methods based on deep packet inspection (DPI) are starting to fail as network traffic gets increasingly encrypted. Many web applications now use HTTPS (i.e. HTTP with TLS encryption), and browsers like Google Chrome now use HTTPS by default [1]. Applications like video streaming (live/on-demand) have migrated to protocols like DASH and HLS on top of HTTPS. Non-HTTP applications, which are predominantly UDP-based real-time applications such as conferencing and gameplay, also use encryption protocols like AES and WireGuard to protect the privacy of their users. With emerging protocols like TLS 1.3 encrypting server names, and HTTP/2 and QUIC enforcing encryption by default, NTC is bound to become even more challenging.
In recent years, researchers have proposed using Machine Learning (ML) and Deep Learning (DL) based models to perform various NTC tasks such as IoT device classification, network security, and service/application classification, ranging from coarse-grained application types (e.g. video streaming, conferencing, downloads, gaming) to specific application providers (e.g. Netflix, YouTube, Zoom, Skype, Fortnite). However, many of these existing approaches train ML/DL models on byte sequences from the first few packets of the flow. While feeding raw bytes into a DL model is appealing due to its automatic feature extraction capabilities, such models usually end up learning patterns such as protocol headers in unencrypted applications and the server name in TLS-based applications. These models fail to perform well in the absence of such attributes [2], for example with TLS 1.3, which encrypts the entire handshake and thereby obfuscates the server name.
Our work takes an alternative approach by building a time-series behavioural profile (a.k.a. traffic shape) of the network flow, and using it to classify network traffic at both application type and provider level. Our first contribution (§3) develops a method to extract flow traffic-shape attributes (a.k.a. FlowPrint) at high speed and in real time. FlowPrint's data representation keeps track of packet and byte counts in different packet-length bins without capturing any raw byte sequences, and provides a richer set of attributes than the simplistic byte and packet counting approach of our prior work [3]. It also operates in real time, unlike other approaches, e.g. [4], that perform post-facto analysis on packet captures. We show that FlowPrint is amenable to implementation in modern programmable hardware switches operating at multi-Terabit scale, and is hence suitable for deployment in large Tier-1 ISP networks.
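For concreteness, the Python sketch below illustrates the kind of per-flow state FlowPrint maintains: packet and byte counters in packet-length bins, accumulated per time window, with no payload bytes inspected. The bin edges and the one-second window used here are illustrative assumptions, not the parameters used in this paper.

    from collections import defaultdict

    # Illustrative packet-length bin edges (bytes) and window size;
    # the actual FlowPrint parameters are assumptions here.
    BIN_EDGES = [0, 100, 300, 600, 900, 1200, 1500]
    WINDOW_SEC = 1.0

    def bin_index(pkt_len):
        """Map a packet length to its packet-length bin."""
        for i in range(len(BIN_EDGES) - 1):
            if BIN_EDGES[i] <= pkt_len < BIN_EDGES[i + 1]:
                return i
        return len(BIN_EDGES) - 2  # clamp oversized packets to the last bin

    def flowprint(packets):
        """Build a FlowPrint-like time series for one flow.

        `packets` is an iterable of (timestamp, packet_length) tuples.
        Returns a dict: window index -> per-bin [packet_count, byte_count].
        Only lengths and timestamps are used, never payload bytes.
        """
        series = defaultdict(lambda: [[0, 0] for _ in range(len(BIN_EDGES) - 1)])
        t0 = None
        for ts, length in packets:
            t0 = ts if t0 is None else t0
            window = int((ts - t0) // WINDOW_SEC)
            b = bin_index(length)
            series[window][b][0] += 1        # packet count in this bin
            series[window][b][1] += length   # byte count in this bin
        return series

Because the state is a small set of per-flow counters indexed by packet-length bin, it maps naturally onto match-action registers in P4-programmable hardware.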
Our second (and most significant) contribution (§4) proposes FlowFormers: DL architectures that introduce an attention-based transformer encoder [5] into traditional Convolutional Neural Network (CNN) and Long Short-Term Memory (LSTM) networks. The transformer encoder greatly improves performance by allowing the models to attend to the relevant parts of the input vector in the context of the NTC task. In other words, the transformer encoder enhances our FlowPrint data before it is fed to the CNN and LSTM.
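As a rough illustration of this design, the following PyTorch sketch places a transformer encoder over a FlowPrint-style time series before an LSTM classification head. The layer sizes, number of heads, and class count are illustrative assumptions rather than the hyperparameters evaluated in this paper; the 12 features per window correspond to packet and byte counts in the six illustrative bins of the earlier sketch.

    import torch
    import torch.nn as nn

    class FlowFormerSketch(nn.Module):
        """Sketch: transformer encoder enhancing a flow time series
        before an LSTM head (dimensions are illustrative assumptions)."""

        def __init__(self, n_features=12, d_model=64, n_heads=4,
                     n_layers=2, n_classes=6):
            super().__init__()
            self.embed = nn.Linear(n_features, d_model)   # project bin counts to model dim
            enc_layer = nn.TransformerEncoderLayer(d_model=d_model, nhead=n_heads,
                                                   batch_first=True)
            self.encoder = nn.TransformerEncoder(enc_layer, num_layers=n_layers)
            self.lstm = nn.LSTM(d_model, d_model, batch_first=True)
            self.classifier = nn.Linear(d_model, n_classes)

        def forward(self, x):
            # x: (batch, time_steps, n_features) FlowPrint time series
            h = self.encoder(self.embed(x))   # self-attention across time windows
            _, (h_n, _) = self.lstm(h)        # summarize the attention-enhanced sequence
            return self.classifier(h_n[-1])   # class logits

    # Example: classify a batch of 8 flows, each with 30 one-second windows.
    logits = FlowFormerSketch()(torch.randn(8, 30, 12))

The same encoder output can equally be fed to a CNN head instead of the LSTM shown here; the key point is that self-attention enriches the time-series representation before the conventional model consumes it.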