FlowFormers: Transformer-based Models for Real-time Network Flow Classification
Rushi Babaria§, Computer Science, BITS Pilani, India
Sharat Chandra Madanapalli§, Electrical Engineering & Telecomms, UNSW Sydney, Australia
Himal Kumar, Canopus Networks, Sydney, Australia
Vijay Sivaraman, Electrical Engineering & Telecomms, UNSW Sydney, Australia
§ Equal contribution.
Abstract—Internet Service Providers (ISPs) often perform network traffic classification (NTC) to dimension network bandwidth, forecast future demand, assure the quality of experience to users, and protect against network attacks. With the rapid growth in data rates and traffic encryption, classification has to increasingly rely on stochastic behavioural patterns inferred using deep learning (DL) techniques. The two key challenges arising pertain to (a) high-speed and fine-grained feature extraction, and (b) efficient learning of behavioural traffic patterns by DL models. To overcome these challenges, we propose a novel network behaviour representation called FlowPrint that extracts per-flow time-series byte and packet-length patterns, agnostic to packet content. FlowPrint extraction is real-time, fine-grained, and amenable to implementation at Terabit speeds in modern P4-programmable switches. We then develop FlowFormers, which use attention-based Transformer encoders to enhance the FlowPrint representation and thereby outperform conventional DL models on NTC tasks such as application type and provider classification. Lastly, we implement and evaluate FlowPrint and FlowFormers on live university network traffic, and show that they achieve a 95% f1-score in classifying popular application types within the first 10 seconds of a flow, rising to 97% within the first 30 seconds, and a 95+% f1-score in identifying providers within video and conferencing streams.
1. Introduction
Network traffic classification (NTC) is widely used by network operators for tasks including network dimensioning, capacity planning and forecasting, Quality of Experience (QoE) assurance, and network security monitoring. However, traditional classification methods based on deep packet inspection (DPI) are starting to fail as network traffic gets increasingly encrypted. Many web applications now use HTTPS (i.e. HTTP with TLS encryption), and browsers like Google Chrome now use HTTPS by default [1]. Applications like video streaming (live/on-demand) have migrated to protocols like DASH and HLS on top of HTTPS. Non-HTTP applications, which are predominantly UDP-based real-time applications such as conferencing and gameplay, also use encryption protocols like AES and WireGuard to protect the privacy of their users. With emerging protocols like TLS 1.3 encrypting server names, and HTTP/2 and QUIC enforcing encryption by default, NTC is bound to become even more challenging.
In recent years, researchers have proposed using Machine Learning (ML) and Deep Learning (DL) based models to perform various NTC tasks such as IoT device classification, network security, and service/application classification, ranging from coarse-grained application types (e.g. video streaming, conferencing, downloads, gaming) to specific application providers (e.g. Netflix, YouTube, Zoom, Skype, Fortnite). However, many of these existing approaches train ML/DL models on byte sequences from the first few packets of the flow. While feeding raw bytes into a DL model is appealing due to its automatic feature extraction capabilities, such models usually end up learning patterns such as protocol headers in unencrypted applications and the server name in TLS-based applications. These models fail to perform well in the absence of such attributes [2], for example with TLS 1.3, which encrypts the entire handshake and thereby obfuscates the server name.
Our work takes an alternative approach by building a time-series behavioural profile (a.k.a. traffic shape) of the network flow, and using it to classify network traffic at both application type and provider level. Our first contribution (§3) develops a method to extract flow traffic-shape attributes (a.k.a. FlowPrint) at high speed and in real time. FlowPrint's data representation keeps track of packet and byte counts in different packet-length bins without capturing any raw byte sequences, and provides a richer set of attributes than the simplistic byte and packet counting approach of our prior work [3]. It also operates in real time, unlike other approaches, e.g. [4], that perform post-facto analysis on packet captures. We show that FlowPrint is amenable to implementation in modern programmable hardware switches operating at multi-Terabit scale, and is hence suitable for deployment in large Tier-1 ISP networks.
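For concreteness, the Python sketch below illustrates the kind of per-flow state FlowPrint maintains: packet and byte counters in packet-length bins, accumulated per time window, with no payload bytes inspected. The bin edges and the one-second window used here are illustrative assumptions, not the parameters used in this paper.

    from collections import defaultdict

    # Illustrative packet-length bin edges (bytes) and window size;
    # the actual FlowPrint parameters are assumptions here.
    BIN_EDGES = [0, 100, 300, 600, 900, 1200, 1500]
    WINDOW_SEC = 1.0

    def bin_index(pkt_len):
        """Map a packet length to its packet-length bin."""
        for i in range(len(BIN_EDGES) - 1):
            if BIN_EDGES[i] <= pkt_len < BIN_EDGES[i + 1]:
                return i
        return len(BIN_EDGES) - 2  # clamp oversized packets to the last bin

    def flowprint(packets):
        """Build a FlowPrint-like time series for one flow.

        `packets` is an iterable of (timestamp, packet_length) tuples.
        Returns a dict: window index -> per-bin [packet_count, byte_count].
        Only lengths and timestamps are used, never payload bytes.
        """
        series = defaultdict(lambda: [[0, 0] for _ in range(len(BIN_EDGES) - 1)])
        t0 = None
        for ts, length in packets:
            t0 = ts if t0 is None else t0
            window = int((ts - t0) // WINDOW_SEC)
            b = bin_index(length)
            series[window][b][0] += 1        # packet count in this bin
            series[window][b][1] += length   # byte count in this bin
        return series

Because the state is a small set of per-flow counters indexed by packet-length bin, it maps naturally onto match-action registers in P4-programmable hardware.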
Our second (and most significant) contribution (§4) proposes FlowFormers: DL architectures that introduce an attention-based transformer encoder [5] into traditional Convolutional Neural Network (CNN) and Long Short-Term Memory (LSTM) networks. The transformer encoder greatly improves performance by allowing the models to attend to the relevant parts of the input vector in the context of the NTC task. In other words, the transformer encoder enhances our FlowPrint data before it is fed to the CNN and LSTM.
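As a rough illustration of this design, the following PyTorch sketch places a transformer encoder over a FlowPrint-style time series before an LSTM classification head. The layer sizes, number of heads, and class count are illustrative assumptions rather than the hyperparameters evaluated in this paper; the 12 features per window correspond to packet and byte counts in the six illustrative bins of the earlier sketch.

    import torch
    import torch.nn as nn

    class FlowFormerSketch(nn.Module):
        """Sketch: transformer encoder enhancing a flow time series
        before an LSTM head (dimensions are illustrative assumptions)."""

        def __init__(self, n_features=12, d_model=64, n_heads=4,
                     n_layers=2, n_classes=6):
            super().__init__()
            self.embed = nn.Linear(n_features, d_model)   # project bin counts to model dim
            enc_layer = nn.TransformerEncoderLayer(d_model=d_model, nhead=n_heads,
                                                   batch_first=True)
            self.encoder = nn.TransformerEncoder(enc_layer, num_layers=n_layers)
            self.lstm = nn.LSTM(d_model, d_model, batch_first=True)
            self.classifier = nn.Linear(d_model, n_classes)

        def forward(self, x):
            # x: (batch, time_steps, n_features) FlowPrint time series
            h = self.encoder(self.embed(x))   # self-attention across time windows
            _, (h_n, _) = self.lstm(h)        # summarize the attention-enhanced sequence
            return self.classifier(h_n[-1])   # class logits

    # Example: classify a batch of 8 flows, each with 30 one-second windows.
    logits = FlowFormerSketch()(torch.randn(8, 30, 12))

The same encoder output can equally be fed to a CNN head instead of the LSTM shown here; the key point is that self-attention enriches the time-series representation before the conventional model consumes it.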