A Clustering-Preserving Network Flow Sketching Method Resilient to Hash Collisions

PDF, 502 KB, updated 2024-08-12

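"Clustering-preserving Network Flow Sketching": Network monitoring is vital in modern cloud and data center networks, which need diverse traffic statistics such as flow size distributions and heavy hitters. As network rates rise and traffic volumes grow rapidly, sketch-based approximate measurement has been widely studied. It gives up some accuracy in exchange for lower memory and computation cost, but it is highly sensitive to hash collisions, which can degrade its performance. This paper proposes a new clustering-preserving network flow sketch designed to be more resilient to hash collisions. The authors perform an equivalence analysis that relates the sketch to K-means clustering. Based on this analysis, they cluster similar network flows into the same bucket array to reduce the estimation variance, and they use the bucket average to obtain an unbiased estimate, which improves overall estimation accuracy. Experimental results show that the framework keeps up with line rate and provides accurate traffic statistics. Testbed experiments confirm the method's effectiveness: it is more robust to hash collisions while maintaining good traffic-statistics performance.

This is a meaningful advance for network monitoring, because it improves the quality and efficiency of traffic analysis in large-scale networks, especially under high concurrency and large data volumes. The study further discusses how to optimize the allocation and update strategies of the bucket arrays so that data can still be processed efficiently with limited system resources. The paper also explores potential benefits of the method for real-time traffic analysis, anomaly detection, and network security.

In summary, clustering-preserving network flow sketching is a novel technique against hash collisions that improves conventional sketches with clustering ideas, increasing the accuracy and stability of traffic statistics and strengthening monitoring and data-processing capability in modern networks.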
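A compact way to see the equivalence analysis mentioned in this summary is the derivation below. The notation is ours, not the paper's, and it assumes that each bucket records both the total size $S_j$ of the flows mapped to it and their number $n_j$. Flow $i$ has size $x_i$, and the flows are partitioned into buckets $C_1, \dots, C_k$.

```latex
\begin{aligned}
S_j &= \sum_{i \in C_j} x_i, \qquad n_j = |C_j|, \qquad \mu_j = \frac{S_j}{n_j}, \qquad \hat{x}_i = \mu_j \quad (i \in C_j),\\
\sum_{j=1}^{k} \sum_{i \in C_j} \left(\hat{x}_i - x_i\right)^2
  &= \sum_{j=1}^{k} \sum_{i \in C_j} \left(x_i - \mu_j\right)^2
  \quad \text{(the K-means objective with centers } \mu_j \text{)},\\
\sum_{i \in C_j} \left(\hat{x}_i - x_i\right) &= n_j \mu_j - S_j = 0 .
\end{aligned}
```

The first identity says that the total squared error of the averaging estimator is exactly the K-means within-cluster sum of squares, so grouping flows of similar size minimizes it. The second says that the errors inside each bucket cancel, so for a flow drawn uniformly from a bucket the estimate is unbiased. By contrast, the plain counter estimator $\hat{x}_i = S_j$ overestimates flow $i$ by $\sum_{l \in C_j,\, l \neq i} x_l \ge 0$.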

Clustering-preserving Network Flow Sketching

Yongquan Fu, Dongsheng Li, Siqi Shen, Yiming Zhang, Kai Chen

Science and Technology Laboratory of Parallel and Distributed Processing, College of Computer Science, National University of Defense Technology

SING Lab, Hong Kong University of Science and Technology

Abstract—Network monitoring is vital in modern clouds and data center networks that need diverse traffic statistics ranging from flow size distributions to heavy hitters. To cope with increasing network rates and massive traffic volumes, sketch-based approximate measurement has been extensively studied as a way to trade accuracy for lower memory and computation cost; unfortunately, it is sensitive to hash collisions.

This paper presents a clustering-preserving sketch method that is resilient to hash collisions. We provide an equivalence analysis of the sketch in terms of K-means clustering. Based on the analysis result, we cluster similar network flows into the same bucket array to reduce the estimation variance, and use the average to obtain an unbiased estimate. Testbed experiments show that the framework adapts to line rates and provides accurate query results. Real-world trace-driven simulations show that the locality-sensitive sketch (LSS) maintains stable performance under wide ranges of parameters and dramatically outperforms state-of-the-art sketching structures, with over 10 to 10 times reduction in relative errors for per-flow queries as the ratio of the number of buckets to the number of network flows decreases from 10% to 0.1%.

Index Terms—sketch, random projection, hash collision, clustering

I. INTRODUCTION

Network measurement is of paramount importance for traffic engineering, network diagnosis, network forensics, and intrusion detection and prevention in clouds and data centers, which require a variety of traffic measurements, such as delay, flow size estimation, flow distribution, and heavy hitters [1], [2], [3]. Recently, the self-running network proposal [4], [5] highlights an automatic management loop for large-scale networks, with timely and accurate data-driven network statistics as the driving force for machine learning techniques.

Network-flow monitoring is challenging due to ever-increasing line rates, massive traffic volumes, and large numbers of active flows [6], [7], [8], [9]. Traffic statistics tasks require advanced data structures and traffic statistical algorithms. Many space- and time-efficient approaches have been studied, e.g., traffic sampling, traffic counting, and traffic sketching. Compared to other approaches, sketches have received extensive attention due to their competitive trade-off between space consumption and query efficiency. Further, multiple sketch structures can be composed for joint traffic analytics.

Existing sketch structures [10], [11], [12], [13] hash incoming packets to randomly chosen buckets and take the accumulated counter in these buckets as the estimator. Recently, OpenSketch [14], UnivMon [15], SketchVisor [16], ElasticSketch [17], and SketchLearn [18] further extend the generality of the sketch structure to support diverse monitoring tasks.

This work was sponsored in part by the National Key R&D Program of China under Grant No. 2018YFB0204300, and by the National Natural Science Foundation of China (NSFC) under Grant Nos. 61972409, 61602500, 61402509, 61772541, and 61872376.

The sketch-based monitoring approach incurs approximation error due to hash collisions of incoming items, as multiple keys may be mapped to the same bucket. Hash collisions are inevitable due to the randomness of the hash functions. Thus existing methods typically keep multiple independent copies of the sketch structure and use the least affected copy as the estimator. However, this approach wastes space significantly. Recently, several approaches [17], [18] propose separating large items from the rest into a hash table to reduce the estimation error. Unfortunately, the hash table needs to allocate dedicated space for new items, so it is less space-efficient than a sketch with a constant-size bucket structure. Thus, finding a space-efficient approach that is resilient to hash collisions remains an open question.
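The "multiple independent copies, least affected one wins" strategy described above is what a Count-Min-style sketch does. Below is a minimal Python sketch of that baseline, assuming string flow IDs and one BLAKE2b-derived hash per row (our choice, not taken from [10], [11], [12], [13]). The class name and the synthetic flows are hypothetical, and the code illustrates the conventional design, not LSS.

```python
import hashlib
import random


class CountMinSketch:
    """Count-Min-style sketch: `depth` independent rows of `width` counters."""

    def __init__(self, width: int, depth: int) -> None:
        self.width = width
        self.depth = depth
        self.rows = [[0] * width for _ in range(depth)]

    def _index(self, row: int, key: str) -> int:
        # Prefixing the row number gives each row its own hash function.
        digest = hashlib.blake2b(f"{row}|{key}".encode(), digest_size=8).digest()
        return int.from_bytes(digest, "little") % self.width

    def update(self, key: str, count: int = 1) -> None:
        # Add the packet (or byte) count to one bucket in every row.
        for r in range(self.depth):
            self.rows[r][self._index(r, key)] += count

    def query(self, key: str) -> int:
        # Collisions only inflate counters, so the smallest counter comes
        # from the row least affected by colliding flows.
        return min(self.rows[r][self._index(r, key)] for r in range(self.depth))


rng = random.Random(7)
# Hypothetical workload: 200 flows with heavy-tailed sizes, only 32 buckets per row.
truth = {f"flow{i}": int(rng.paretovariate(1.2)) for i in range(200)}
one_row = CountMinSketch(width=32, depth=1)
four_rows = CountMinSketch(width=32, depth=4)
for key, size in truth.items():
    one_row.update(key, size)
    four_rows.update(key, size)

for key in ("flow0", "flow1", "flow2"):
    print(key, "true:", truth[key], "1 row:", one_row.query(key), "4 rows:", four_rows.query(key))
```

Every row overestimates a flow, and because row 0 uses the same hash in both sketches, adding rows can only lower the estimate, never raise it. The price is `depth` times the counter memory, which is the space cost the paragraph refers to.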

We present a new class of sketch called the locality-sensitive sketch (LSS for short) that is resilient to hash collisions. LSS approximately minimizes the estimation error based on a theoretical equivalence relationship between the sketching error and the approximation error of K-means clustering, established in Sec. IV. This equivalence provides two important insights for the sketching methodology: clustering similar items together reduces the approximation error, and averaging the bucket counter yields an unbiased estimator. We apply these two theoretical insights to the design of a locality-sensitive sketch structure.
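The two insights can be checked on a toy model. The code below is an illustration under our own assumptions, not the LSS implementation: flow sizes are known up front, each bucket stores a (sum, count) pair, and one-dimensional Lloyd's K-means stands in for the paper's clustering step. It compares placing flows with a pseudo-random hash against placing them at the nearest cluster center, using the bucket average sum/count as the estimate in both cases. All function names are hypothetical.

```python
import hashlib
import random


def hashed_bucket(key: str, k: int) -> int:
    """Conventional placement: a pseudo-random bucket per flow ID."""
    digest = hashlib.blake2b(key.encode(), digest_size=8).digest()
    return int.from_bytes(digest, "little") % k


def nearest(centers: list[float], v: float) -> int:
    """Index of the cluster center closest to value v."""
    return min(range(len(centers)), key=lambda j: abs(centers[j] - v))


def kmeans_1d(values: list[float], k: int, iters: int = 50) -> list[float]:
    """Lloyd's algorithm on scalar flow sizes, seeded at evenly spaced quantiles."""
    ordered = sorted(values)
    centers = [float(ordered[(2 * j + 1) * len(ordered) // (2 * k)]) for j in range(k)]
    for _ in range(iters):
        groups: list[list[float]] = [[] for _ in range(k)]
        for v in values:
            groups[nearest(centers, v)].append(v)
        # Keep the previous center when a cluster ends up empty.
        centers = [sum(g) / len(g) if g else c for g, c in zip(groups, centers)]
    return centers


def average_estimates(assign: dict[str, int], truth: dict[str, int], k: int) -> dict[str, float]:
    """Each bucket keeps (sum, count); every flow in it is estimated as sum / count."""
    sums, counts = [0] * k, [0] * k
    for key, b in assign.items():
        sums[b] += truth[key]
        counts[b] += 1
    return {key: sums[b] / counts[b] for key, b in assign.items()}


def squared_error(est: dict[str, float], truth: dict[str, int]) -> float:
    return sum((est[key] - truth[key]) ** 2 for key in truth)


rng = random.Random(7)
# Hypothetical workload: 200 flows with heavy-tailed sizes, 8 buckets.
truth = {f"flow{i}": int(rng.paretovariate(1.2)) for i in range(200)}
k = 8
random_assign = {key: hashed_bucket(key, k) for key in truth}
centers = kmeans_1d([float(v) for v in truth.values()], k)
cluster_assign = {key: nearest(centers, size) for key, size in truth.items()}

random_err = squared_error(average_estimates(random_assign, truth, k), truth)
clustered_err = squared_error(average_estimates(cluster_assign, truth, k), truth)
print(f"random hashing SSE: {random_err:.1f}, clustered SSE: {clustered_err:.1f}")
```

Within every bucket the estimation errors sum to zero, mirroring the unbiasedness argument, and the clustered placement yields a smaller squared error because its buckets group flows of similar size. With the conventional estimator that returns the raw bucket sum, every flow would instead be overestimated by the sizes of its bucket-mates.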

We adapt to online and dynamic network flows with background clustering and lightweight temporal caching techniques. First, we maintain the clustering model in a periodic background process, which collects recent samples and trains a clustering model that maps online flow records to up-to-date cluster centers. Second, the insertion process must handle incremental flow counters, since a flow's size grows as its packets are delivered. We handle these monotonically increasing flow counters with a temporal cache based on a lightweight Cuckoo hash table [19], [20], [21].
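The temporal cache can be pictured as a small cuckoo hash map that accumulates per-flow counters before they reach the sketch. The following is a minimal sketch, assuming two tables, BLAKE2b-derived positions, a fixed eviction budget, and a hypothetical policy of handing any entry that cannot be placed back to the caller so it can be flushed into the sketch. The class and method names are illustrative and are not taken from the paper or from [19], [20], [21].

```python
import hashlib
from typing import Optional

Entry = tuple[str, int]  # (flow ID, accumulated counter)


class CuckooCounterCache:
    """Two-table cuckoo hash map from flow ID to an accumulating counter."""

    def __init__(self, slots: int, max_kicks: int = 16) -> None:
        self.slots = slots
        self.max_kicks = max_kicks
        self.tables: list[list[Optional[Entry]]] = [[None] * slots, [None] * slots]

    def _pos(self, t: int, key: str) -> int:
        # Prefixing the table number yields two independent hash functions.
        digest = hashlib.blake2b(f"{t}|{key}".encode(), digest_size=8).digest()
        return int.from_bytes(digest, "little") % self.slots

    def get(self, key: str) -> int:
        """Current cached counter for key (0 if not cached); at most two probes."""
        for t in (0, 1):
            entry = self.tables[t][self._pos(t, key)]
            if entry is not None and entry[0] == key:
                return entry[1]
        return 0

    def add(self, key: str, count: int = 1) -> Optional[Entry]:
        """Add count to key; return an evicted entry if no free slot is found."""
        # Fast path: the flow is already cached, so bump its counter in place.
        for t in (0, 1):
            p = self._pos(t, key)
            entry = self.tables[t][p]
            if entry is not None and entry[0] == key:
                self.tables[t][p] = (key, entry[1] + count)
                return None
        # New flow: cuckoo insertion, pushing occupants into their other table.
        item: Optional[Entry] = (key, count)
        t = 0
        for _ in range(self.max_kicks):
            p = self._pos(t, item[0])
            item, self.tables[t][p] = self.tables[t][p], item
            if item is None:
                return None
            t = 1 - t  # the displaced entry tries its slot in the other table
        return item  # still homeless: the caller flushes it to the sketch

    def items(self) -> list[Entry]:
        return [e for table in self.tables for e in table if e is not None]


cache = CuckooCounterCache(slots=64)
flushed: list[Entry] = []
# Hypothetical packet stream: 40 flows, 25 packets each, interleaved.
for i in range(1000):
    evicted = cache.add(f"flow{i % 40}")
    if evicted is not None:
        flushed.append(evicted)
```

Each cached flow is found in at most two probes, which keeps per-packet work constant. Whatever gets evicted, counts are conserved: for each flow, its cached counter plus any flushed counters equal the number of packets it sent.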

We perform an extensive evaluation in Section VI. Testbed experiments show that the framework adapts to line rates and provides accurate query results. A trace-driven study reveals that LSS maintains stable performance under wide ranges of parameters and dramatically outperforms state-of-the-art sketching structures, with over 10 to 10 times reduction in relative errors
