Identifying Malicious Mirror Websites in High-Speed Network Traffic: A Method with 93.42% Accuracy


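This article presents "An identification method of malicious mirror websites for high-speed network traffic." It targets a specific evasion tactic: harmful content is republished on mirror websites so that it slips past routine inspection. The method has four main steps:

1. Data extraction and reconstruction: fragmented data is captured from high-speed network traffic and reassembled into the original webpage source code, which is the raw material for all later analysis. A normalization step then removes differences in formatting and encoding that would otherwise distort the comparison, which improves identification accuracy.
2. Source-code analysis: the reconstructed source code is divided into blocks, and a similarity hash (simhash) is computed for each block to obtain a simhash fingerprint of the page. Cryptographic hashes such as MD5 or SHA-1 change completely after a one-byte edit. Simhash is locality-sensitive instead: near-identical inputs yield fingerprints that differ in only a few bits. The Hamming distance between two fingerprints (the number of bit positions in which they differ) then quantifies how similar two pages' source code is.
3. Snapshot feature extraction: to compare pages by appearance as well as by source, a snapshot of each page is captured and SIFT (Scale-Invariant Feature Transform) keypoints are extracted from it. SIFT features are robust to rotation and scaling. Clustering analysis and a mapping step turn these keypoints into a perceptual hash of the snapshot, a compact representation that supports fast comparison.
4. Webpage similarity computation: the similarity between two pages is computed from their perceptual hash values, which helps separate ordinary pages from malicious copies. Together with the source-code comparison, this supports real-time detection of malicious mirror websites in high-speed networks.

In experiments on real traffic, the method achieved an accuracy of 93.42%, a recall of 90.20%, and an F value (the harmonic mean of precision and recall) of 0.92. The F value measures detection quality, not speed. Speed is reflected in the processing delay of 20 μs, which makes the method practical for real-time network environments. By combining source-code analysis, similarity hashing, and snapshot feature extraction, the paper offers an accurate and fast way to identify malicious mirror websites in high-speed network traffic.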
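Step 2 can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation. The following choices are all hypothetical:

- The normalization rules in `normalize_source` (drop HTML comments, lowercase, collapse whitespace) stand in for the paper's standardization step.
- A new block is assumed to start at every `<` tag boundary.
- Each block is treated as one simhash feature weighted by its length.
- The 64-bit fingerprint width and the BLAKE2b block hash are illustrative.

```python
import hashlib
import re
from typing import List

FINGERPRINT_BITS = 64  # assumed fingerprint width (illustrative choice)


def normalize_source(raw: bytes, charset: str = "utf-8") -> str:
    """Hypothetical normalization: decode, drop comments, lowercase, squeeze whitespace."""
    text = raw.decode(charset, errors="replace")
    text = re.sub(r"<!--.*?-->", "", text, flags=re.DOTALL)  # remove HTML comments
    text = text.lower()
    text = re.sub(r">\s+<", "><", text)  # drop whitespace between tags
    text = re.sub(r"\s+", " ", text)     # collapse remaining whitespace runs
    return text.strip()


def split_blocks(source: str) -> List[str]:
    """Assumed blocking rule: start a new block before every '<'."""
    return [b for b in re.split(r"(?=<)", source) if b.strip()]


def stable_hash64(data: str) -> int:
    """Deterministic 64-bit block hash (built-in hash() is salted per process)."""
    digest = hashlib.blake2b(data.encode("utf-8"), digest_size=8).digest()
    return int.from_bytes(digest, "big")


def simhash(blocks: List[str]) -> int:
    """Charikar-style simhash: each block votes on every bit, weighted by its length."""
    counts = [0] * FINGERPRINT_BITS
    for block in blocks:
        h = stable_hash64(block)
        weight = len(block)
        for i in range(FINGERPRINT_BITS):
            counts[i] += weight if (h >> i) & 1 else -weight
    fingerprint = 0
    for i, c in enumerate(counts):
        if c > 0:  # bit i is set when the weighted vote is positive
            fingerprint |= 1 << i
    return fingerprint


def hamming_distance(a: int, b: int) -> int:
    """Number of bit positions in which two fingerprints differ."""
    return bin(a ^ b).count("1")


def source_similarity(page_a: bytes, page_b: bytes) -> float:
    """1.0 for identical fingerprints, decreasing as the Hamming distance grows."""
    fa = simhash(split_blocks(normalize_source(page_a)))
    fb = simhash(split_blocks(normalize_source(page_b)))
    return 1.0 - hamming_distance(fa, fb) / FINGERPRINT_BITS


if __name__ == "__main__":
    page = b"<html>\n  <body><!-- v1 --><p>Hello   World</p></body></html>"
    copy = b"<HTML><body><p>hello world</p></BODY></html>"
    print(source_similarity(page, copy))  # 1.0: identical after normalization
```

Two pages that differ only in comments, letter case, or whitespace normalize to the same blocks and therefore get identical fingerprints. Pages that share most blocks should land only a few bits apart. The distance threshold for calling two pages mirrors is not stated here and would need tuning on labeled traffic.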

Journal on Communications, Vol. 40, No. 7, July 2019

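An Identification Method of Malicious Mirror Websites for High-Speed Network Traffic

ZHANG Lei (1,2), ZHANG Peng, SUN Wei, YANG Xingdong, XING Lichao (1,2)

(1. School of Cyber Security, University of Chinese Academy of Sciences, Beijing 100049; 2. Institute of Information Engineering, Chinese Academy of Sciences, Beijing 100093;

3. School of Computer and Information Technology, Beijing Jiaotong University, Beijing 100044; 4. School of Computer Science and Engineering, Beihang University, Beijing 100191)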

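Abstract: Harmful information can spread through mirror websites and thereby bypass inspection. To address this problem, an identification method of malicious mirror websites for high-speed network traffic was proposed. First, fragmented data was extracted from the traffic and the webpage source code was restored, with a normalization step added to improve identification accuracy. Next, the source code was divided into blocks and a hash value was computed for each block with the simhash (similarity hash) algorithm, yielding the simhash value of the source code. The Hamming distance was then used to compute the similarity between webpage source codes. Finally, a snapshot of the webpage was captured and its SIFT feature points were extracted. A perceptual hash value of the snapshot was obtained through clustering analysis and mapping, and webpage similarity was computed from the perceptual hash values. Experiments on real traffic show that the proposed method achieves an accuracy of 93.42%, a recall of 90.20%, an F value of 0.92, and a processing delay of 20 μs. With the proposed method, malicious mirror webpages can be detected effectively in high-speed network traffic.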
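The reported figures can be checked against each other. Take the F value as the balanced F-measure (F1), the harmonic mean of precision and recall. A quick calculation then shows that 0.92 is consistent with reading the 93.42% "accuracy" figure as precision. That reading is an inference from the arithmetic, not a statement from the paper.

```python
def f_measure(precision: float, recall: float) -> float:
    """Balanced F-measure (F1): harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Figures reported in the abstract, interpreted here as precision and recall.
PRECISION = 0.9342
RECALL = 0.9020

f1 = f_measure(PRECISION, RECALL)
print(f"F1 = {f1:.4f}")  # prints "F1 = 0.9178", i.e. 0.92 to two decimals
```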

Keywords: malicious mirror website; simhash algorithm; webpage similarity

CLC number: TP309

Document code: A

doi: 10.11959/j.issn.1000-436x.2019089

IMM4HT: an identification method of malicious mirror website for high-speed network traffic

ZHANG Lei (1,2), ZHANG Peng, SUN Wei, YANG Xingdong, XING Lichao (1,2)

1. School of Cyber Security, University of Chinese Academy of Sciences, Beijing 100049, China

2. Institute of Information Engineering, Chinese Academy of Sciences, Beijing 100093, China

3. School of Computer and Information Technology, Beijing Jiaotong University, Beijing 100044, China

4. School of Computer Science and Engineering, Beihang University, Beijing 100191, China

Abstract: Harmful information is often spread through mirror websites in order to bypass detection. To address this problem, an identification method of malicious mirror websites for high-speed network traffic was proposed. First, fragmented data was extracted from the traffic and the source code of the webpage was restored. Next, a standardized processing module was used to improve accuracy. The source code of the webpage was then divided into blocks, and the hash value of each block was calculated with the simhash algorithm. This yielded the simhash value of the webpage source code, and the similarity between webpage source codes was calculated with the Hamming distance. A snapshot of the page was then taken and its SIFT feature points were extracted, and a perceptual hash value was obtained through clustering analysis and mapping. Finally, webpage similarity was calculated from the perceptual hash values. Experiments under real traffic show that the method achieves an accuracy of 93.42%, a recall of 90.20%, an F value of 0.92, and a processing delay of 20 μs. The proposed method can therefore effectively detect malicious mirror websites in high-speed network traffic.
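The snapshot branch (SIFT features, then clustering, then mapping to a perceptual hash) can be illustrated with a bag-of-visual-words sketch. The clustering and mapping rules below are assumptions, not the paper's:

- A codebook is learned once with plain k-means over descriptors pooled from reference snapshots.
- Each snapshot's descriptors are assigned to their nearest codeword.
- Bit i of the hash is set when codeword i occurs more often than average.

Real SIFT descriptors are 128-dimensional and would come from a SIFT implementation such as OpenCV's `cv2.SIFT_create().detectAndCompute(...)`. To keep the example dependency-free, the demo uses toy 2-D vectors.

```python
from typing import List, Sequence

Vector = Sequence[float]


def sq_dist(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two descriptors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(centroids: List[List[float]], v: Vector) -> int:
    """Index of the codeword (centroid) closest to descriptor v."""
    return min(range(len(centroids)), key=lambda i: sq_dist(centroids[i], v))


def kmeans(points: List[Vector], k: int, iterations: int = 20) -> List[List[float]]:
    """Plain Lloyd's k-means, initialised with the first k points (assumed distinct)."""
    centroids = [list(p) for p in points[:k]]
    for _ in range(iterations):
        groups: List[List[Vector]] = [[] for _ in range(k)]
        for p in points:
            groups[nearest(centroids, p)].append(p)
        for i, group in enumerate(groups):
            if group:  # an empty cluster keeps its previous centroid
                centroids[i] = [sum(col) / len(group) for col in zip(*group)]
    return centroids


def perceptual_hash(descriptors: List[Vector], codebook: List[List[float]]) -> int:
    """Map a snapshot's descriptors to a len(codebook)-bit hash.

    Bit i is 1 when codeword i occurs more often than the average codeword.
    """
    hist = [0] * len(codebook)
    for d in descriptors:
        hist[nearest(codebook, d)] += 1
    mean = sum(hist) / len(hist)
    bits = 0
    for i, count in enumerate(hist):
        if count > mean:
            bits |= 1 << i
    return bits


def hamming_distance(a: int, b: int) -> int:
    """Number of differing bits between two perceptual hashes."""
    return bin(a ^ b).count("1")


if __name__ == "__main__":
    # Toy 2-D "descriptors" around four codewords: (0,0), (10,0), (0,10), (10,10).
    training = [(0, 0), (10, 0), (0, 10), (10, 10),
                (0.5, 0.2), (9.8, 0.3), (0.1, 9.7), (10.2, 9.9)]
    codebook = kmeans(training, k=4)
    page = [(0.2, 0.1), (0.3, 0.4), (9.9, 10.1), (10.1, 9.8)]
    mirror = [(0.1, 0.2), (0.4, 0.3), (10.0, 10.2), (9.7, 9.9)]
    other = [(9.9, 0.1), (10.2, 0.2), (0.3, 9.8), (0.1, 10.3)]
    h_page, h_mirror, h_other = (perceptual_hash(s, codebook) for s in (page, mirror, other))
    print(hamming_distance(h_page, h_mirror))  # 0: same visual-word profile
    print(hamming_distance(h_page, h_other))   # 4: every bit differs
```

Every snapshot is mapped onto the same shared codebook, so hash bits are comparable across pages. Two renderings of the same page land a small Hamming distance apart even when individual keypoints shift slightly. A practical codebook would be larger (for example, 64 codewords for a 64-bit hash), which is again an assumption.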

Key words: malicious mirror website, simhash algorithm, webpage similarity

Received: 2018-11-09; Revised: 2019-03-04

Corresponding author: ZHANG Peng, pengzhang@iie.ac.cn

Foundation Items: The National Key Research and Development Program of China (No.2016YFB0801300), The National Natural Science Foundation of China (No.61602474, No.61602467, No.61702552)
