Automatic Signature Extraction Algorithm for Polymorphic Worms in a Big Data Environment


"大数据环境中多态蠕虫特征的自动提取方法" 在当前的大数据环境中，多态蠕虫（Polymorphic Worms）已经成为网络安全领域的一大威胁。这类蠕虫能够通过不断改变自身形态，躲避传统签名检测方法，使得特征提取和防御变得尤为困难。针对这一挑战，该研究论文提出了一种基于改进的词频-逆文档频率（Term Frequency-Inverse Document Frequency, TF-IDF）的自动签名提取算法。 TF-IDF是一种常用的文本挖掘技术，用于衡量一个词汇在文档中的重要性。在网络安全领域，它可以被用来识别和区分恶意代码的关键特征。论文作者Fangwei Wang等人通过改进这一方法，旨在提高在大数据环境下对多态蠕虫特征提取的准确性和效率。传统的TF-IDF算法主要考虑词汇在单个文档中的出现频率以及在整个文档集合中的普遍性。然而，对于多态蠕虫来说，简单的频率统计可能无法有效捕捉其动态变化的特性。因此，论文中提到的改进算法可能包括对TF-IDF的权重计算进行优化，考虑蠕虫行为的时序性、变异模式以及噪声环境下的抗干扰能力。论文中可能会详细介绍以下几点： 1. **改进的TF-IDF模型**：如何调整TF-IDF的计算方式，以便更准确地反映多态蠕虫的特征。可能涉及对蠕虫代码段的特殊处理，比如引入时间窗口来考虑行为序列，或者引入动态权重来适应蠕虫的变异速度。 2. **噪声处理机制**：在大数据环境中，数据噪声是常见的问题。研究可能阐述了如何通过某种滤波或降噪技术来减少噪声对特征提取的影响，从而提高检测的准确性。 3. **性能评估**：论文可能会通过实验对比传统方法和改进方法在不同数据集上的表现，如检测率、误报率等指标，以证明新方法的有效性。 4. **应用和未来工作**：讨论提出的算法如何应用于实际的大数据安全系统，并指出可能的扩展方向，如结合机器学习或深度学习技术进一步提升蠕虫特征识别的能力。这项工作对于提升大数据环境下的网络安全防护能力具有重要意义，它提供了一种自动化的方法来应对多态蠕虫的威胁，有助于及时发现并阻止这些恶意程序的传播。

An Automatic Signature-Based Approach for Polymorphic Worms in Big Data Environment

Fangwei Wang

Lab of Network and Information Security of Hebei Province

Hebei Normal University

Shijiazhuang, China

fw_wang@hebtu.edu.cn

Shaojie Yang

Lab of Network and Information Security of Hebei Province

Hebei Normal University

Shijiazhuang, China

1657392397@qq.com

Dongmei Zhao

Lab of Network and Information Security of Hebei Province

Hebei Normal University

Shijiazhuang, China

dmzhao@hebtu.edu.cn

Changguang Wang†

Lab of Network and Information Security of Hebei Province

Hebei Normal University

Shijiazhuang, China

wangcg@hebtu.edu.cn

Abstract-In a big data environment, the signatures of polymorphic worms need to be extracted accurately and efficiently, which is essential to defending against them. At present, however, it is difficult to generate accurate signatures for polymorphic worms, especially in the presence of noise. To solve this issue, we propose an automatic signature extraction algorithm for polymorphic worms based on an improved Term Frequency-Inverse Document Frequency (TF-IDF) measure. First, each sample of the dataset is divided into several documents. One document is selected at random and its first worm sample is analyzed; the suspicious substring is then selected by traversing the document and calculating the TF value. Second, all the documents are traversed and the IDF value is computed. Finally, the TF-IDF value is determined and the accurate worm signature is generated. The algorithm is tested on various kinds of worms and compared with existing methods. The results show that, under noise, our algorithm generates polymorphic worm signatures more accurately and efficiently than similar methods. It can also save the state of worm signature extraction and has excellent scalability.

Keywords-Polymorphic worm; Signature extraction; TF-IDF; Worm detection
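The pipeline described in the abstract (compute TF inside one document, compute IDF across all documents, multiply the two) can be sketched with plain textbook TF-IDF over fixed-length byte substrings. This is only an illustration under stated assumptions: the excerpt does not give the improved weighting, so none is reproduced here; a "document" is assumed to be a list of byte-string samples; and the function names, the substring length `n`, and the toy payloads are hypothetical.

```python
import math
from collections import Counter


def substrings(payload: bytes, n: int) -> list[bytes]:
    """Return every length-n substring of a payload."""
    return [payload[i:i + n] for i in range(len(payload) - n + 1)]


def term_frequency(document: list[bytes], n: int) -> dict[bytes, float]:
    """TF: occurrences of a substring divided by all substring occurrences
    in the document (assumed here to be a list of samples)."""
    counts = Counter(g for sample in document for g in substrings(sample, n))
    total = sum(counts.values())
    return {g: c / total for g, c in counts.items()} if total else {}


def inverse_document_frequency(documents: list[list[bytes]], n: int) -> dict[bytes, float]:
    """Textbook IDF: log(N / df), where df is the number of documents
    that contain the substring at least once."""
    df = Counter()
    for document in documents:
        df.update({g for sample in document for g in substrings(sample, n)})
    return {g: math.log(len(documents) / d) for g, d in df.items()}


def tf_idf(documents: list[list[bytes]], doc_index: int, n: int) -> dict[bytes, float]:
    """Score every substring of one chosen document by TF * IDF."""
    tf = term_frequency(documents[doc_index], n)
    idf = inverse_document_frequency(documents, n)
    return {g: tf[g] * idf[g] for g in tf}


# Hypothetical toy data: document 0 is benign traffic, document 1 holds two
# variants that share the invariant bytes b"EVIL" behind different padding.
documents = [
    [b"GET /index", b"GET /about"],
    [b"GET /\x90\x90EVIL", b"GET /AAEVIL"],
]
scores = tf_idf(documents, doc_index=1, n=4)
best = max(scores, key=scores.get)
print(best, scores[best])
```

In this textbook form a substring that occurs in every document, such as `b"GET "` in the toy data, has an IDF of zero, so the sketch ranks highest the substring that is frequent within the chosen document and absent elsewhere.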

I. INTRODUCTION

With the globalization of the Internet and the arrival of the big data era, network worms have become one of the most serious threats to network and data security and have caused heavy losses. Their propagation has evolved from human-machine interaction that relied on hardware devices to automatic duplication and propagation that relies on the global network, operating systems, and application software [1-3]. A polymorphic worm is a worm that changes its appearance with each infection by means of variation, encryption, and semantics-preserving transformation. Its signatures exhibit composability and are difficult to describe with a traditional single signature, which greatly challenges traditional methods of worm detection and defense. Detecting polymorphic worms rapidly and generating their signatures quickly has therefore become a major research subject.

The main method suitable for detecting polymorphic worms is to extract attack signatures by analyzing suspicious traffic, which requires neither host information, nor the source code of vulnerabilities, nor binary code. It builds on existing signature extraction technologies and improves them by taking into account the signatures specific to polymorphic worms. It can detect not only known worms but also new samples of polymorphic worms, and it does so more accurately.

The idea of extracting worm attack signatures automatically was first put forward in Honeycomb [4]. Although that work proposed automatic extraction of worm signatures, it could not collect enough data to analyze a worm and extract its signature, because few hosts are infected at the beginning of worm propagation. Thus, it did not fully realize the advantages of automatic extraction. The Autograph system [5] generates worm signatures according to content length, based on single-string matching. This system provides a useful reference for extracting worm signatures, but the signatures it generates are too uniform in type to detect more sophisticated polymorphic worms. Newsome et al. [6] first proposed a system for detecting polymorphic worms, Polygraph, which uses substrings to generate three types of signatures (conjunction signatures, token-subsequence signatures, and Bayes signatures) by extracting invariants that satisfy the required conditions from the suspicious flow. However, the signatures generated by the system showed poor performance and a high false alarm rate under noise. The system was also ineffective against polymorphic worms that adopt instruction substitution, NOP insertion, and instruction transformation, and it could not easily achieve rapid signature extraction.
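The three Polygraph signature types named above can be illustrated with a small matcher. This is a sketch based on how these forms are commonly described, not on Polygraph's actual code: a conjunction signature is taken to match when every token occurs in the flow in any order, a token-subsequence signature when the tokens occur in the given order, and a Bayes signature when the summed scores of the tokens present exceed a threshold. The function names, tokens, scores, and threshold below are hypothetical.

```python
def matches_conjunction(flow: bytes, tokens: list[bytes]) -> bool:
    """Match if every token occurs somewhere in the flow, in any order."""
    return all(token in flow for token in tokens)


def matches_token_subsequence(flow: bytes, tokens: list[bytes]) -> bool:
    """Match if the tokens occur in the given order without overlapping."""
    position = 0
    for token in tokens:
        index = flow.find(token, position)
        if index < 0:
            return False
        position = index + len(token)  # continue searching after this token
    return True


def matches_bayes(flow: bytes, scored_tokens: dict[bytes, float], threshold: float) -> bool:
    """Match if the scores of the tokens present in the flow sum above the threshold."""
    score = sum(weight for token, weight in scored_tokens.items() if token in flow)
    return score > threshold


# Hypothetical flow and tokens, chosen only to exercise the three matchers.
flow = b"GET /x HTTP/1.1\r\nHost: a\r\n\xff\xbf"
tokens = [b"GET ", b"HTTP/1.1", b"\xff\xbf"]

print(matches_conjunction(flow, tokens))
print(matches_token_subsequence(flow, tokens))
print(matches_token_subsequence(flow, list(reversed(tokens))))
print(matches_bayes(flow, {b"GET ": 0.1, b"\xff\xbf": 2.0}, threshold=1.5))
```

The ordering constraint is what separates the first two forms: reversing the token list still satisfies the conjunction check but fails the subsequence check.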

Wang et al. [7] proposed a network-based method to generate signatures for polymorphic worms, which could generate length-based signatures for buffer overflow vulnerabilities. Stephenson et al. [8] proposed a quasi-species model to describe the propagation of polymorphic worms and obtained the maximum allowable time for preventing network worms. Sun et al. [9] proposed the RSWD (Rough Set Worm Detection) algorithm to detect polymorphic worms based on rough set theory. Iwahashi et al. [10] suggested using Petri nets to generate worm signatures automatically. Tang et al. [11-12] utilized the gene sequence alignment method from bioinformatics to generate


2019 International Conference on Networking and Network Applications (NaNA)

DOI 10.1109/NaNA.2019.00047
