Frequent Itemsets in API Call Sequences: A Strategy for Identifying Malware Behavior


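This paper explores how frequent itemsets in Windows Application Programming Interface (API) call sequences can be used to analyze the behavioral characteristics of malware. In both static and dynamic analysis, understanding how a program uses the API is an important means of identifying malware behavior. The authors are Yong Qiao and Jie He of the School of Computer and Yuexiang Yang and Lin Ji of the Information Center, all at the National University of Defense Technology. They propose a new approach to malware analysis based on frequent itemsets in API call sequences. The core hypothesis is that frequently occurring itemsets composed of API names and/or API arguments can effectively reveal the behavior patterns of malware. The authors test this hypothesis by clustering malware binaries according to the frequent itemsets of their API call sequences. The process consists of several key steps:

1. **API call abstraction**: The API calls made while the malware executes are abstracted. This means the API names and arguments are extracted from the recorded calls so they can be analyzed further.
2. **Frequent itemset mining**: Frequent itemset mining techniques are applied to the API call sequences to find combinations of items that occur frequently. Algorithms such as Apriori or FP-Growth are typical choices. These combinations usually represent common patterns of malware behavior.
3. **Similarity calculation**: The frequent itemsets of different malware binaries are compared to measure how similar their API call patterns are. This provides the basis for clustering.

Experiments on a large malware dataset of 3131 binaries show that clustering based only on the frequent itemsets of API call sequences achieves high precision while significantly reducing computation time. The approach gives deeper insight into malware behavior and can help build more accurate malware detection tools. In summary, the paper proposes an analysis framework that abstracts and mines frequent itemsets in API call sequences to identify the behavioral characteristics of malware, supporting more precise malware detection and defense.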
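The frequent itemset mining step (step 2) can be illustrated with a minimal Apriori sketch in Python. This is an illustration under stated assumptions, not the authors' implementation. It assumes that each abstracted API call forms one transaction, whose items are the API name (prefix `api:`) and its argument values (prefix `arg:`). It also uses a hypothetical absolute support threshold `min_support`. The paper's actual transaction definition, mining algorithm and threshold are not given in this excerpt.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Minimal Apriori: return {frozenset(itemset): support count} for every
    itemset contained in at least `min_support` transactions."""
    transactions = [frozenset(t) for t in transactions]

    # Level 1: count single items and keep the frequent ones.
    counts = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    current = {s: n for s, n in counts.items() if n >= min_support}
    frequent = dict(current)

    k = 2
    while current:
        # Join frequent (k-1)-itemsets into k-item candidates, pruning any
        # candidate that has an infrequent (k-1)-subset (Apriori property).
        prev = list(current)
        candidates = set()
        for i in range(len(prev)):
            for j in range(i + 1, len(prev)):
                union = prev[i] | prev[j]
                if len(union) == k and all(
                    frozenset(sub) in current for sub in combinations(union, k - 1)
                ):
                    candidates.add(union)

        # Count the support of the surviving candidates with one pass over the data.
        current = {}
        for c in candidates:
            n = sum(1 for t in transactions if c <= t)
            if n >= min_support:
                current[c] = n
        frequent.update(current)
        k += 1
    return frequent


# Hypothetical abstracted calls from one sample: API name plus argument items.
RUN_KEY = "arg:HKCU\\Software\\Microsoft\\Windows\\CurrentVersion\\Run"
calls = [
    {"api:RegOpenKeyExW", RUN_KEY},
    {"api:RegSetValueExW", RUN_KEY},
    {"api:RegOpenKeyExW", RUN_KEY},
    {"api:CreateFileW", "arg:C:\\Windows\\Temp\\a.tmp"},
]
result = apriori(calls, min_support=2)
# result: {api:RegOpenKeyExW} -> 2, {RUN_KEY} -> 3, {api:RegOpenKeyExW, RUN_KEY} -> 2
```

With `min_support=2`, only the registry-related items survive. The API name and the Run-key argument together form a frequent 2-itemset, while the one-off file access is discarded as infrequent.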

Analyzing Malware by Abstracting the Frequent Itemsets in API call Sequences

Yong Qiao, Jie He

School of Computer

National University of Defense Technology

Changsha, China

qiaoyong10@nudt.edu.cn

Yuexiang Yang, Lin Ji

Information Center

National University of Defense Technology

Changsha, China

yyx@nudt.edu.cn

Abstract—Analyzing the usage of the Windows Application Program Interface (API) is a common way to understand the behavior of malicious software (malware) in both static and dynamic analysis. In this work, we focus on the frequent messages in API call sequences. We hypothesize that frequent itemsets composed of API names and/or API arguments are valuable for identifying the behavior of malware. To verify this, we cluster malware binaries based on the frequent itemsets of their API call sequences and evaluate the clustering performance. We describe the specific implementation steps of the clustering process, including API call abstraction, frequent itemset mining, and similarity calculation. Experiments on a large malware dataset demonstrate that using only the frequent messages of API call sequences achieves high clustering precision while significantly reducing computation time. This also shows the importance of frequent itemsets in API call sequences for identifying the behavior of malware.

Keywords-malware; frequent itemsets; API call sequences; clustering; sandbox
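The clustering process summarized in the abstract could look roughly like the following Python sketch. It is not the authors' algorithm. It assumes that each binary is represented by the set of its frequent itemsets, and it measures similarity with the Jaccard index. Binaries are linked whenever their similarity reaches a hypothetical threshold `tau`, and clusters are the resulting connected components (single linkage, found with union-find).

```python
def jaccard(a, b):
    """Jaccard similarity of two sets; defined as 1.0 when both are empty."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def cluster(profiles, tau):
    """Single-linkage clustering: binaries whose frequent-itemset sets have
    Jaccard similarity >= tau end up in the same cluster (via union-find)."""
    names = list(profiles)
    parent = {n: n for n in names}

    def find(x):
        # Follow parent links to the root, halving the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for i in range(len(names)):
        for j in range(i + 1, len(names)):
            if jaccard(profiles[names[i]], profiles[names[j]]) >= tau:
                parent[find(names[i])] = find(names[j])

    groups = {}
    for n in names:
        groups.setdefault(find(n), []).append(n)
    return sorted(sorted(g) for g in groups.values())


# Hypothetical frequent itemsets (each itemset is a frozenset of items).
A = frozenset({"api:RegSetValueExW"})
B = frozenset({"api:RegSetValueExW", "arg:Run"})
C = frozenset({"api:InternetOpenUrlW"})
D = frozenset({"api:send"})
profiles = {
    "sample1": {A, B, C},
    "sample2": {A, B},
    "sample3": {D},
}
clusters = cluster(profiles, tau=0.5)
# clusters == [["sample1", "sample2"], ["sample3"]]
```

Here `sample1` and `sample2` share two of their three distinct itemsets, which gives a Jaccard similarity of 2/3. That exceeds `tau`, so the two samples are grouped, while `sample3` shares nothing with either. Comparing compact itemset profiles rather than full traces is what keeps each pairwise comparison cheap.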

I. INTRODUCTION

Malicious software, or malware for short, ranges from classic computer viruses to Internet worms, bots, and Trojan horses. It has caused countless harms to today's Internet, especially to the most prevalent desktop operating system, Microsoft Windows. Various methods have been proposed for malware analysis. One of the most common is to analyze the behavior of malware by abstracting its API calls. This works because most applications running in user mode must call Windows API functions to request services from the kernel. The sequence of API calls can therefore represent the main behavior of an application while it runs. Moreover, the function arguments can be analyzed further, and information flows can be tracked through the API call sequences [1].
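To make API call abstraction concrete, the following Python sketch turns raw trace lines into sets of items. The trace format (`Name|key=value|key=value`), the `api:`/`arg:` item prefixes and the hex-value normalization rule are all hypothetical. Real sandboxes each use their own log formats, and the paper's abstraction rules are not reproduced here.

```python
import re

# Hypothetical rule: values that look like hex handles/addresses are run-specific.
HEX = re.compile(r"^0x[0-9a-fA-F]+$")


def abstract_call(line):
    """Turn one hypothetical trace line 'Name|key=value|...' into a set of
    items: the API name plus one item per (argument name, value) pair."""
    name, *args = line.strip().split("|")
    items = {"api:" + name}
    for arg in args:
        key, _, value = arg.partition("=")
        # Replace run-specific hex values with a placeholder so that the
        # same call made in different runs yields the same items.
        if HEX.match(value):
            value = "<HEX>"
        items.add(f"arg:{key}={value}")
    return frozenset(items)


trace = [
    "CreateFileW|lpFileName=C:\\Windows\\Temp\\a.tmp|ret=0x1a4",
    "WriteFile|hFile=0x1a4",
    "CreateFileW|lpFileName=C:\\Windows\\Temp\\a.tmp|ret=0x2b0",
]
transactions = [abstract_call(line) for line in trace]
```

After normalization, the first and third calls produce identical item sets even though their returned handles differ. This is what allows such repeated calls to be counted as frequent in later mining.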

As early as 1998, Hofmeyr et al. [2] studied intrusion detection in UNIX systems using sequences of system calls. Similarly, Shankarapani et al. [3] proposed two frameworks, SAVE (Static Analyzer for Vicious Executables) and MEDic (Malware Examiner using Disassembled Code), to detect malicious code using API call sequences or static API call sets. However, these methods abstract the API calls of malware with static code disassemblers such as Win32Dasm and OllyDbg. They are challenged by continuously updated obfuscation and packing techniques, which make it difficult for static methods to obtain the correct API calls. For this reason, novel methods for dynamically abstracting API calls have been proposed recently [4-9]. In contrast to static techniques, dynamic analysis methods monitor the behavior of malware at run time, which is often indicative of malicious activity and hard to conceal.

Although abstracting API calls at run time provides a means of studying the behavior of malware, it is not sufficient for detecting malware. The ability to analyze API calls automatically and in depth is also required. To the best of our knowledge, two teams have made good progress in analyzing API call sequences. The first framework, proposed by Rieck et al. [5], embeds API call sequences into high-dimensional vector spaces and enables malware clustering and classification with high precision. The second, introduced by Ahmed et al. [8], detects malware using statistical features extracted from both the spatial (argument) and temporal (sequence) information of API calls. Both frameworks use full sequences of API calls as input for similarity calculation, that is, the similarity between one binary and another or between a binary and a model. However, not all information from API calls is valuable for calculating the similarity between different malicious binaries. On the contrary, in many cases redundant messages reduce the similarity even when two binaries belong to the same category. Moreover, computing over whole sequences is time-consuming, especially when dealing with a large number of malware binaries.
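The effect of redundant messages on similarity can be illustrated with a simplified vector-space embedding. The Python sketch below maps each sequence of API names to counts of its 2-grams and compares the resulting vectors with cosine similarity. It is only a stand-in in the spirit of vector-space approaches such as that of Rieck et al. [5], not their exact feature map or distance. The traces are hypothetical.

```python
from collections import Counter
from math import sqrt


def qgram_vector(seq, q=2):
    """Sparse count vector of the q-grams (runs of q consecutive API names)."""
    return Counter(tuple(seq[i:i + q]) for i in range(len(seq) - q + 1))


def cosine(u, v):
    """Cosine similarity of two sparse count vectors (0.0 if either is empty)."""
    dot = sum(u[k] * v[k] for k in u.keys() & v.keys())
    norm = sqrt(sum(x * x for x in u.values())) * sqrt(sum(x * x for x in v.values()))
    return dot / norm if norm else 0.0


# Hypothetical traces: identical core behavior, but sample_b adds a
# repetitive delay loop that contributes nothing to the shared behavior.
core = ["RegOpenKeyExW", "RegSetValueExW", "RegCloseKey"]
sample_a = core
sample_b = core + ["GetTickCount", "Sleep"] * 10

sim = cosine(qgram_vector(sample_a), qgram_vector(sample_b))
```

Both samples perform the same three registry calls, yet their similarity falls to about 0.10 once one of them repeats a `GetTickCount`/`Sleep` pair ten times. The work needed to build each vector also grows with the length of the full trace, which points to the cost of computing over whole sequences.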

We believe that the frequent messages of API call sequences, such as frequent arguments or frequent itemsets composed of API names and API arguments, play an important role in reflecting the behavior of malware. In this paper, we cluster malware binaries based on the frequent itemsets of their API call sequences and evaluate the clustering performance. We then explain the specific processes of this work, with a formal description of the frequent API itemset mining method. Finally, we use a large malware dataset of 3131 malware binaries to perform clustering based on the frequent messages of API call sequences, the results of
