Improving the Security of JavaScript Malware Classifiers against Evasion Attacks

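"Improving JavaScript Malware Classifier's Security against Evasion by Particle Swarm Optimization": this research paper examines how to make JavaScript malware classifiers more secure against evasion attacks, using particle swarm optimization for feature selection to harden the classifier.

1. Introduction
As JavaScript has become widespread in web applications, its security problems have become increasingly prominent. In recent years, malware authors have exploited JavaScript's flexibility and complexity to design and spread malicious code, which makes JavaScript malware detection essential. However, malicious scripts can be modified to escape detection and pass as harmless ones, which challenges existing machine learning detectors. Improving a classifier's resistance to evasion attacks has therefore become a focus of research.

2. Background and Problem
Machine learning has had some success in JavaScript malware detection, but detection that depends on specific features is vulnerable to evasion strategies. In an evasion attack, the malware author changes the features of the malicious code so that it slips past the classifier. Understanding these strategies and designing more secure classifiers is therefore the key research problem.

3. Method and Contributions
The paper proposes a new metric for evaluating a classifier's defense against evasion attacks. To improve the classifier's security, the authors use particle swarm optimization (PSO) for feature selection. PSO is a global optimization heuristic that searches the large space of possible feature combinations for a near-optimal subset, reducing the opportunities for a malicious script to evade detection.

4. Experiments and Results
The experiments show that the method strengthens the classifier's resistance to evasion and, while maintaining or improving detection accuracy, reduces the likelihood of false positives and false negatives. With PSO-based feature selection, the classifier performs markedly better under evasion attacks.

5. Keywords
machine learning; JavaScript malware detection; evasion attack; feature selection; particle swarm optimization

6. Discussion and Future Work
Although the method performs well against evasion attacks, other attack techniques and optimization methods still need to be considered. Future work could combine several optimization algorithms or integrate different detection models to build a more robust and adaptable defense against JavaScript malware. The paper offers a new perspective on securing JavaScript malware detection systems: by optimizing feature selection, it improves the classifier's resistance to complex and changing evasion strategies. It also provides a practical case for applying machine learning in network security.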

Improving JavaScript Malware Classifier's Security against Evasion by Particle Swarm Optimization

Zibo Yi, Jun Ma, Lei Luo, Jie Yu, Qingbo Wu

College of Computer, National University of Defense Technology

Changsha, Hunan, China 410073

{ziboyi, majun, luolei, jackyu, wuqingbo}@ubuntukylin.com

Abstract—Machine learning techniques have recently been applied to JavaScript malware detection. However, detection can be misled, since an adversary may modify a malicious script and masquerade it as a benign one. In this paper, we investigate how these evasion attacks work and propose a metric to measure the classifier's security against them. To improve security, we propose a feature selection approach using particle swarm optimization. The experiments validate that our approach strengthens the classifier's security while also increasing its accuracy.

Keywords-machine learning; JavaScript malware detection; evasion attack; feature selection; particle swarm optimization

I. INTRODUCTION

JavaScript has been widely used in web applications in recent years. Meanwhile, more and more vulnerabilities in web applications have been exploited. For example, a drive-by download attack [1] can be implemented by using ActiveX to download and run malicious binary files. Other serious attacks carried out through JavaScript include XSS attacks [2], heap-spraying attacks [3], and clickjacking [4]. Furthermore, JavaScript can be embedded in PDF files [5], where it can inject shellcode into the target system. Therefore, JavaScript malware detection is often needed when browsing the web or opening a file that contains a script.

Several detection methods based on machine learning have been applied to malicious JavaScript classification [6]–[9], including the classification of PDFs that contain malicious JavaScript [5], [10], [11]. Most of these approaches achieve very high accuracy; in particular, the accuracy of JSDC [9] exceeds 99.9%. They use representative features, such as the abstract syntax tree and function call patterns, to train a classifier for subsequent classification. These approaches are similar to pattern recognition and can achieve satisfactory results if proper features and machine learning algorithms are selected.

However, an adversary can modify a malicious script with an evasion attack to prevent it from being detected by the classifier. Cao et al. [12] showed that an adversary can evade the classifier by polluting samples at the detection stage. The polluting procedure injects benign features into malicious samples, after which the accuracy of the classifier fell from 93.1% to 36.7%. Šrndić et al. [13] proposed two evasion attacks and applied them to evade a real PDF malware detector. Indeed, most existing JavaScript malware classifiers are likely to misclassify under general evasion attacks, e.g., ACRE [14], the mimicry attack [15], and the gradient-descent evasion attack [16]. Therefore, it is necessary to consider the evasion problem in JavaScript malware classifier design.
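
To make the pollution attack concrete, the following is a minimal sketch and not the setup used by Cao et al. [12]. It assumes a hypothetical linear classifier over binary features; the feature names, weights, and threshold are invented for illustration. The adversary keeps every malicious feature and only adds benign-looking ones until the score drops to or below the threshold.

```python
# Toy linear classifier over binary features. All feature names, weights and
# the threshold are hypothetical; none are taken from a cited detector.
WEIGHTS = {
    "eval_call": 2.0,          # positive weight: evidence of malice
    "unescape_call": 1.5,
    "long_hex_string": 1.0,
    "jquery_import": -1.2,     # negative weight: evidence of benign code
    "dom_ready_handler": -1.0,
    "form_validation": -0.9,
    "css_class_toggle": -0.8,
}
THRESHOLD = 0.0  # a score above THRESHOLD is classified as malicious


def score(features):
    """Sum the weights of the features present in the script."""
    return sum(WEIGHTS[f] for f in features)


def is_malicious(features):
    return score(features) > THRESHOLD


def inject_benign(features):
    """Greedily add benign-weighted features until the classifier is evaded.

    Returns the polluted feature set and the list of injected features.
    The original (malicious) features are never removed.
    """
    polluted = set(features)
    injected = []
    # Try the most strongly benign features first.
    candidates = sorted(
        (f for f, w in WEIGHTS.items() if w < 0 and f not in polluted),
        key=lambda f: WEIGHTS[f],
    )
    for f in candidates:
        if not is_malicious(polluted):
            break
        polluted.add(f)
        injected.append(f)
    return polluted, injected


malicious_script = {"eval_call", "unescape_call"}
polluted_script, added = inject_benign(malicious_script)
print(is_malicious(malicious_script), is_malicious(polluted_script), added)
```

In this toy setting the malicious payload is untouched, yet the classifier's decision flips once enough benign features are present. This is the weakness that a security metric for feature sets needs to capture.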

Some approaches have been proposed to improve a classifier's resistance to evasion. Kołcz et al. [17] proposed that averaging the features' weights can improve robustness. The explanation is that modifying a feature with a high weight has more impact on the classification result, so the adversary can take advantage of such features. Weight averaging avoids this and yields a more robust classifier. However, weight averaging is inefficient for a JavaScript malware classifier. Such classifiers often use abstract syntax tree (AST) nodes as features, resulting in a very high-dimensional feature vector, and re-weighting a high-dimensional vector takes a relatively long time. Biggio et al. [18] proposed training multiple classifiers and using them together for detection. Zhang et al. [19] proposed a feature selection method based on the observation that some features have little effect on the classifier's accuracy or robustness. Compared with a multiple classifier system, which tests a script several times to classify it, feature selection has a lower feature vector dimension and requires fewer tests. This makes feature selection more efficient than a multiple classifier system. For these reasons, we choose the feature selection approach to counter the adversary's evasion.
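
Feature selection, the approach chosen above, can be viewed as a search over binary masks with one bit per feature. The sketch below shows how binary particle swarm optimization, the search method named in the abstract, can carry out such a search. It is a minimal illustration under stated assumptions, not the authors' implementation: `toy_fitness`, the `USEFULNESS` and `EVADABILITY` scores, and all hyperparameters (`inertia`, `c1`, `c2`, `v_max`, swarm size, iteration count) are hypothetical, and the objective is a stand-in because the paper's security metric is not defined in this section. The inertia term is a commonly used PSO variant and is also an assumption here.

```python
import math
import random


def bpso(fitness, n_bits, n_particles=20, n_iters=50,
         inertia=0.7, c1=1.5, c2=1.5, v_max=4.0, seed=0):
    """Maximise fitness(mask) over 0/1 masks with binary PSO (a sketch)."""
    rng = random.Random(seed)
    # Positions are 0/1 masks. The first particle starts from the full
    # feature set so the result is never worse than "keep everything".
    xs = [[1] * n_bits] + [
        [rng.randint(0, 1) for _ in range(n_bits)]
        for _ in range(n_particles - 1)
    ]
    vs = [[0.0] * n_bits for _ in range(n_particles)]
    pbest = [x[:] for x in xs]
    pbest_fit = [fitness(x) for x in xs]
    g = max(range(n_particles), key=lambda i: pbest_fit[i])
    gbest, gbest_fit = pbest[g][:], pbest_fit[g]

    for _ in range(n_iters):
        for i in range(n_particles):
            for d in range(n_bits):
                r1, r2 = rng.random(), rng.random()
                v = (inertia * vs[i][d]
                     + c1 * r1 * (pbest[i][d] - xs[i][d])
                     + c2 * r2 * (gbest[d] - xs[i][d]))
                v = max(-v_max, min(v_max, v))  # clamp the velocity
                vs[i][d] = v
                # The sigmoid of the velocity is the probability of bit = 1.
                prob_one = 1.0 / (1.0 + math.exp(-v))
                xs[i][d] = 1 if rng.random() < prob_one else 0
            f = fitness(xs[i])
            if f > pbest_fit[i]:
                pbest[i], pbest_fit[i] = xs[i][:], f
                if f > gbest_fit:
                    gbest, gbest_fit = xs[i][:], f
    return gbest, gbest_fit


# Hypothetical per-feature scores, invented for illustration only.
USEFULNESS = [0.9, 0.2, 0.7, 0.1, 0.8, 0.3, 0.6, 0.4]
EVADABILITY = [0.1, 0.6, 0.2, 0.5, 0.3, 0.7, 0.1, 0.4]


def toy_fitness(mask):
    """Stand-in objective: reward useful features, penalise forgeable ones."""
    return sum(u - e for bit, u, e in zip(mask, USEFULNESS, EVADABILITY) if bit)


best_mask, best_fit = bpso(toy_fitness, n_bits=len(USEFULNESS))
print(best_mask, best_fit)
```

Because the first particle starts from the full feature set and the global best is only replaced on improvement, the returned mask never scores worse under `toy_fitness` than keeping every feature. How close it comes to the best possible mask depends on the seed and hyperparameters.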

In this paper, we propose a feature selection approach to make the classifier stronger against evasion. Before presenting our feature selection algorithm, we propose a metric to evaluate which selection scheme is better. The feature selection problem is then to find the best selection scheme. We use BPSO (the binary version of particle swarm optimization [20]) to solve this problem. By using the proposed metric and BPSO, we obtain a classifier that is faster to train and harder to evade than that of [19], with improved accuracy at the same time. In summary, this work makes the following contributions:

• We propose a metric to measure a JavaScript malware classifier's security, namely, its hardness against evasion.

2016 IEEE TrustCom/BigDataSE/ISPA