Improving JavaScript Malware Classifier’s Security against Evasion
by Particle Swarm Optimization
Zibo Yi, Jun Ma, Lei Luo, Jie Yu, Qingbo Wu
College of Computer, National University of Defense Technology
Changsha, Hunan, China 410073
{ziboyi, majun, luolei, jackyu, wuqingbo}@ubuntukylin.com
Abstract—Machine learning techniques have recently been
applied to JavaScript malware detection. However, the detection
can be misled, since an adversary may modify a malicious script
so that it masquerades as a benign one. In this paper,
we investigate how these evasion attacks work and propose a
metric to measure a classifier's security against them. To improve
this security, we propose a feature selection approach using
particle swarm optimization. The experiments validate that our
approach strengthens the classifier's security while also
increasing its accuracy.
Keywords-machine learning; JavaScript malware detection;
evasion attack; feature selection; particle swarm optimization;
I. INTRODUCTION
JavaScript has been widely used in web applications in the
last several years. Meanwhile, more and more vulnerabilities
in web applications have been exploited. For example, a
drive-by download attack [1] can be implemented by using
ActiveX to download and run malicious binary files. Other
serious attacks based on JavaScript include XSS attacks [2],
heap-spraying attacks [3], clickjacking [4], etc. Furthermore,
JavaScript can be embedded in PDF files [5], where it can
inject shellcode into the target system. Therefore, JavaScript
malware detection is often needed when browsing the
web or opening a file that contains a script.
Several detection methods based on machine learning have
been applied to malicious JavaScript classification [6]–[9],
including the classification of PDF files that contain malicious
JavaScript [5], [10], [11]. Most of these approaches achieve very
high accuracy. In particular, the accuracy of JSDC [9] is more
than 99.9%. These approaches use representative features,
such as the abstract syntax tree and function call patterns, to
train a classifier for further classification. They are
similar to pattern recognition, which can give satisfactory
results if proper features and machine learning algorithms
are selected.
However, an adversary can modify a malicious script
with an evasion attack to prevent the script from being
detected by the classifier. Cao et al. [12] validated that an
adversary can evade the classifier by polluting samples at the
detection stage. The polluting procedure injects benign
features into malicious samples, after which the accuracy
of the classifier fell from 93.1% to 36.7%. Šrndić
et al. [13] proposed two evasion attacks and applied them
to evade a real PDF malware detector. Indeed, most existing
JavaScript malware classifiers are likely to misclassify
samples under general evasion attacks, e.g.
ACRE [14], mimicry attack [15], gradient-descent evasion
attack [16], etc. Therefore, it is necessary to consider the
evasion problem in JavaScript malware classifier design.
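The pollution procedure described above (injecting benign features into a malicious sample at the detection stage) can be illustrated with a minimal sketch. It assumes a linear classifier over binary features. The feature names, weights, bias, and the greedy injection order are all hypothetical and chosen only for illustration; they are not taken from [12] or from any real detector.

```python
# Hypothetical linear classifier over binary features (all values are assumptions).
# Positive weights push toward "malicious", negative weights toward "benign".
WEIGHTS = {
    "eval_call": 2.0,
    "unescape_call": 1.5,
    "long_string": 1.0,
    "dom_query": -1.0,
    "jquery_import": -1.5,
    "comment_block": -0.5,
}
BIAS = -1.0


def score(features):
    """Linear score of a sample given the set of features it contains."""
    return BIAS + sum(WEIGHTS[f] for f in features)


def is_malicious(features):
    """A sample is flagged as malicious when its score is strictly positive."""
    return score(features) > 0


def inject_benign(features):
    """Add benign-looking features, most negative weight first,
    until the sample is no longer flagged (or no candidates remain).
    The original malicious features are left untouched."""
    polluted = set(features)
    candidates = sorted(
        (f for f in WEIGHTS if WEIGHTS[f] < 0 and f not in polluted),
        key=WEIGHTS.get,
    )
    for f in candidates:
        if not is_malicious(polluted):
            break
        polluted.add(f)
    return polluted


malicious = {"eval_call", "unescape_call"}
polluted = inject_benign(malicious)
print(is_malicious(malicious), is_malicious(polluted))
print(sorted(polluted - malicious))
```

In this toy setting the malicious payload is unchanged; only the features added around it move the sample across the decision boundary.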
Some approaches have been proposed to improve a classifier's
robustness against evasion. Kołcz et al. [17] proposed that
averaging the features' weights can improve robustness.
The explanation is that modifying a feature with
high weight has more impact on the classification result, so
the adversary can take advantage of such features. Weight
averaging avoids this and yields a more robust classifier. But
weight averaging is inefficient for a JavaScript malware
classifier. Such classifiers often use abstract syntax tree
(AST) nodes as their features, resulting in a very high-dimensional
feature vector. Re-weighting such a high-dimensional vector takes
a relatively long time. Biggio et al. [18] proposed training
multiple classifiers and using them together for detection.
Zhang et al. [19] proposed a feature selection method based
on the consideration that some features have little effect on
the classifier's accuracy or robustness. Compared with a multiple
classifier system, which tests a script several times to classify it,
feature selection has a lower feature vector dimension and needs
fewer tests. This makes feature selection more efficient than
a multiple classifier system. For these reasons, we choose
the feature selection approach against the adversary's evasion.
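The weight-evenness argument attributed to [17] above can be made concrete with a small sketch. It assumes a linear classifier whose present features all have positive (malicious-leaning) weights and an adversary who removes features one at a time. The function name, weight vectors, and threshold are hypothetical, and this is not the re-weighting procedure of [17] itself.

```python
def min_changes_to_evade(weights, threshold):
    """Fewest present features an adversary must remove so that the
    score (sum of remaining weights) drops to the threshold or below.
    With all-positive weights, removing the largest weights first is optimal."""
    score = sum(weights)
    changes = 0
    for w in sorted(weights, reverse=True):
        if score <= threshold:
            break
        score -= w
        changes += 1
    return changes


# Two hypothetical classifiers with the same total weight (10.0).
skewed = [6.0, 1.0, 1.0, 1.0, 1.0]  # one dominant feature
even = [2.0, 2.0, 2.0, 2.0, 2.0]    # weight spread evenly
threshold = 5.0

print(min_changes_to_evade(skewed, threshold))
print(min_changes_to_evade(even, threshold))
```

Under these assumptions the skewed classifier is evaded by changing a single feature, while the evenly weighted one requires several changes, which is the intuition behind preferring even weights.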
In this paper, we propose a feature selection approach
to make the classifier stronger against evasion. Before
presenting our feature selection algorithm, we propose a metric
to evaluate which selection scheme is better. The
feature selection problem is then to find the best selection scheme.
We use BPSO (the binary version of Particle Swarm Optimization
[20]) to solve this problem. By using the proposed
metric and BPSO, we obtain a classifier that is faster to
train and harder to evade than that of [19], with its accuracy
improved at the same time. In summary, this work makes
the following contributions:
• We propose a metric to measure a JavaScript malware
classifier's security, namely, its hardness against
evasion.
2016 IEEE TrustCom/BigDataSE/ISPA
2324-9013/16 $31.00 © 2016 IEEE
DOI 10.1109/TrustCom/BigDataSE/ISPA.2016.264
1735