Malware Detection on Android Smartphones using
Keywords Vector and SVM
Junmei Sun*, Kai Yan, Xuejiao Liu, Chunlei Yang, Yaoyin Fu
Hangzhou Institute of Service Engineering
Hangzhou Normal University
Hangzhou, China
junmeisun@hznu.edu.cn
2015112011003@stu.hznu.cn
liuxuejiao@hznu.edu.cn
1027721710@qq.com
2015112011014@stu.hznu.cn
Abstract—With the development of smart phones, more and
more mobile malware that can potentially harm users’ information
has appeared on the market, especially on popular platforms such
as Android. However, effectively detecting new malware and
variants of existing malware remains a difficult problem. In
contrast to the traditional feature extraction method based on
binary programs, this paper presents a method for feature
extraction from Java source code. The method uses the Keywords
Correlation Distance to compute the correlation between key code
elements such as API calls, Android permissions, common
parameters, and common keywords in Android malware source code.
An SVM is then trained on these features so that the system can
adapt to new malicious software samples and thus detect both new
and existing malware. Unlike conventional methods, which are
based on the context of the text, this method combines the
characteristics of malware categories and the operating
environment to record the behavior of malicious software.
Experiments show that the method is efficient and effective in
detecting malware on the Android platform.
Keywords—Android; Malware; Keywords Correlation
Distance; SVM
I. INTRODUCTION
With the advent of the Internet era, smart phones are becoming
more and more popular around the world, especially those running
the Android operating system, owing to its excellent
performance. However, Android malware has increased
significantly in recent years. It has been highlighted [4] that
“among all mobile malware, the share of Android based
malware is higher than 46% and still growing rapidly.” Given
the rampant growth of Android malware, there is a pressing
need to effectively mitigate or defend against it [5].
Unfortunately, most malware detection methods are based on
traditional content signatures: they maintain a list of malware
signature definitions and compare each application against this
database of known malware signatures. The disadvantage of
this detection method is that users are protected only from
malware covered by the most recently updated signatures,
but not from new malware [6]. Some studies propose
detecting malware based on statically requested permissions.
This detection method is not reliable, mainly because
developers can freely request any permission they want, so
malware authors can mimic the requested permissions of benign
applications. Other studies dynamically run the App in a
sandbox to capture its runtime activities. However, analyzing
Apps’ runtime behaviors requires sophisticated skills and
platforms, is time consuming, and incurs high overhead.
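The limitation of signature matching can be illustrated with a minimal sketch. It assumes, purely for illustration, that a signature is the SHA-256 digest of the application package; the sample byte strings and the function names `signature` and `is_known_malware` are hypothetical and do not come from any particular detection product. A variant that differs from a known sample by a single byte yields a different digest and is therefore not flagged.

```python
import hashlib


def signature(apk_bytes: bytes) -> str:
    """Return a content signature (assumed here to be a SHA-256 digest)."""
    return hashlib.sha256(apk_bytes).hexdigest()


def is_known_malware(apk_bytes: bytes, signature_db: set) -> bool:
    """Flag an app only if its signature is already in the database."""
    return signature(apk_bytes) in signature_db


# Hypothetical samples: a known malicious package and a slightly modified variant.
known_sample = b"malicious-payload-v1"
new_variant = b"malicious-payload-v2"

# The database contains only signatures of samples seen so far.
signature_db = {signature(known_sample)}

known_detected = is_known_malware(known_sample, signature_db)
variant_detected = is_known_malware(new_variant, signature_db)
```

In this sketch the known sample is matched while the variant is missed until its signature is added to the database, which is the gap that motivates learning-based detection.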
Motivated by the above observations, we propose a feature
extraction method based on the keywords vector. Each
keywords vector is a set of keywords that can jointly
complete a malicious attack. A single request on its own may
do no harm to users; harm is often done by a series of
malicious operations.
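The intuition that a single request is harmless while a combination is suspicious can be sketched as follows. This is only an illustrative co-occurrence check under stated assumptions, not the Keywords Correlation Distance that the paper defines in Section III; the vector `CONTACT_LEAK_VECTOR`, the function `contains_keywords_vector`, and the source snippets are hypothetical.

```python
def contains_keywords_vector(source_code: str, keywords_vector: frozenset) -> bool:
    """Return True only if every keyword of the vector occurs in the source."""
    return all(keyword in source_code for keyword in keywords_vector)


# Hypothetical keywords vector: read contacts, then send data over the network.
CONTACT_LEAK_VECTOR = frozenset({
    "READ_CONTACTS",      # Android permission
    "ContactsContract",   # API used to query contacts
    "HttpURLConnection",  # API used to open a network connection
})

# Hypothetical decompiled snippets, reduced to the identifiers that matter here.
reads_contacts_only = """
// uses-permission: android.permission.READ_CONTACTS
Cursor c = getContentResolver().query(
    ContactsContract.Contacts.CONTENT_URI, null, null, null, null);
"""

reads_and_uploads = reads_contacts_only + """
HttpURLConnection conn = (HttpURLConnection) url.openConnection();
conn.getOutputStream().write(contactData);
"""

single_request_flagged = contains_keywords_vector(reads_contacts_only, CONTACT_LEAK_VECTOR)
combined_flagged = contains_keywords_vector(reads_and_uploads, CONTACT_LEAK_VECTOR)
```

Here the snippet that only reads contacts does not match the vector, whereas the snippet that reads contacts and also opens a network connection does. A plain substring test is a deliberate simplification; it ignores how close the keywords are to each other in the code.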
The contributions of this paper are summarized as follows.
First, we propose a feature extraction method based on
keywords correlation distance, which differs from the
traditional method based on binary programs.
Second, we use feature vectors to describe malicious
software features, including not only permissions and APIs but
also common parameters, common packages, etc.
Third, we give an SVM-based malware detection method built
on the feature vector set, which can detect new malware
and malware variants.
The rest of this paper is organized as follows: Section II
presents the system framework. Section III gives the definition
of keywords correlation distance. After that, Section IV
proposes the feature extraction method and Section V shows
the detection method using SVM. Section VI gives the
experimental results. Lastly, we discuss related work in
Section VII and summarize the paper in Section VIII.
II. SYSTEM FRAMEWORK
In this section, we introduce the overall framework of the
proposed malware detection scheme. The system framework is
shown in Figure 1. Our system is mainly divided into two
This research is supported in part by the following funds: National Natural Science Foundation of China under grant number 61502134, Zhejiang Provincial
Science and Technology Innovation Program under grant number 2013TD03, and Hangzhou Science and Technology Development Plan under grant number 20170533B04.
*Corresponding author: junmeisun@hznu.edu.cn.
978-1-5090-5507-4/17/$31.00 ©2017 IEEE
ICIS 2017, May 24-26, 2017, Wuhan, China