Joint Feature Selection in High-Dimensional Space for Android Malware Detection


"这篇研究论文探讨了如何在高维空间中选择矿山联合特征来提升Android恶意软件检测的效率和准确性。作者包括 Yanping Xu、Chunhua Wu、Kangfeng Zheng、Xinxin Niu 和 Tianling Lu，分别来自北京邮电大学网络安全学院和中国公安大学信息技术与网络安全学院。该论文于2016年11月提交，2017年5月修订，6月接受，并于9月30日发表在KSIITRANSACTIONS ON INTERNET AND INFORMATION SYSTEMS期刊上。" 在Android设备中，由于其广泛的应用和快速的增长，存储着大量的敏感隐私信息。因此，对Android恶意应用的检测变得至关重要，以保护用户隐私信息。本文的工作重点在于提取精细粒度的特征，旨在最大化Android恶意软件检测的信息含量。特征选择是机器学习和数据挖掘中的关键步骤，尤其在处理高维数据时，如Android应用的行为和元数据。研究人员提出了一种方法，通过在高维空间中挖掘联合特征，以减少冗余并提高检测性能。联合特征是指多个原始特征相互关联、共同作用形成的新的、更具有区分力的特征。在Android恶意软件检测中，这些特征可能包括应用程序的权限请求模式、API调用序列、网络行为等。高维特征选择的目标是找到最相关的特征子集，以保持模型的预测能力，同时降低计算复杂性和过拟合风险。论文可能涉及以下关键技术点： 1. **特征提取**：通过深入分析Android应用的行为，提取出能够反映恶意行为的关键特征，如异常的系统调用序列、频繁的权限请求、隐藏的网络通信等。 2. **高维特征降维**：使用统计学或机器学习方法（如PCA、LDA或特征选择算法）减少特征数量，降低维度，但保持信息的完整性。 3. **联合特征挖掘**：通过寻找特征之间的相互作用和依赖关系，创建新的联合特征，增强模型的分类性能。 4. **模型构建与评估**：使用适当的分类器（如SVM、决策树或深度学习模型）训练模型，并通过交叉验证和不同性能指标（如精度、召回率、F1分数）来评估模型的效能。该研究对于理解Android恶意软件检测中的特征工程策略，以及在高维数据环境下的特征选择技术有重要意义，有助于开发更高效、更准确的恶意软件检测工具，从而保护用户的隐私和设备安全。

4662 Xu et al.: Feature Selection to Mine Joint Features from High-dimension Space for Android Malware Detection

DroidMat. Lastly, kNN was used to classify the applications as benign or malicious. DroidMat had a better recall rate and high efficiency. Grace et al. [26] developed RiskRanker to analyze whether a particular application exhibited malicious behaviors based on control flow and data flow. It was helpful for analyzing encrypted native code and unsafe Dalvik code. The results showed that RiskRanker had high efficacy and scalability in detecting zero-day malware. Yang et al. [27] proposed AppContext, which constructs a self-defined call graph based on the context of a security-sensitive behavior. AppContext then detected malware by classifying the security-sensitive behaviors based on the extracted contexts.

Permissions and APIs have proved to be useful features for detecting malware, so we also extract them from the decompiled files. Moreover, the extracted features include not only the Android permissions and Android SDK APIs but also user-defined permissions and third-party APIs. In previous research, user-defined permissions and third-party classes were never analyzed for malware detection. This is the first difference between our work and other papers.

Feature selection is a crucial step in data processing. It chooses the best feature subset from the whole feature set based on the correlation between features and classes [28]. The selected subset should contain the fewest features that have a great impact on malware detection performance. Experimental results indicate that feature selection is useful for Android malware detection [5]. The IG algorithm, which is based on the entropy difference, is widely used for feature selection [29]. The work in [30] collected 2285 Android applications and extracted more than 9898 features. Then the Chi-square (CHI), Fisher Score (FS), and IG methods were used to choose the top 50, 100, 200, 300, 500, and 800 features. Cen et al. [5] used IG and CHI for feature selection. They listed the top 20 functions selected by IG and plotted a curve showing the performance for different ratios of features selected by IG and CHI. Experimental results show that IG and CHI are useful methods for selecting the best feature subset.
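To make the entropy-difference idea behind IG concrete, here is a minimal sketch that scores binary features by information gain and keeps the top k. The helper names (`entropy`, `information_gain`, `top_k_by_ig`) and the toy data are assumptions for illustration; they are not taken from the paper or from any cited work.

```python
import math


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    result = 0.0
    for cls in set(labels):
        p = labels.count(cls) / n
        result -= p * math.log2(p)
    return result


def information_gain(column, labels):
    """Entropy of the labels minus the entropy left after splitting on the feature."""
    n = len(labels)
    gain = entropy(labels)
    for value in set(column):
        subset = [y for x, y in zip(column, labels) if x == value]
        gain -= len(subset) / n * entropy(subset)
    return gain


def top_k_by_ig(rows, labels, names, k):
    """Return the names of the k features with the highest information gain."""
    scores = []
    for j, name in enumerate(names):
        column = [row[j] for row in rows]
        scores.append((information_gain(column, labels), name))
    scores.sort(key=lambda s: s[0], reverse=True)
    return [name for _, name in scores[:k]]


# Toy data: 1 = malicious, 0 = benign; feature names are hypothetical.
names = ["SEND_SMS", "INTERNET", "getDeviceId"]
rows = [
    [1, 1, 1],
    [1, 1, 0],
    [0, 1, 1],
    [0, 1, 0],
]
labels = [1, 1, 0, 0]

print(top_k_by_ig(rows, labels, names, k=1))  # ['SEND_SMS']
```

In this toy set `SEND_SMS` separates the classes perfectly (gain of 1 bit), while `INTERNET` is constant and `getDeviceId` is independent of the label, so both score 0.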

In our work, we mainly focus on feature selection methods, including IG, PSO, and $\ell_{2,1}$-norm regularization. They have been successfully applied to a large number of applications and difficult optimization problems [31-33]. The work in [31] applied PSO to find the optimal feature subset, in which particle swarms found the best feature combinations as they flew within the subset space. The work in [32] used PSO to accomplish multi-objective feature selection, whose goals were to maximize the classification performance and to minimize the number of features.
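The following is a minimal sketch of binary PSO for feature selection, assuming the common sigmoid-transfer variant in which each particle is a 0/1 mask over the features. It is not the algorithm of [31] or [32]: the fitness here is a single weighted score (leave-one-out 1-NN accuracy minus a small penalty per selected feature) that only approximates the two goals of accuracy and compactness, and every name and parameter value (`fitness`, `binary_pso`, `penalty`, `w`, `c1`, `c2`) is an illustrative assumption.

```python
import math
import random


def fitness(mask, rows, labels, penalty=0.01):
    """Leave-one-out 1-NN accuracy on the selected columns, minus a size penalty."""
    cols = [j for j, bit in enumerate(mask) if bit]
    if not cols:
        return 0.0
    correct = 0
    for i, row in enumerate(rows):
        best_dist, best_label = None, None
        for k, other in enumerate(rows):
            if k == i:
                continue
            # Hamming distance restricted to the selected features.
            dist = sum(row[j] != other[j] for j in cols)
            if best_dist is None or dist < best_dist:
                best_dist, best_label = dist, labels[k]
        correct += best_label == labels[i]
    return correct / len(rows) - penalty * len(cols)


def binary_pso(rows, labels, n_particles=8, n_iters=20, seed=0):
    """Search for a feature mask that maximizes `fitness`."""
    rng = random.Random(seed)
    dim = len(rows[0])
    pos = [[rng.randint(0, 1) for _ in range(dim)] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_fit = [fitness(p, rows, labels) for p in pos]
    g = max(range(n_particles), key=lambda i: pbest_fit[i])
    gbest, gbest_fit = pbest[g][:], pbest_fit[g]
    w, c1, c2 = 0.7, 1.5, 1.5  # inertia and acceleration weights (assumed values)
    for _ in range(n_iters):
        for i in range(n_particles):
            for d in range(dim):
                vel[i][d] = (w * vel[i][d]
                             + c1 * rng.random() * (pbest[i][d] - pos[i][d])
                             + c2 * rng.random() * (gbest[d] - pos[i][d]))
                vel[i][d] = max(-6.0, min(6.0, vel[i][d]))  # clamp the velocity
                # The sigmoid turns the velocity into the probability of a 1 bit.
                prob = 1.0 / (1.0 + math.exp(-vel[i][d]))
                pos[i][d] = 1 if rng.random() < prob else 0
            fit = fitness(pos[i], rows, labels)
            if fit > pbest_fit[i]:
                pbest[i], pbest_fit[i] = pos[i][:], fit
                if fit > gbest_fit:
                    gbest, gbest_fit = pos[i][:], fit
    return gbest, gbest_fit


# Toy data: feature 0 equals the label, the other features are noise.
rows = [
    [1, 0, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 1],
    [0, 1, 1, 0],
    [0, 0, 0, 1],
    [0, 1, 0, 1],
]
labels = [1, 1, 1, 0, 0, 0]

best_mask, best_fit = binary_pso(rows, labels)
print(best_mask, round(best_fit, 3))
```

Folding both goals into one score keeps the sketch short; a true multi-objective formulation would instead keep a set of non-dominated masks.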

The work in [33] introduced a robust loss function, called Brownboost loss, which computed the feature quality and selected the optimal feature subset to enhance robustness. The work in [8] applied the $\ell_{1,2}$-norm to the projection matrix to achieve row sparsity, which made it possible to select the relevant features and learn the transformation simultaneously. In short, feature selection is a valuable data processing technique that can minimize the classification error rate with the fewest features. We therefore use feature selection to mine the joint features and maximize the classification performance.
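The row-sparsity idea can be illustrated with a short sketch. It assumes the common convention in which the mixed norm of a matrix is the sum of the Euclidean norms of its rows, with one row per feature; index order conventions differ between papers, so this may not match the notation of the cited work. The names `l21_norm` and `rank_features_by_row_norm`, the matrix `W`, and the feature names are all hypothetical.

```python
import math


def l21_norm(matrix):
    """Sum over rows of each row's Euclidean (l2) norm."""
    return sum(math.sqrt(sum(v * v for v in row)) for row in matrix)


def rank_features_by_row_norm(matrix, names):
    """Rank features by the l2 norm of their row in the projection matrix."""
    norms = [math.sqrt(sum(v * v for v in row)) for row in matrix]
    order = sorted(range(len(names)), key=lambda i: norms[i], reverse=True)
    return [(names[i], norms[i]) for i in order]


# Hypothetical projection matrix: one row per feature, one column per output.
W = [
    [3.0, 4.0],  # row norm 5
    [0.0, 0.0],  # row norm 0: this feature is effectively dropped
    [0.6, 0.8],  # row norm 1
]
names = ["SEND_SMS", "INTERNET", "getDeviceId"]

print(round(l21_norm(W), 6))
for name, norm in rank_features_by_row_norm(W, names):
    print(name, round(norm, 6))
```

Penalizing this norm during training pushes whole rows toward zero, so a feature is either kept for all outputs or discarded for all of them, which is why the same step can select features and learn the transformation.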

3. Methodology

The structure and process of Android malware detection based on feature selection to mine joint features are depicted in Fig. 1 and consist of four major parts. The first is reverse engineering, which decompiles APK files into readable source code files, including AndroidManifest.xml and .smali files. The second is feature extraction, which extracts features from the source code files. Each application is then represented as a single binary instance with permission and API features. The class label indicates whether the application is
