Research on a Software Defect Prediction Method Based on Dictionary Learning

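Software Defect Prediction Based on Dictionary Learning

**Overview**

Software defect prediction is an important task in software testing. Its goal is to automatically identify defective software modules and thereby improve the quality of a software system. In recent years, machine learning techniques have been applied to defect prediction. Dictionary-learning-based defect prediction exploits the similarity among software modules: each module is approximately represented as a combination of a small proportion of the other modules.

**Dictionary learning**

Dictionary learning is a representation learning technique. It learns a dictionary, that is, a set of basis elements, from the training data. Applied to defect data, it captures the similarity among software modules and yields a sparse matrix of representation coefficients. The technique has been applied successfully in fields such as image processing and natural language processing.

**Software defect prediction**

Traditional defect prediction methods rely mainly on static code attributes, and they have drawbacks such as requiring substantial computing resources and manual intervention. More recently, machine learning methods such as support vector machines, random forests, and neural networks have been applied to the task.

**Dictionary-learning-based defect prediction**

The method represents each software module as a combination of a small proportion of other modules, learns the corresponding sparse representation coefficient matrix, and uses it to identify defective modules automatically.

**Advantages**

* Highly automated: the method learns the similarity among software modules without manual intervention.
* Accurate: the learned sparse representation coefficients improve the accuracy of defect prediction.
* Flexible: the method can be applied to different software projects and domains.

**Conclusion**

Dictionary-learning-based defect prediction automatically identifies defective software modules and so helps improve software quality. It can be applied to different software projects and domains, and it can be combined with other machine learning techniques to further improve prediction accuracy.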
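The idea that one module can be written as a combination of a few other modules can be illustrated with a small sketch. The code below is not the method summarized above: it uses plain matching pursuit, a generic greedy sparse-coding algorithm, against a fixed dictionary, and it does not learn the dictionary. The function names, the toy metric vectors, and the `max_atoms` parameter are all hypothetical and chosen only for illustration.

```python
import math

def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))

def normalize(v):
    """Scale a non-zero vector to unit Euclidean norm."""
    n = math.sqrt(dot(v, v))
    return [x / n for x in v]

def matching_pursuit(y, atoms, max_atoms=2, tol=1e-9):
    """Greedy sparse coding of y over unit-norm atoms.

    Returns (coeffs, residual), where at most max_atoms greedy steps
    are taken and residual = y - sum(coeffs[i] * atoms[i]).
    """
    residual = list(y)
    coeffs = [0.0] * len(atoms)
    for _ in range(max_atoms):
        # Pick the atom most correlated with what is still unexplained.
        scores = [dot(residual, a) for a in atoms]
        k = max(range(len(atoms)), key=lambda i: abs(scores[i]))
        if abs(scores[k]) < tol:
            break  # nothing left to explain
        coeffs[k] += scores[k]
        residual = [r - scores[k] * a for r, a in zip(residual, atoms[k])]
    return coeffs, residual

# Hypothetical dictionary: each atom stands for the normalized metric
# vector of one historical module (toy values, not real project data).
atoms = [normalize(v) for v in ([1, 0, 0], [0, 1, 0], [0, 0, 1], [1, 1, 0])]

new_module = [3.0, 0.0, 4.0]
coeffs, residual = matching_pursuit(new_module, atoms, max_atoms=2)
used = [i for i, c in enumerate(coeffs) if c != 0.0]
print("atoms used:", used, "coefficients:", coeffs)
```

In this toy case the new module is reconstructed from two of the four atoms, so most coefficients stay at zero. That is the sense in which the representation is sparse.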

Dictionary Learning Based Software Defect Prediction

Xiao-Yuan Jing 1,2*, Shi Ying, Zhi-Wu Zhang 1,2, Shan-Shan Wu 1,2, Jin Liu

State Key Laboratory of Software Engineering, School of Computer, Wuhan University, Wuhan, China

College of Automation, Nanjing University of Posts and Telecommunications, Nanjing, China

* Corresponding author: jingxy_2000@126.com

ABSTRACT

Software defect prediction aims to improve the quality of a software system by automatically identifying defective software modules, so that testing effort can be spent efficiently. Classification methods based on static code attributes have attracted a great deal of attention for this task, and in recent years machine learning techniques have been applied to defect prediction. Because different software modules are similar to one another, one module can be approximately represented by a small proportion of other modules, and its representation coefficients over a pre-defined dictionary consisting of historical software module data are generally sparse. In this paper, we propose to use dictionary learning to predict software defects. Using the characteristics of the metrics mined from open source software, we learn multiple dictionaries (defective-module and defect-free-module sub-dictionaries, plus the total dictionary) together with sparse representation coefficients. We also take misclassification cost into account, because misclassifying a defective module generally incurs a much higher risk cost than misclassifying a defect-free one. We thus propose a cost-sensitive discriminative dictionary learning (CDDL) approach for software defect classification and prediction. Widely used datasets from NASA projects are employed as test data to evaluate the performance of all compared methods. Experimental results show that CDDL outperforms several representative state-of-the-art defect prediction methods.
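For orientation, the sparse representation and dictionary learning ideas mentioned in the abstract are commonly written as the optimization problems below. These are the generic textbook forms, given here as an assumption, and not necessarily the exact CDDL objective, which this excerpt does not state. The notation is also assumed: y is the metric vector of one module, D is a dictionary whose columns d_k are atoms, α is the coefficient vector, Y and A stack the training modules and their coefficients column-wise, and λ > 0 weights sparsity.

```latex
% Sparse coding of one module over a fixed dictionary D
\min_{\alpha} \; \lVert y - D\alpha \rVert_2^2 + \lambda \lVert \alpha \rVert_1

% Dictionary learning: fit D and the coefficient matrix A jointly
\min_{D,\,A} \; \lVert Y - DA \rVert_F^2 + \lambda \lVert A \rVert_1
\quad \text{subject to} \quad \lVert d_k \rVert_2 \le 1 \ \text{for all } k
```

The first problem fits a sparse code to a given dictionary. The second also fits the dictionary, and the norm constraint on the atoms prevents the trivial solution in which D grows without bound while A shrinks.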

Categories and Subject Descriptors

D.2.9 [Management]: Software quality assurance (SQA); G.1.3 [Numerical Linear Algebra]: Sparse, structured, and very large systems (direct and iterative methods); I.5.2 [Design Methodology]: Classifier design and evaluation.

General Terms

Algorithms

Keywords

Software defect prediction, dictionary learning, sparse representation, cost-sensitive discriminative dictionary learning (CDDL).

1. INTRODUCTION

Software defect prediction is one of the most important research topics in software engineering [1-2,57,59] and an efficient means of relieving the burden of software code inspection and testing. By helping to detect and correct as many defects as possible, it enables an organization's limited resources to be allocated reasonably. Defect prediction techniques fall into two broad types: static and dynamic. Static defect prediction mainly predicts the number or the distribution of defects from defect-related metrics. Dynamic defect prediction predicts how system defects are distributed over time, using the times at which defects were generated. Static prediction has been widely used because it can predict the defect proneness of new software modules from historical defect data and so improve software quality [3-4]. The key to static defect prediction is to fully analyze and exploit the existing historical data in order to build more precise and effective binary classifiers of software modules.

In recent years, many popular classification methods, such as support vector machines (SVM) [5-7], decision trees [8-11], neural networks [12-13], Naïve Bayes [14-17], and cost-sensitive learning methods [18-22], have been employed to achieve this goal. In software defect prediction, however, these methods often run into difficulties, for example the class-imbalance problem [23-25] and the misclassification cost issue [18]. Class imbalance means that a software system contains far fewer defective modules than defect-free ones, which negatively influences the decisions of classifiers [26-29]. Misclassification costs arise in both directions: classifying a module as defect-prone implies that more testing effort must be invested in verification, which adds to the development cost, whereas misclassifying a defective module as defect-free carries the risk of system failure, which also has cost implications [58].
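The asymmetry between the two kinds of error can be made concrete with the standard minimum-expected-cost decision rule. The sketch below is a generic cost-sensitive rule, not the CDDL approach itself. The function names, the probability input, and the 5:1 cost ratio are hypothetical values chosen only for illustration.

```python
def expected_costs(p_defective, cost_fn, cost_fp):
    """Expected cost of each possible decision for one module.

    p_defective: estimated probability that the module is defective.
    cost_fn: cost of calling a defective module defect-free (missed defect).
    cost_fp: cost of calling a defect-free module defective (extra testing).
    """
    cost_if_predict_clean = p_defective * cost_fn
    cost_if_predict_defective = (1.0 - p_defective) * cost_fp
    return cost_if_predict_clean, cost_if_predict_defective

def predict(p_defective, cost_fn=5.0, cost_fp=1.0):
    """Choose the label with the lower expected cost."""
    clean, defective = expected_costs(p_defective, cost_fn, cost_fp)
    return "defective" if defective < clean else "defect-free"

# With a missed defect assumed to cost five times as much as a false
# alarm, a module with only a 30% defect probability is still flagged.
print(predict(0.3))                            # cost-sensitive decision
print(predict(0.3, cost_fn=1.0, cost_fp=1.0))  # equal costs, for contrast
```

Raising the cost of a missed defect lowers the probability at which a module is flagged, which trades additional testing effort for a lower risk of shipping a defective module.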

Sparse representation, a recently developed technique, has attracted much interest from researchers because of its effectiveness and robustness. The idea of sparse representation is that the information in a signal can be efficiently represented or coded by a linear

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.

ICSE' 14, May 31 - June 7, 2014, Hyderabad, India

http://dx.doi.org/10.1145/2568225.2568320
