FECS: a Cluster based Feature Selection Method
for Software Fault Prediction with Noises
Wangshu Liu†, Shulong Liu†, Qing Gu†∗, Xiang Chen‡, Daoxu Chen†
†State Key Laboratory for Novel Software Technology, Nanjing University, Nanjing, China
Email: liuws0707@gmail.com
‡School of Computer Science and Technology, Nantong University, Nantong, China
Email: xchencs@ntu.edu.cn
∗Corresponding author. Email: guq@nju.edu.cn
Abstract—Noise is inevitable when mining software archives for software fault prediction. Although some researchers have investigated the noise tolerance of existing feature selection methods, few studies focus on proposing new feature selection methods with a certain degree of noise tolerance. To address this issue, we propose a novel method FECS (FEature Clustering with Selection strategies). The method consists of two phases: a feature clustering phase and a feature selection phase with three different heuristic search strategies. In our empirical studies, we choose real-world software projects, such as Eclipse and NASA, and inject class-level and feature-level noise simultaneously to simulate noisy datasets. Using classical feature selection methods as baselines, we confirm the effectiveness of FECS, and we provide a guideline for using FECS by analyzing the effects of varying either the percentage of selected features or the noise rate.
Keywords—Software Quality Assurance, Software Fault Prediction, Feature Selection, Classification Model, Noise Tolerance
I. INTRODUCTION
Constructing an effective software fault prediction (SFP) model depends on high-quality datasets mined from software archives, such as software configuration management systems and bug tracking systems. After extracting software modules, researchers have designed different code or process metrics (i.e., features) to measure these modules [1]. However, irrelevant or redundant features can reduce the accuracy of the fault prediction model. Previous studies have shown that feature selection can improve the performance of models in SFP [2]–[5]. In previous work [5], we proposed a novel feature selection method FECAR, which can effectively eliminate both redundant and irrelevant features. However, noise is inevitable when mining software archives [6], [7]. Although some researchers have investigated the noise tolerance of existing feature selection methods [8], to the best of our knowledge, few researchers have proposed robust feature selection methods with a certain degree of noise tolerance.
Based on our previous work [5], we propose a robust method, FECS (FEature Clustering with Selection strategies), to resist the inevitable noise in software datasets. FECS consists of two phases: a feature clustering phase that groups strongly correlated features, and a feature selection phase
that selects beneficial features. The main extension over our previous work [5] lies in the feature selection phase. In particular, we design three different heuristic search strategies to select the most appropriate feature from each cluster. To investigate the noise tolerance of FECS, we choose real-world software projects, including Eclipse and NASA, as our experimental subjects. We perform a set of data preprocessing steps to ensure the datasets are noise free. Then we inject class-level and feature-level noise simultaneously to simulate noisy datasets. By comparing FECS with classical methods, such as IG, CFS, and Consist, on both noise-free and noisy datasets, we show the competitiveness of our approach.
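To make the two-phase structure concrete, the following is a minimal sketch of a cluster-then-select scheme: correlation-based hierarchical clustering followed by picking, from each cluster, the feature most relevant to the class label (measured here by mutual information). Both the clustering criterion and the single relevance heuristic are illustrative assumptions; they do not reproduce the exact measures or the three search strategies used by FECS.

# Illustrative sketch of a two-phase cluster-then-select scheme in the
# spirit of FECS; the clustering criterion and the selection heuristic
# below are assumptions, not the exact strategies defined by FECS.
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from sklearn.feature_selection import mutual_info_classif

def cluster_then_select(X, y, n_clusters):
    # Phase 1: group strongly correlated features via hierarchical
    # clustering on a correlation-based distance (1 - |Pearson r|).
    corr = np.abs(np.corrcoef(X, rowvar=False))
    dist = 1.0 - corr
    condensed = dist[np.triu_indices_from(dist, k=1)]
    labels = fcluster(linkage(condensed, method="average"),
                      t=n_clusters, criterion="maxclust")

    # Phase 2: from each cluster keep the single feature most relevant
    # to the class label (mutual information is one possible heuristic).
    relevance = mutual_info_classif(X, y, random_state=0)
    selected = []
    for c in np.unique(labels):
        members = np.where(labels == c)[0]
        selected.append(members[np.argmax(relevance[members])])
    return sorted(selected)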
The main contributions of this paper can be summarized as follows:
• We propose a novel feature selection method FECS with
a certain noise tolerance for SFP.
• We perform thorough empirical studies based on real
software projects to verify the robustness of the method
FECS on both noise free and noisy datasets and provide
a guideline of using our method.
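The noisy datasets used in our studies are obtained by injecting class-level and feature-level noise into cleaned data. The sketch below shows one straightforward injection scheme; the noise rates, the label-flipping mechanism, and the uniform perturbation of feature values are assumptions for illustration, not necessarily the exact procedure used in our experiments.

# Minimal sketch of injecting class-level and feature-level noise into a
# clean dataset (assumed scheme: random label flips and random feature
# perturbations at a given noise rate).
import numpy as np

def inject_noise(X, y, class_rate=0.1, feature_rate=0.1, seed=0):
    rng = np.random.default_rng(seed)
    X_noisy, y_noisy = X.copy(), y.copy()

    # Class-level noise: flip the labels of a random subset of modules
    # (assumes binary 0/1 labels).
    n_flip = int(class_rate * len(y_noisy))
    flip_idx = rng.choice(len(y_noisy), size=n_flip, replace=False)
    y_noisy[flip_idx] = 1 - y_noisy[flip_idx]

    # Feature-level noise: replace a random subset of feature values with
    # values drawn uniformly from that feature's observed range.
    n_rows, n_cols = X_noisy.shape
    n_cells = int(feature_rate * n_rows * n_cols)
    rows = rng.integers(0, n_rows, size=n_cells)
    cols = rng.integers(0, n_cols, size=n_cells)
    lo, hi = X_noisy.min(axis=0), X_noisy.max(axis=0)
    X_noisy[rows, cols] = rng.uniform(lo[cols], hi[cols])
    return X_noisy, y_noisy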
II. RELATED WORK
Software fault prediction is currently an active research topic [9] in software engineering data mining. By mining software archives, researchers can extract modules and assign each of them a class label (faulty or non-faulty). They then use different code metrics or process metrics [1] to measure these modules. Finally, they use the constructed datasets to build a fault prediction model. Based on this model, new modules can be categorized into two classes: fault-prone (FP) or non-fault-prone (NFP).
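As a purely illustrative example of this workflow, the sketch below trains a classifier on module metrics and fault labels and then categorizes a new module as FP or NFP; the metric values, labels, and the choice of logistic regression are assumptions, not data or settings from this paper.

# Illustrative sketch: build a fault prediction model from module metrics
# and fault labels, then classify a new module as fault-prone (FP) or
# non-fault-prone (NFP). Logistic regression is an arbitrary choice; any
# standard classifier could be substituted.
import numpy as np
from sklearn.linear_model import LogisticRegression

# Hypothetical training data: rows are modules, columns are code/process
# metrics; labels are 1 for faulty and 0 for non-faulty.
X_train = np.array([[120, 4, 0.3], [300, 9, 0.7], [80, 2, 0.1], [450, 12, 0.9]])
y_train = np.array([0, 1, 0, 1])

model = LogisticRegression().fit(X_train, y_train)

# Classify a new module: 1 -> fault-prone (FP), 0 -> non-fault-prone (NFP).
X_new = np.array([[200, 6, 0.5]])
print("FP" if model.predict(X_new)[0] == 1 else "NFP")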
Feature selection is used to identify and remove irrelevant and redundant features, thereby mitigating the curse of dimensionality in such datasets. Previous research shows the usefulness of feature selection in SFP [2]–[5], [10]. Meanwhile, noise is inevitable when mining software archives. For example, the process of linking issue reports with code changes may generate false negative noise [6], and mislabeled issue reports can generate false positive noise [7]. Kim et al. investigated the noise tolerance of existing fault prediction methods by manually injecting noise [11]. Wald et al. made a comparison between