IEICE TRANS. INF. & SYST., VOL.E102–D, NO.9 SEPTEMBER 2019
LETTER
TFIDF-FL: Localizing Faults Using Term Frequency-Inverse
Document Frequency and Deep Learning
Zhuo ZHANG†, Nonmember, Yan LEI††a), Member, Jianjun XU†b), Xiaoguang MAO†, and Xi CHANG†, Nonmembers
SUMMARY Existing fault localization techniques based on neural networks utilize the information of whether a statement is executed or not to identify suspicious statements potentially responsible for a failure. However, this information shows only the binary execution state of a statement and cannot show how important a statement is in an execution; consequently, it may degrade fault localization effectiveness. To address this issue, this paper proposes TFIDF-FL, which uses term frequency-inverse document frequency to identify a high or low degree of influence of a statement in an execution. Our empirical results on 8 real-world programs show that TFIDF-FL significantly improves fault localization effectiveness.
key words: debugging, fault localization, term frequency, inverse docu-
ment frequency, deep learning
1. Introduction
In the process of software development, debugging usually
requires much manual involvement of debugging engineers.
Researchers have developed many fault localization tech-
niques to reduce the cost of debugging [1]. In recent years,
deep learning has witnessed rapid development and shown its promising ability to provide tremendous improvements in robustness and accuracy [2].
Thus, some researchers have preliminarily used deep
neural networks with multiple hidden layers to discuss and
evaluate the potential of deep learning in fault localiza-
tion [3], [4]. They found that with the capability of esti-
mating complicated functions by learning a deep nonlinear
network’s structure and attaining distributed representation
of input data, deep neural networks exhibit strong learning
ability from sample data sets. However, the existing analysis is still preliminary and needs much further study. For example, it utilizes a matrix as the training samples, in which the value of each element is either 1, meaning a statement is executed, or 0, denoting a statement is not executed. We can observe that this binary information shows only whether a statement is executed or not; it cannot show the degree of influence of a statement in an execution. The existing analysis also uses small-sized programs (i.e., hundreds of lines of code) with all seeded faults. The
Manuscript received November 14, 2018.
Manuscript revised March 19, 2019.
Manuscript publicized May 27, 2019.
†The authors are with College of Computer, National University of Defense Technology, Changsha 410073, China.
††The author is with School of Big Data & Software Engineering, Chongqing University, Chongqing 400044, China.
a) E-mail: yanlei@cqu.edu.cn (Corresponding author)
b) E-mail: jianjun.xu@yeah.net (Corresponding author)
DOI: 10.1587/transinf.2018EDL8237
recent research [5] has revealed that small-sized programs with artificial faults are not useful for predicting which fault localization techniques perform best on real faults. Furthermore, previous research [6] has shown that there are unique features in test cases related to faults, e.g., the execution frequency of each statement. However, current approaches use this feature of each statement in just one test case and do not consider these features from the view of all test cases. Consequently, this may cause bias, posing a negative effect on fault localization effectiveness [7].
Therefore, this paper explores deep learning further for improving fault localization; i.e., we aim at obtaining more insights by proposing an approach that identifies the impact of each statement using features drawn from all test cases, rather than a binary status, and by evaluating our results on large-scale programs. Specifically, we propose TFIDF-FL: an effective fault localization approach using term frequency-inverse document frequency (TF-IDF) [8] to reflect how important a statement is in the executions of a test suite. TFIDF-FL abstracts a statement as a word and uses TF-IDF to construct a matrix as the training samples, which reflects how important a word (i.e., a statement) is in the executions of a test suite. Then, it uses the architecture of Multi-Layer Perceptrons (MLPs) to learn a model from the training samples. Finally, TFIDF-FL evaluates the suspiciousness of each statement of being faulty by testing the trained model on a virtual test set.
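As a minimal illustrative sketch (not the authors' implementation; the variable names and toy coverage counts below are invented for illustration), the matrix construction can be viewed as treating each test execution as a document and each statement as a word, weighting per-statement execution counts by TF-IDF instead of the binary 0/1 states used in prior work:

```python
import math

# Hypothetical coverage data: coverage[i][j] = number of times statement j
# is executed by test case i (execution counts, not just the binary 0/1
# states used by existing neural-network-based fault localization).
coverage = [
    [2, 0, 1, 3],   # test case 1
    [0, 1, 1, 0],   # test case 2
    [4, 0, 2, 1],   # test case 3
]

num_tests = len(coverage)
num_stmts = len(coverage[0])

# Document frequency: in how many test executions each statement appears.
df = [sum(1 for row in coverage if row[j] > 0) for j in range(num_stmts)]

# TF-IDF matrix used as training samples: TF is the statement's share of
# executions within one test run; IDF down-weights statements executed by
# nearly every test (idf = log(N / df)).
tfidf = []
for row in coverage:
    total = sum(row) or 1
    tfidf.append([
        (cnt / total) * math.log(num_tests / df[j]) if df[j] else 0.0
        for j, cnt in enumerate(row)
    ])
```

Under this weighting, a statement executed by every test (such as statement 2 above) receives weight 0 in every row, while statements concentrated in a few executions receive higher weights, capturing the degree of influence that a binary matrix cannot.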
We designed and performed an empirical study on 8 large real-world programs. The results show that TFIDF-FL significantly improves fault localization effectiveness.
2. Approach
2.1 Overview
In information retrieval, TF-IDF is a numerical statistic that is intended to reflect how important a word is to a document in a collection. It is one of the most popular term-weighting schemes and is often used in information retrieval searches, text mining, and user modeling [8]. TF-IDF is the product of two statistics: TF, the term frequency, and IDF, the inverse document frequency. The term frequency is the number of times a word occurs in a document, while the inverse document frequency measures whether a word is common or rare across all documents. The term frequency of a word is low if it occurs few times in a document,
Copyright © 2019 The Institute of Electronics, Information and Communication Engineers