An Empirical Study on TensorFlow Program Bugs ISSTA’18, July 16–21, 2018, Amsterdam, Netherlands
(Lines 2-14). Second, a session object is created to launch the con-
structed computation graph and build a neural network. The execu-
tion phase can be further divided into two sub-phases: training and
testing. In the training phase (Lines 16-21), a set of labeled samples
is used to train the neural network by minimizing the cross-entropy
loss of the model. A gradient descent algorithm is often deployed
to carry out the minimization, and the network is trained for
numerous iterations. After the model is trained, in the testing phase,
it can be applied to classify samples in a dataset (Line 22).
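The two-phase structure described above can be sketched as follows. This is a minimal illustrative example, not the paper's actual listing: it uses the TF 1.x-style API via tf.compat.v1, and the dataset sizes, learning rate, and iteration count are made up.

```python
import numpy as np
import tensorflow.compat.v1 as tf  # TF 1.x-style graph/session API
tf.disable_eager_execution()

# Construction phase: define the computation graph.
x  = tf.placeholder(tf.float32, [None, 4])   # toy features (sizes illustrative)
y_ = tf.placeholder(tf.float32, [None, 3])   # one-hot labels
W  = tf.Variable(tf.zeros([4, 3]))
b  = tf.Variable(tf.zeros([3]))
logits = tf.matmul(x, W) + b
loss = tf.reduce_mean(
    tf.nn.softmax_cross_entropy_with_logits_v2(labels=y_, logits=logits))
train_step = tf.train.GradientDescentOptimizer(0.5).minimize(loss)

# Synthetic labeled samples (stand-in for a real dataset).
rng = np.random.RandomState(0)
xs = rng.rand(30, 4).astype(np.float32)
ys = np.eye(3, dtype=np.float32)[rng.randint(0, 3, 30)]

# Execution phase: launch the graph in a session.
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    for _ in range(100):                      # training sub-phase
        sess.run(train_step, feed_dict={x: xs, y_: ys})
    # Testing sub-phase: apply the trained model to classify samples.
    preds = sess.run(tf.argmax(logits, 1), feed_dict={x: xs})
```

Note that the graph definitions above do not compute anything by themselves; computation happens only when `sess.run` is invoked, which is exactly the separation of the construction and execution phases.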
3 RESEARCH QUESTIONS
Our study aims to answer the following three research questions.
• RQ1: What are the symptoms and root causes of the bugs?
• RQ2: What new challenges exist in detecting the bugs, and how do TF users handle them?
• RQ3: What new challenges exist in localizing the bugs, and how do TF users handle them?
The first research question concerns the characteristics of the bugs.
The symptoms help us understand the consequences of the bugs
and are useful in designing detection methods. The root causes help
us understand the nature of the bugs, and the connections between
root causes and symptoms are useful in designing fault localization
methods. The second and third research questions concern the
new challenges imposed by the paradigm shift from traditional
programs to TF programs, with an emphasis on fault detection and
localization. When answering these questions about challenges, we
are also concerned with the solutions currently used by TF users.
Understanding these solutions helps the development of new fault
detection and localization techniques.
4 DATA COLLECTION
We collected TensorFlow bugs from two sources: StackOverflow
pages and GitHub commits. StackOverflow pages contain bugs that
might be difficult to debug: at least the TF user could not resolve
the bug quickly and had to ask a question for assistance. On the
other hand, GitHub commits contain bugs that might be difficult to
detect: at least the TF user did not discover the bug in the first place
and committed it into the project. Putting the two sources together,
we have a dataset of interest: bugs that cause problems to TF
users and are worth studying.
To collect bugs from StackOverflow pages, we used the search term
“tensorflow answers:1 -how -install -build” in StackOverflow’s search
engine. The parameter “answers:1” ensures that only questions with
at least one answer were considered, and the other parameters “-how
-install -build” filter out discussions about installing and building
TensorFlow, which are not our concern. We then manually reviewed
the top 500 questions returned by StackOverflow
and found 87 questions related to TensorFlow application bugs.
Please note that StackOverflow may contain both novices’ and
experts’ posts, and we believe both are important and should be
included in the study. The statistics of the QA pages can be found
in Table 1.
To collect bugs from GitHub commits, we searched for projects
with the keyword “tensorflow” in GitHub’s search engine. Among the
search results, we selected for further examination 11 well-maintained
target projects with the highest numbers of commits and stars.
The statistics of these projects are shown in Table 2. For each
project, we considered the commits between its start date and end
date when collecting bugs. We then searched the commit messages
in each project with the keywords “bug, fix, wrong, error, nan, inf,
issue, fault, fail, crash”. In addition, we filtered out “typo” commits
and merged pull requests to eliminate irrelevant and duplicate
commits. We manually inspected the source code, commit messages,
pull request messages, and issue messages to identify coding bugs.
As a result, we found 82 commits on GitHub that contain 88
TensorFlow application bugs. For each commit, we read the commit
and pull request messages to see if there were any associated issues,
and took the discussion threads of those issues into consideration.
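The keyword search and typo filtering described above can be sketched as a simple heuristic over commit messages. This is an illustrative reconstruction, not the authors' actual tooling, and the sample messages are made up:

```python
# Keywords from the study's commit-message search.
BUG_KEYWORDS = ("bug", "fix", "wrong", "error", "nan", "inf",
                "issue", "fault", "fail", "crash")

def is_candidate(commit_message: str) -> bool:
    """Flag a commit whose message suggests a bug fix, excluding typo fixes.

    A plain substring match is deliberately crude (e.g. "inf" also
    matches "info"), which is why the study follows it with manual
    inspection of the flagged commits.
    """
    msg = commit_message.lower()
    if "typo" in msg:          # filtered out as irrelevant
        return False
    return any(kw in msg for kw in BUG_KEYWORDS)

print(is_candidate("Fix NaN loss in training loop"))  # True
print(is_candidate("Fix typo in README"))             # False
print(is_candidate("Refactor data pipeline"))         # False
```

Commits passing this filter would then still be manually inspected, as described above, to confirm they contain actual coding bugs.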
The subjects were collected between July 2017 and May 2018.
We calculated the time from when an issue was posted until it
was resolved, for both GitHub issues and StackOverflow QA pages.
For GitHub issues, the mean is 27,845 minutes and the median is
5,122 minutes; for StackOverflow QA pages, the mean is 33,312
minutes and the median is 177 minutes. Whenever manual inspection
was involved, two authors performed the inspection separately and
discussed inconsistent issues until agreement. During this process,
one StackOverflow bug and eight GitHub bugs identified by one
author were removed after the discussion.
Putting everything together, we obtained a dataset³ of 175 bugs, including 87 col-
lected from StackOverflow and 88 collected from GitHub. The scale
of our dataset is similar to that of other existing studies that require
manual inspection; e.g., Jin et al. conducted a study of performance
bugs and inspected 109 performance bugs [19], and Nasehi et al. con-
ducted a study on what makes a good code example and analyzed
163 StackOverflow QA pages [26].
5 RQ1: SYMPTOMS AND ROOT CAUSES
5.1 Information Sources for Analysis
To answer the first research question, we analyzed each bug in
our dataset to identify its root causes and symptoms. For GitHub
bugs, the root causes can be identified from the changes made in
the commits. We identified the symptoms of the bugs by reading the
commit messages, pull request messages, and the associated issues.
For StackOverflow bugs, we learned the root causes of the bugs by
reading the answers that provide a solution, and identified their
symptoms from the question descriptions. In addition, we also tried
to reproduce the bugs to further understand their symptoms. We
were able to reproduce 75 out of 88 GitHub bugs and 76 out of 87
StackOverflow bugs. The rest of the bugs were not reproducible be-
cause of dead links, missing datasets, or the requirement of specific
hardware. We summarized the common root causes and symptoms
of the collected bugs into major categories and classified each bug
accordingly. Two authors performed the classification separately;
no disagreement was found on the StackOverflow bugs, and five
GitHub bugs were classified differently.
5.2 Results
The statistics of the symptoms (rows) and root causes (columns)
that we found from our analysis are given in Table 3. We identified

³ Our dataset is available at https://github.com/ForeverZyh/TensorFlow-Program-Bugs.