A Deep Learning Model: A New Approach to Story Point Estimation


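"This paper proposes a deep learning model for estimating story points in agile projects, extending traditional research on software effort estimation. It provides a comprehensive dataset of 23,313 issues from 16 open source projects for story-point-based estimation. The model combines long short-term memory (LSTM) with a recurrent highway network (RHN), is trained end to end, and requires no manual feature engineering. Experimental results show that the approach consistently outperforms common effort estimation baselines and other alternatives in Mean Absolute Error and Standardized Accuracy."

The paper examines how deep learning can be used in agile development to predict more accurately the effort required for a user story or issue, that is, to estimate its story points. Effort estimation has been studied extensively for traditional software development, but estimation techniques aimed at agile projects are comparatively rare, particularly at the level of individual user stories or issues. Story points are a unit of measure commonly used in agile development to express the effort needed to implement a user story or resolve an issue.

The authors first contribute a large dataset of 23,313 issues from 16 open source projects, which provides ample material for story-point-based estimation. They then propose a prediction model that combines two deep learning architectures: LSTM and RHN. LSTM is a recurrent neural network for sequence data that can capture long-term dependencies. RHN stacks several highway layers whose gates let information pass through largely unchanged, which eases the training of deep networks. Combining the two lets the model be trained directly on raw input, with no hand-designed features, which simplifies model construction.

In the empirical evaluation, the model outperforms common effort estimation baselines and two alternative methods on two key metrics, Mean Absolute Error (MAE) and Standardized Accuracy (SA). MAE measures the average absolute difference between predicted and actual values. SA measures how much better a method performs than random guessing. These results indicate that the proposed model predicts story points accurately and consistently across projects, offering a new tool for effort estimation in agile development.

The study also points to the potential of deep learning in software engineering, especially for effort prediction and management in agile settings. Such an automated, data-driven approach could help project managers plan schedules more accurately and allocate resources more effectively. Future research may explore how to apply the approach to a wider range of software development scenarios, and how to combine it with other machine learning techniques and domain knowledge to improve estimation accuracy.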
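The two metrics named above can be made concrete with a small sketch. It assumes the common definition SA = (1 - MAE / MAE_rguess) x 100, where MAE_rguess is the MAE of a random-guessing baseline that predicts each issue with the actual value of another randomly chosen issue. The function names, the number of runs, and the seed are illustrative choices, not taken from the paper.

```python
import random


def mean_absolute_error(actual, predicted):
    """Average absolute difference between actual and predicted story points."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def random_guess_mae(actual, runs=1000, seed=0):
    """MAE of random guessing, averaged over several runs (assumed baseline).

    Each issue is 'predicted' with the actual story points of a different,
    randomly chosen issue.
    """
    if len(actual) < 2:
        raise ValueError("need at least two issues for random guessing")
    rng = random.Random(seed)
    n = len(actual)
    total = 0.0
    for _ in range(runs):
        guesses = []
        for i in range(n):
            j = rng.randrange(n - 1)
            if j >= i:
                j += 1  # skip the issue's own value
            guesses.append(actual[j])
        total += mean_absolute_error(actual, guesses)
    return total / runs


def standardized_accuracy(actual, predicted, runs=1000, seed=0):
    """SA = (1 - MAE / MAE_rguess) * 100; higher is better."""
    baseline = random_guess_mae(actual, runs, seed)
    if baseline == 0:
        raise ValueError("random-guess MAE is zero; SA is undefined")
    return (1 - mean_absolute_error(actual, predicted) / baseline) * 100


if __name__ == "__main__":
    actual = [1, 2, 3, 5, 8]
    predicted = [1, 3, 3, 5, 5]
    print(mean_absolute_error(actual, predicted))  # 0.8
    print(standardized_accuracy(actual, predicted))
```

An SA of 100 corresponds to perfect predictions, and a value near 0 means the method does no better than random guessing.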

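TABLE I

DESCRIPTIVE STATISTICS OF OUR STORY POINT DATASET

| Repo. | Project | Abb. | # issues | min SP | max SP | mean SP | median SP | mode SP | var SP | std SP | mean TD length | LOC |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Apache | Mesos | ME | 1,680 | 1 | 40 | 3.09 | 3 | 3 | 5.87 | 2.42 | 181.12 | 247,542 |
| Apache | Usergrid | UG | 482 | 1 | 8 | 2.85 | 3 | 3 | 1.97 | 1.40 | 108.60 | 639,110 |
| Appcelerator | Appcelerator Studio | AS | 2,919 | 1 | 40 | 5.64 | 5 | 5 | 11.07 | 3.33 | 124.61 | 2,941,856 |
| Appcelerator | Aptana Studio | AP | 829 | 1 | 40 | 8.02 | 8 | 8 | 35.46 | 5.95 | 124.61 | 6,536,521 |
| Appcelerator | Titanium SDK/CLI | TI | 2,251 | 1 | 34 | 6.32 | 5 | 5 | 25.97 | 5.10 | 205.90 | 882,986 |
| DuraSpace | DuraCloud | DC | 666 | 1 | 16 | 2.13 | 1 | 1 | 4.12 | 2.03 | 70.91 | 88,978 |
| Atlassian | Bamboo | BB | 521 | 1 | 20 | 2.42 | 2 | 1 | 4.60 | 2.14 | 133.28 | 6,230,465 |
| Atlassian | Clover | CV | 384 | 1 | 40 | 4.59 | 2 | 1 | 42.95 | 6.55 | 124.48 | 890,020 |
| Atlassian | JIRA Software | JI | 352 | 1 | 20 | 4.43 | 3 | 5 | 12.35 | 3.51 | 114.57 | 7,070,022 |
| Moodle | Moodle | MD | 1,166 | 1 | 100 | 15.54 | 8 | 5 | 468.53 | 21.65 | 88.86 | 2,976,645 |
| Lsstcorp | Data Management | DM | 4,667 | 1 | 100 | 9.57 | 4 | 1 | 275.71 | 16.61 | 69.41 | 125,651 |
| Mulesoft | Mule | MU | 889 | 1 | 21 | 5.08 | 5 | 5 | 12.24 | 3.50 | 81.16 | 589,212 |
| Mulesoft | Mule Studio | MS | 732 | 1 | 34 | 6.40 | 5 | 5 | 29.01 | 5.39 | 70.99 | 16,140,452 |
| Spring | Spring XD | XD | 3,526 | 1 | 40 | 3.70 | 3 | 1 | 10.42 | 3.23 | 78.47 | 107,916 |
| Talendforge | Talend Data Quality | TD | 1,381 | 1 | 40 | 5.92 | 5 | 8 | 26.96 | 5.19 | 104.86 | 1,753,463 |
| Talendforge | Talend ESB | TE | 868 | 1 | 13 | 2.16 | 2 | 1 | 2.24 | 1.50 | 128.97 | 18,571,052 |
| Total | | | 23,313 | | | | | | | | | |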

SP: story points, TD length: the number of words in the title and description of an issue, LOC: lines of code

(+: LOC obtained from www.openhub.net, *: LOC from GitHub, and #: LOC from the reverse engineering)
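The per-project columns of Table I can be computed from a list of story points with Python's standard library. The sketch below is illustrative: the function name is hypothetical, and it assumes sample variance and standard deviation, since the excerpt does not say whether the sample or population form was used.

```python
from statistics import mean, median, mode, stdev, variance


def describe_story_points(points):
    """Descriptive statistics for one project's story points.

    Assumes sample variance/standard deviation (an assumption; the source
    does not specify). On ties, statistics.mode returns the first mode
    encountered (Python 3.8+).
    """
    if len(points) < 2:
        raise ValueError("need at least two story point values")
    return {
        "issues": len(points),
        "min": min(points),
        "max": max(points),
        "mean": mean(points),
        "median": median(points),
        "mode": mode(points),
        "var": variance(points),
        "std": stdev(points),
    }


if __name__ == "__main__":
    # Made-up values for illustration, not data from the table.
    for name, value in describe_story_points([1, 2, 3, 3, 5, 8]).items():
        print(f"{name}: {value}")
```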

developing a single issue). Thus, we needed to build such a dataset for our study. We have made this dataset publicly available, both to enable verifiability of our results and also as a service to the research community.

To collect data for our dataset, we looked for issues that were estimated with story points. JIRA is one of the few widely-used issue tracking systems that support agile development (and thus story point estimation) with its JIRA Agile plugin. Hence, we selected a diverse collection of nine major open source repositories that use the JIRA issue tracking system: Apache, Appcelerator, DuraSpace, Atlassian, Moodle, Lsstcorp, MuleSoft, Spring, and Talendforge. Apache hosts a family of related projects sponsored by the Apache Software Foundation [25]. Appcelerator hosts a number of open source projects that focus on mobile application development [26]. DuraSpace contains digital asset management projects [27]. The Atlassian repository has a number of projects which provide project management systems and collaboration tools [28]. Moodle is an e-learning platform that allows everyone to join the community in several roles such as user, developer, tester, and QA [29]. Lsstcorp has a number of projects supporting research involving the Large Synoptic Survey Telescope [30]. MuleSoft provides software development tools and platform collaboration tools such as Mule Studio [31]. Spring has a number of projects supporting application development frameworks [32]. Talendforge is the open source integration software provider for data management solutions such as data integration and master data management [33].

We then used the Representational State Transfer (REST) API provided by JIRA to query and collect those issue reports. We collected all the issues which were assigned a story point measure from the nine open source repositories up until August 8, 2016. We then extracted the story point, title and description from the collected issue reports. Each repository contains a number of projects, and we chose to include in our dataset only projects that had more than 300 issues with story points. Issues that were assigned a story point of zero (e.g., a non-reproducible bug), as well as issues with a negative or unrealistically large story point (e.g., greater than 100), were filtered out. Ultimately, about 2.66% of the collected issues were filtered out in this fashion. In total, our dataset has 23,313 issues with story points from 16 different projects: Apache Mesos (ME), Apache Usergrid (UG), Appcelerator Studio (AS), Aptana Studio (AP), Titanium SDK/CLI (TI), DuraCloud (DC), Bamboo (BB), Clover (CV), JIRA Software (JI), Moodle (MD), Data Management (DM), Mule (MU), Mule Studio (MS), Spring XD (XD), Talend Data Quality (TD), and Talend ESB (TE). Table I summarizes the descriptive statistics of all the projects in terms of the minimum, maximum, mean, median, mode, variance, and standard deviation of the story points assigned, and the average length of the title and description of issues in each project. These sixteen projects bring diversity to our dataset in terms of both application domains and project characteristics. Specifically, they differ in the following aspects: number of observations (from 352 to 4,667 issues), technical characteristics (different programming languages and different application domains), sizes (from 88 KLOC to 18 million LOC), and team characteristics (different team structures and participants from different regions).
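The filtering rules in this paragraph can be sketched as a small function. This is an illustration under stated assumptions, not the authors' script: the dictionary keys (`project`, `title`, `description`, `story_points`) and function names are hypothetical, and the sketch assumes invalid issues are dropped before the 300-issue threshold is applied, an order the text does not specify.

```python
from collections import defaultdict

MAX_STORY_POINTS = 100        # "unrealistically large" cutoff from the text
MIN_ISSUES_PER_PROJECT = 300  # a project needs MORE than this many issues


def is_valid_story_point(sp):
    """Reject missing, zero, negative, and very large story points."""
    return sp is not None and 0 < sp <= MAX_STORY_POINTS


def build_dataset(issues, min_issues=MIN_ISSUES_PER_PROJECT):
    """Group valid issues by project and keep sufficiently large projects.

    `issues` is an iterable of dicts with the hypothetical keys
    'project', 'title', 'description', and 'story_points'.
    """
    by_project = defaultdict(list)
    for issue in issues:
        if is_valid_story_point(issue.get("story_points")):
            by_project[issue["project"]].append(issue)
    # Assumed order: filter issues first, then apply the project threshold.
    return {
        project: kept
        for project, kept in by_project.items()
        if len(kept) > min_issues
    }


if __name__ == "__main__":
    sample = [
        {"project": "DEMO", "title": "t", "description": "d", "story_points": sp}
        for sp in (0, 3, 5, 150)
    ]
    # With a lowered threshold for the toy data, only the 3- and 5-point
    # issues survive.
    print(build_dataset(sample, min_issues=1))
```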

IV. APPROACH

Our overall research goal is to build a prediction system that takes as input the title and description of an issue and produces a story-point estimate for the issue. Title and description are required information for any issue tracking system. Although some issue tracking systems (e.g., JIRA) may elicit additional metadata for an issue (e.g., priority, type, affected versions, and fix versions), this information is not always provided at the time that an issue is created. We therefore make a pessimistic assumption here and rely only on the issue's title and description. Thus, our prediction system can be used at any time, even when an issue has just been created.
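One way to picture this input contract: only the title and description are read, so a freshly created issue with no metadata is handled exactly like a fully triaged one. In the sketch below, the `Issue` class, its `metadata` field, and `build_document` are hypothetical names, and joining the title and description with a newline is an assumption made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Issue:
    """Hypothetical issue record; only title and description are used."""
    title: str
    description: str = ""
    # Optional metadata (priority, type, versions, ...) is deliberately ignored.
    metadata: dict = field(default_factory=dict)


def build_document(issue):
    """Return one text document: the title followed by the description."""
    parts = [issue.title.strip(), issue.description.strip()]
    return "\n".join(part for part in parts if part)


if __name__ == "__main__":
    new_issue = Issue("Fix login timeout", "Session expires after 5 minutes.")
    triaged = Issue(
        "Fix login timeout",
        "Session expires after 5 minutes.",
        {"priority": "High", "type": "Bug"},
    )
    # Metadata does not change the model input.
    print(build_document(new_issue) == build_document(triaged))  # True
```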

We combine the title and description of an issue report into a single text document where the title is followed by the description. Our approach computes vector representations
