MaRK: An Approach to Automatically Mining Domain Requirements Knowledge


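This paper examines how an organization entering a new domain can efficiently acquire and use domain knowledge, with a focus on requirements mining. The authors propose MaRK (Mining Requirements Knowledge), a system that automatically identifies and retrieves domain documents containing descriptions of functional components, so that domain analysts can specify requirements more quickly. The paper describes how MaRK works, including an algorithm that ranks documents by their relevance to a component and highlights the passages of text likely to contain requirements information. The MaRK algorithm is evaluated on 523 documents from the Positive Train Control (PTC) domain, in particular its ability to retrieve requirements related to PTC's On-Board Unit.

The research addresses the challenge of requirements mining in a new domain, a key task that often has to be repeated during the early phases of a project. Traditional approaches rely on people manually searching, filtering, and analyzing large numbers of documents, which is time-consuming and labor-intensive. MaRK aims to automate this process and reduce the workload. At its core is an algorithm that ranks documents by how closely their content relates to the functional components in the domain model, so that the most relevant documents are shown first.

In practice, a domain model captures the key characteristics of a domain, including its functional components, which form the basis for defining requirements. MaRK analyzes the documents, finds the parts that match the model's components, and helps analysts quickly locate information that may bear on the requirements. This improves efficiency and helps make the requirements definition more complete and accurate.

In the experiments, the researchers tested MaRK on a large document collection from the PTC domain. PTC is a complex system whose requirements span safety, control, communication, and other areas. Processing the 523 documents, MaRK was evaluated on retrieving requirements related to PTC's On-Board Unit. According to the reported results, MaRK can effectively support domain analysts by supplying targeted requirements knowledge, which speeds up requirements analysis and definition.

In short, the paper proposes MaRK as a method for mining requirements knowledge from domain documents and as a practical tool for understanding requirements in a new domain. By automating part of the work, it reduces manual effort and makes requirements engineering more efficient, which is relevant to any organization entering a new domain.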

Mining Requirements Knowledge from Collections of Domain Documents

Xiaoli Lian∗, Mona Rahimi, Jane Cleland-Huang, Li Zhang∗

University of Notre Dame, South Bend IN, USA.

∗Beihang University, Beijing, China.

Email: {lianxiaoli,lily}@buaa.edu.cn, m.rahimi@acm.org, JaneClelandHuang@nd.edu

Remo Ferrari and Michael Smith

Siemens Industry Rail Automation, New York, USA

remo.ferrari@siemens.com, michael-smith@siemens.com

Abstract—When organizations enter domains that are entirely new to them, they need to invest significant time and effort to acquire domain knowledge. This typically involves searching through a broad set of domain documents, retrieving relevant ones, and analyzing the textual content in order to discover and specify pertinent requirements. Depending on the nature of the domain and the availability of documentation, this task can be extremely time-consuming and may require non-trivial human effort. Furthermore, the task must often be performed repeatedly throughout early phases of the project. In this paper we first explore the effort needed to manually build a high-level domain model capturing the functional components. We then present MaRK (Mining Requirements Knowledge), which identifies and retrieves the documents containing descriptions of functional components in the domain model. Domain analysts can use this information to specify requirements. We introduce and evaluate an algorithm which ranks domain documents according to their relevance to a component and then highlights sections of text which are likely to contain requirements-related information. We describe our process within the context of the Positive Train Control (PTC) domain with a repository of 523 documents, representing 852MB of data. We empirically evaluate the MaRK relevance algorithm and its ability to retrieve relevant requirements knowledge for requirements related to PTC’s On-Board Unit.

I. INTRODUCTION

When entering an entirely new domain, software and systems engineers typically engage in a process of knowledge discovery through an activity referred to as Domain Analysis. Defined by James Neighbors in the 1980s as the process of analyzing related software systems in order to identify their commonalities and variabilities [26], domain analysis can enable significant reuse at requirements, design, and implementation levels [9]. Sources of domain knowledge usually include technical literature, existing implementations, customer surveys, expert advice, requirements specifications [3] and online product descriptions [15]. Common techniques for domain analysis include in-depth reviews of the requirements, design, code, and other product artifacts for a relatively small number of existing systems [32], analysis of large numbers of rather shallow online product descriptions [10], or searching the web to retrieve and analyze a broad set of publicly available documents describing products in the domain [28], [6], [25].

The continually expanding availability of accessible documents for a broad range of domains makes web-mining particularly appealing. However, there are challenges in mining such documents, which tend to be textually rich and generally unstructured, and to contain highly redundant and sometimes incomplete descriptions of various system components and features. Our goal is to leverage such domain documents to extract a functional domain model describing components, communication mechanisms, and associated processes. Our industrial collaborators have stated that in their current practice, performing this task takes them “enormous amounts of engineering time”. They articulated several goals aimed at reducing the excessive effort needed to acquire requirements knowledge from repositories of domain documents. These goals included generating an overview of the domain, quickly identifying documents that were relevant to specific components, and providing affordances to visually explore relevant parts of the documents.

In this paper we present our approach, which we refer to as ‘Mining Requirements Knowledge’ (MaRK). MaRK is designed to reduce human effort by providing semi-automated support for engineers tasked with discovering requirements knowledge. We adopt the definition of a requirement as “a statement of what the system must do, how it must behave, the properties it must exhibit, the qualities it must possess, and the constraints that the system and its development must satisfy” [27]. Furthermore, we define requirements knowledge as information that is “helpful for answering requirements-related questions in any phase of a software project” [24]. Requirements knowledge is therefore diverse in nature, and can be retrieved from various sources. In this work, the repository of domain documents from which requirements knowledge is retrieved is particularly diverse and includes architectural documents, functional descriptions, regulations, and so on. Our approach is designed for use in any domain for which domain documents describing the major components of the domain are available.
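The idea of ranking documents by their relevance to a component and then highlighting likely requirements-related text can be illustrated with a minimal sketch. This section does not specify MaRK's relevance algorithm, so the sketch below assumes a generic TF-IDF weighting with cosine similarity as a stand-in; it should not be read as the authors' method. The function names (`rank_documents`, `highlight_sentences`), the sample component description, and the sample document texts are all hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase a string and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def tfidf_vectors(token_lists):
    """Build one sparse TF-IDF vector (a dict) per token list."""
    n = len(token_lists)
    doc_freq = Counter(term for tokens in token_lists for term in set(tokens))
    # Smoothed IDF, so a term that occurs in every text keeps a nonzero weight.
    idf = {t: math.log((1 + n) / (1 + df)) + 1.0 for t, df in doc_freq.items()}
    return [{t: c * idf[t] for t, c in Counter(tokens).items()}
            for tokens in token_lists]


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_documents(component_description, documents):
    """Rank documents (name -> text) by similarity to a component description.

    Assumption: TF-IDF plus cosine similarity stands in for the relevance
    measure; the paper's actual algorithm is not given in this section.
    """
    names = list(documents)
    token_lists = [tokenize(component_description)]
    token_lists += [tokenize(documents[name]) for name in names]
    vectors = tfidf_vectors(token_lists)
    query, doc_vectors = vectors[0], vectors[1:]
    scored = [(name, cosine(query, vec)) for name, vec in zip(names, doc_vectors)]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


def highlight_sentences(component_description, text, top_k=1):
    """Return up to top_k sentences of text most similar to the description."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    token_lists = [tokenize(component_description)]
    token_lists += [tokenize(s) for s in sentences]
    vectors = tfidf_vectors(token_lists)
    scored = [(cosine(vectors[0], v), s) for v, s in zip(vectors[1:], sentences)]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    # Sentences with no overlap at all are never highlighted.
    return [s for score, s in scored[:top_k] if score > 0.0]


if __name__ == "__main__":
    # Hypothetical mini-repository; the texts are invented for illustration.
    docs = {
        "onboard.txt": "The on-board unit enforces speed limits. "
                       "The on-board unit shall apply the brakes automatically.",
        "wayside.txt": "Wayside devices report signal status to the back office server.",
        "newsletter.txt": "The annual picnic is scheduled for June.",
    }
    component = "on-board unit brakes"
    for name, score in rank_documents(component, docs):
        print(name, round(score, 3))
    print(highlight_sentences(component, docs["onboard.txt"]))
```

A plain lexical measure like this is only a baseline: it rewards word overlap and would miss documents that describe a component using different vocabulary, which is one reason a real system would need something more domain-aware.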

A. Domain: Positive Train Control

We apply MaRK to a project in the transportation domain, focusing on Positive Train Control (PTC) [23], [17]. PTC

2016 IEEE 24th International Requirements Engineering Conference

DOI 10.1109/RE.2016.50


RE 2016, Beijing, China

Research Paper
