Research on Software Metric Models and Missing Data Handling

Updated 2023-11-22
Software metric models are built from measurements of software engineering projects, such as team size, and are used to predict engineering targets such as development effort and defect rate. Constructing such a model requires historical data samples from similar projects, and these samples often contain missing values. In addition, the metric variables used as independent variables in regression modeling are usually chosen by intuition or empirical assumption, and these assumptions are rarely tested once the model is established, which can leave the model unnecessarily complex and burdened with redundant variables. The metric values may also be either continuous or discrete. How to build a simplified software metric model from data samples that contain missing data is the main topic of this paper.

The paper is organized as follows. Chapter 1 is the introduction; it presents the research background and gives a preliminary account of the three difficulties encountered and the existing solutions, which are discussed in detail in Chapters 2, 3, and 4. Chapter 2 covers the statistical treatment of missing data: it first introduces background knowledge on missing data, then presents several methods for handling it, and finally describes in detail the k-NN method and the Monte Carlo simulation method adopted in this paper. Chapter 3 addresses the handling of discrete variables and introduces the commonly used dummy variable method. Chapter 4 discusses variable selection: three traditional variable selection methods are introduced and compared, and stepwise regression is selected as the method used in this paper. Chapter 5 is a case study in which R, SPSS, Java, and other programming languages and programs are used to apply the methods and theories of Chapters 2, 3, and 4 to actual data samples, completing the simplification of the software metric model in the presence of missing data.

Keywords: software metric, variable selection, missing data, stepwise regression, dummy variable method.
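
As a rough illustration of two of the techniques named above, the following is a minimal sketch of dummy-variable encoding and k-NN imputation in plain Python. It is not the paper's implementation (the paper reports using R, SPSS, and Java); the function names `dummy_encode` and `knn_impute`, the choice of Euclidean distance, the use of the neighbour mean as the fill value, and the sample data are all assumptions made for this example.

```python
import math


def dummy_encode(values):
    """Encode a categorical column as k-1 dummy (0/1) columns.

    The first category in sorted order is the reference level (all zeros).
    """
    levels = sorted(set(values))
    return [[1 if v == level else 0 for level in levels[1:]] for v in values]


def knn_impute(rows, k=2):
    """Fill None entries using the k nearest complete rows.

    Distance is Euclidean over the columns observed in the incomplete row;
    each missing entry becomes the mean of that column over the neighbours.
    """
    complete = [r for r in rows if None not in r]
    if not complete:
        raise ValueError("at least one complete row is required")
    filled = []
    for row in rows:
        if None not in row:
            filled.append(list(row))
            continue
        observed = [i for i, v in enumerate(row) if v is not None]

        def dist(other):
            return math.sqrt(sum((row[i] - other[i]) ** 2 for i in observed))

        neighbours = sorted(complete, key=dist)[:k]
        filled.append([
            v if v is not None else sum(n[i] for n in neighbours) / len(neighbours)
            for i, v in enumerate(row)
        ])
    return filled


# Hypothetical sample: [team_size, effort]; the last project lacks effort.
projects = [[3, 10.0], [4, 14.0], [10, 40.0], [5, None]]
print(knn_impute(projects, k=2)[3])   # [5, 12.0]

# Hypothetical discrete metric: implementation language per project.
print(dummy_encode(["Java", "C", "Java", "Python"]))
# [[1, 0], [0, 0], [1, 0], [0, 1]]
```

The sketch does not rescale the variables before computing distances; with metrics on very different scales, standardizing them first would usually be advisable.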