R2D2: An Efficient Tool for Detecting Redundant Code


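"R2D2: An Intelligent Tool for Detecting Redundant Code"

The paper "Detection of Redundant Code Using R2D2", an important work in software quality research, was published in the Software Quality Journal, volume 12 (2004), pages 361-382. Written by António Menezes Leitão, it discusses why identifying and eliminating redundant code matters in large software systems written in Lisp. Redundant code makes maintenance harder and can lead to latent errors and wasted performance, so detecting and removing it effectively is key to improving software quality and development efficiency.

R2D2 (Redundancy Detector for Deduplication) is designed to provide an efficient and accurate method for detecting redundant code by combining several techniques and strategies. It uses a combined analysis model, including syntactic and semantic analysis, to compare each pair of code fragments. Syntactic analysis looks at the structural similarity of the code, while semantic analysis looks at its functional behavior in order to identify parts that are fully or partially duplicated in function.

R2D2 distinguishes positive evidence (the code fragments have the same function) from negative evidence (the fragments look similar but actually behave differently). The evidence is combined according to a predefined model, and a result is reported to the user only when the evidence sufficiently supports a judgment of redundancy. To run within acceptable time and space limits, R2D2 uses several optimization techniques and heuristics, such as local search strategies and thresholds, so that detection can run quickly on large code bases.

Because programmers frequently copy and paste, the introduction of redundant code is very common. R2D2 aims to help developers identify the consequences of this unintentional "cut and paste" behavior, reduce duplicated effort, and improve the maintainability and consistency of the code. The tool is designed to be extensible, so that it can incorporate new analysis techniques and improved methods as they are developed. For developers and teams concerned with code quality and efficiency, R2D2 is a useful tool for improving the overall quality of a software product.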

DETECTION OF REDUNDANT CODE USING R2D2


Unfortunately, there are some shortcomings: only very simple syntactical elements are compared, which excludes detection of semantically equal code fragments that are syntactically different. Moreover, the size threshold that must be used to avoid false positives limits the detection to duplications of relatively big sections of code.

1.3.2. Syntactical analysis. Syntactical analysis methods are based on the comparison of abstract syntax trees (ASTs).

Abstract syntax trees have two nice properties for the detection process: (1) comments and white space are automatically eliminated; (2) each identifier is recognized according to its context. The abstract syntax trees are then compared using one of several possible techniques.
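
The two properties can be illustrated with a minimal sketch. It uses Python's standard `ast` module rather than a Lisp reader, which is an assumption made purely for illustration; the helper names `normalized_tree` and `identifier_roles` are hypothetical and do not come from the paper.

```python
import ast

def normalized_tree(source: str) -> str:
    """Parse source and return a textual dump of its AST.

    Comments and layout never reach the tree, and ast.dump omits
    line/column attributes by default.
    """
    return ast.dump(ast.parse(source))

def identifier_roles(source: str):
    """List (name, context) pairs: the same identifier can be read or written."""
    tree = ast.parse(source)
    return sorted(
        (node.id, type(node.ctx).__name__)
        for node in ast.walk(tree)
        if isinstance(node, ast.Name)
    )

FRAGMENT_A = """
def area(w, h):
    # width times height
    return w * h
"""

FRAGMENT_B = """
def area(w,   h):


    return w*h   # same computation, different layout
"""

# Property (1): the two fragments differ only in comments and white space.
same_tree = normalized_tree(FRAGMENT_A) == normalized_tree(FRAGMENT_B)

# Property (2): in "x = x + 1" the identifier x appears once as a
# store target and once as a loaded value.
roles = identifier_roles("x = x + 1")
```

Here `same_tree` is true even though the two source texts differ character by character, which is why AST-based methods do not need a separate pass to strip comments and layout.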

YAP (Wise, 1992, 1996) is a system that operates on a very simplified (and normalized) abstract syntax tree. This tree is produced by canonicalizing the program identifiers, reordering the procedures according to invocation order, expanding each procedure at its first invocation point and replacing the remaining invocations with distinct markers, and deleting all syntactical elements that do not belong to the language, leaving only reserved words and names of pre-defined procedures. The result is a linearized sequence of syntactical elements, which is then compared using an algorithm that can deal with transposed subsequences. Note that YAP was developed to deal primarily with the problem of software plagiarism, where it is common to find transposed elements.
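
A rough sketch of the idea follows, under several assumptions: it works on Python source instead of the languages YAP targets, it keeps only reserved words and built-in names, it skips the procedure reordering and expansion steps entirely, and it uses a simplified greedy tiling as a stand-in for a transposition-tolerant comparison. The text above does not specify YAP's comparison algorithm, so the functions `normalize` and `greedy_tiles` are hypothetical illustrations, not YAP's implementation.

```python
import builtins
import io
import keyword
import tokenize

BUILTIN_NAMES = set(dir(builtins))

def normalize(source: str) -> list:
    """Reduce a program to its reserved words and built-in names, in order."""
    kept = []
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type != tokenize.NAME:
            continue  # drops comments, literals, operators, layout tokens
        if keyword.iskeyword(tok.string) or tok.string in BUILTIN_NAMES:
            kept.append(tok.string)
    return kept

def greedy_tiles(a: list, b: list, min_len: int = 2) -> int:
    """Total length of non-overlapping common runs, taking the longest first.

    Because runs may match at any position, moving a block of code
    elsewhere in the program does not hide it.
    """
    used_a = [False] * len(a)
    used_b = [False] * len(b)
    covered = 0
    while True:
        best_len, best_i, best_j = 0, 0, 0
        for i in range(len(a)):
            for j in range(len(b)):
                k = 0
                while (i + k < len(a) and j + k < len(b)
                       and a[i + k] == b[j + k]
                       and not used_a[i + k] and not used_b[j + k]):
                    k += 1
                if k > best_len:
                    best_len, best_i, best_j = k, i, j
        if best_len < min_len:
            return covered
        for k in range(best_len):
            used_a[best_i + k] = True
            used_b[best_j + k] = True
        covered += best_len

PROGRAM_1 = "for i in range(3):\n    print(i)\nwhile True:\n    break\n"
PROGRAM_2 = "while True:\n    break\nfor j in range(3):\n    print(j)\n"

seq_1 = normalize(PROGRAM_1)
seq_2 = normalize(PROGRAM_2)
matched = greedy_tiles(seq_1, seq_2)
similarity = 2 * matched / (len(seq_1) + len(seq_2))
```

The two programs contain the same loops in swapped order with a renamed variable. After normalization both reduce to the same seven elements in a different order, and the tiling still covers all of them. This brute-force tiling is far slower than a production algorithm and is only meant to show why transposition does not defeat the comparison.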

Clone Doctor (Baxter et al., 1998) is a tool that finds redundant code fragments via the comparison of all subtrees of the abstract syntax tree of a program. This allows for fine-grained comparison between code fragments but is computationally demanding. The process is O(n^3) on the number of nodes of the abstract syntax tree (an abstract syntax tree typically has ten times more nodes than the number of lines of the represented program).

There are two solutions to make the process more efficient: (1) diminish the number of compared trees; (2) compare each tree with only some of the other trees. Clone Doctor explores both solutions. Each tree is stored in a hash table, thus dividing the tree space into several independent subspaces. Only the trees in a given subspace are compared with each other. Moreover, the indexing function ignores very small trees, greatly reducing the number of trees and making the process less sensitive to small code variations. Each subspace will then contain all subtrees that have a similar structure, differing only on the leaves.
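
The bucketing step can be sketched as follows. This is an illustration on Python ASTs, not Clone Doctor's implementation: the structural key `shape`, the `min_size` threshold of 5 nodes, and the function names are all assumptions chosen for the example. The key keeps node types (including operators) and ignores identifier names and constant values, so trees that differ only in those leaves land in the same bucket.

```python
import ast
from collections import defaultdict

def size(node: ast.AST) -> int:
    """Number of nodes in the subtree rooted at node."""
    return sum(1 for _ in ast.walk(node))

def shape(node: ast.AST) -> tuple:
    """Structural key: node type names only, no identifiers or constants."""
    return (
        type(node).__name__,
        tuple(shape(child) for child in ast.iter_child_nodes(node)),
    )

def bucket_subtrees(source: str, min_size: int = 5) -> dict:
    """Store every sufficiently large subtree in a hash table keyed by shape."""
    buckets = defaultdict(list)
    for node in ast.walk(ast.parse(source)):
        if size(node) >= min_size:  # very small trees are ignored
            buckets[shape(node)].append(node)
    return buckets

def candidate_groups(source: str, min_size: int = 5) -> list:
    """Only trees sharing a bucket need to be compared with each other."""
    return [g for g in bucket_subtrees(source, min_size).values() if len(g) > 1]

SOURCE = """
def f(a, b):
    return a * b + 1

def g(x, y):
    return x * y + 2

def h(x):
    return [x]
"""

groups = candidate_groups(SOURCE)
kinds = sorted(type(group[0]).__name__ for group in groups)
```

In this example `f` and `g` share buckets at several levels (the whole definition, the return statement, and both arithmetic expressions), while `h` shares a bucket with nothing. Pairwise comparison is then limited to members of the same bucket instead of all pairs of subtrees.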

Each pair of trees in a subspace is then analyzed using a similarity function that takes into account both the number of common nodes and the number of different nodes. This approach can deal with cases where the code was structurally changed (by adding or removing code).
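
One way to combine common and different node counts is a Dice-style ratio, 2S / (2S + L + R), where S is the number of shared nodes and L and R are the nodes found only in the first and second tree. The text above does not give the formula, so this choice is an assumption, and the sketch below simplifies further by treating each tree as a bag of labeled nodes (ignoring their positions). The names `label`, `node_bag`, and `similarity` are hypothetical.

```python
import ast
from collections import Counter

def label(node: ast.AST) -> tuple:
    """Label a node by its type, plus its name or value when it is a leaf."""
    if isinstance(node, ast.Name):
        return ("Name", node.id)
    if isinstance(node, ast.Constant):
        return ("Constant", repr(node.value))
    return (type(node).__name__,)

def node_bag(source: str) -> Counter:
    """Multiset of node labels for the AST of source."""
    return Counter(label(node) for node in ast.walk(ast.parse(source)))

def similarity(source_a: str, source_b: str) -> float:
    """Dice-style ratio of shared nodes to all nodes (assumed formula)."""
    bag_a, bag_b = node_bag(source_a), node_bag(source_b)
    shared = sum((bag_a & bag_b).values())   # S: nodes common to both
    only_a = sum((bag_a - bag_b).values())   # L: nodes only in the first tree
    only_b = sum((bag_b - bag_a).values())   # R: nodes only in the second tree
    return 2 * shared / (2 * shared + only_a + only_b)

original = "total = price * qty"
extended = "total = price * qty + tax"
unrelated = "print('hello')"

sim_same = similarity(original, original)
sim_extended = similarity(original, extended)
sim_unrelated = similarity(original, unrelated)
```

Adding `+ tax` to the expression introduces a few extra nodes but leaves every node of the original shared, so the score stays high. A threshold on this score would then decide whether a pair is reported; the threshold value is left open here.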

1.3.3. Metric-based analysis. A program metric is a measure used to characterize quantitatively the essential features of a program in order to allow its classification, comparison and mathematical analysis (Conte et al., 1986).

One of the first metric-based duplication detectors was presented by Ottenstein (1976) and explored four metrics, namely the number of unique operators, the number of unique operands, the total number of operators and the total number of operands. Programs
