International Journal of Security and Its Applications
Vol. 11, No. 3 (2017)
82 Copyright ⓒ 2017 SERSC
2. Related Work
2.1. Definition and Classification of Clones
A code clone is now widely defined as a code segment that shares similar syntactic or
semantic features with another code segment [3]. A clone pair consists of two fragments
that are similar to each other within the same version of a system. Two or more similar
code segments form a clone group. Clone group mapping reflects how clone groups change
from a previous version to the current version. Current research classifies clones in two
main ways [4]: by degree of similarity and by granularity of the code segment. According
to the textual similarity of the source code, clones are divided into Type-1 to Type-4.
Type-1 to Type-3 clones reflect degrees of syntactic similarity, and Type-4 clones reflect
semantic similarity. By granularity, clones are divided into file, block, function, class
and statement clones.
Original code fragment (CF):
    for(int i=1;i<=n;i++){
        sum=sum+i;
        prod=prod*i;
        foo(sum, prod);
    }
Type-1 clone of CF (copy and paste operation):
    for(int i=1;i<=n;i++){
        sum=sum+i;
        prod=prod*i;
        foo(sum, prod);
    }
Type-2 clone of CF (identifier rename):
    for(int j=1;j<=t;j++){
        s=s+j;
        p=p*j;
        foo(s,p);
    }
Type-3 clone of CF (insert and delete operation):
    for(int i=1;i<=n;i++){
        sum=sum+i;
        prod=prod*i;
        foo(prod);
    }
Type-4 clone of CF (same function):
    int i=0;
    while(i<=n){
        sum=sum+i;
        prod=prod*i;
        foo(sum, prod);
        i++;
    }
Figure 1. Clone Type Definition and Classification
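The syntactic clone types in Figure 1 can be illustrated with a minimal Python sketch. It is not the detection method of any tool cited here. The tokenizer, the keyword list, the function names and the 0.7 similarity threshold are all assumptions made for illustration. Type-4 clones require semantic analysis and are outside the scope of this sketch.

```python
import re
from difflib import SequenceMatcher

# Assumed keyword list for the C-like fragments of Figure 1 (illustrative only).
KEYWORDS = {"for", "while", "int", "if", "else", "return"}
TOKEN_RE = re.compile(r"[A-Za-z_]\w*|\d+|<=|>=|==|!=|\+\+|--|\S")


def tokens(code):
    """Split code into tokens; whitespace and layout are ignored."""
    return TOKEN_RE.findall(code)


def normalize(toks):
    """Replace every non-keyword identifier with the placeholder 'ID'."""
    return ["ID" if re.match(r"[A-Za-z_]", t) and t not in KEYWORDS else t
            for t in toks]


def classify(a, b, threshold=0.7):
    """Return 'Type-1', 'Type-2', 'Type-3', or None for two fragments."""
    ta, tb = tokens(a), tokens(b)
    if ta == tb:                      # identical apart from layout
        return "Type-1"
    na, nb = normalize(ta), normalize(tb)
    if na == nb:                      # identical apart from identifier names
        return "Type-2"
    # Similar after normalization: statements inserted, deleted or modified.
    if SequenceMatcher(None, na, nb).ratio() >= threshold:
        return "Type-3"
    return None
```

Applied to the fragments of Figure 1, the identifier-renamed fragment differs from CF only in identifier names and is reported as Type-2, while the fragment that calls foo(prod) no longer matches token by token and falls through to the similarity test. Because every identifier is replaced by the same placeholder, this sketch does not check that the renaming is consistent.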
2.2. Clone Tracking
Clone mapping must be established before clone genealogies can be extracted, and the
accuracy of clone mapping directly affects the results of the whole study. Clone mapping
is therefore very important for the study of clone evolution.
Hotta et al. [5] proposed an approach for tracking clones in an evolving system. Using
clone region descriptors (CRD), they described the clone mapping relationship in terms of
clone text content and file location. Although the approach can map clones whose
positions have changed, it has a relatively high false positive rate.
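The idea of mapping clones by text content and file location, rather than by line numbers, can be sketched as follows. This is a simplified illustration and not the CRD format used in [5]. The Descriptor fields, the function names and the 0.8 threshold are assumptions.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher


@dataclass(frozen=True)
class Descriptor:
    """Hypothetical, simplified clone region descriptor."""
    file: str       # path of the file containing the clone
    enclosing: str  # name of the enclosing method or block
    text: str       # clone text with whitespace runs collapsed


def describe(file, enclosing, code):
    """Build a descriptor that does not depend on line numbers."""
    return Descriptor(file, enclosing, " ".join(code.split()))


def map_clones(old, new, threshold=0.8):
    """Map each old descriptor to its most similar new descriptor, or None."""
    mapping = {}
    for o in old:
        best, best_score = None, threshold
        for n in new:
            # Location must agree before text similarity is compared.
            if (o.file, o.enclosing) != (n.file, n.enclosing):
                continue
            score = SequenceMatcher(None, o.text, n.text).ratio()
            if score >= best_score:
                best, best_score = n, score
        mapping[o] = best
    return mapping
```

Since the descriptor stores no line numbers, a clone that moves within its enclosing method still maps to itself. A lower threshold tolerates more edits but also admits more wrong matches, which is one way a false positive can arise in this kind of mapping.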
Bakota et al. [6] proposed a mapping approach based on the abstract syntax tree, which
maps clones according to file name, location, and clone distance. Although this approach
can map clones across multiple versions, a large number of similar features increases its
time consumption.
Thummalapenta et al. [7] proposed an approach that tracks clone relationships based on
the modification log. The system modification log is obtained from the CVS code
repository and, with the first version taken as the origin, the changes in each subsequent
version are calculated to obtain the mapping relationship. However, because the first
version is taken as the baseline, new clones that appear in subsequent versions cannot be
studied.
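A minimal sketch of log-based tracking is given below. It assumes a single file, clone regions given as (start, end) line ranges in the origin version, and a log in which each version is a list of (line, delta) edits applied in order. This edit format and the function names are hypothetical and do not reproduce the CVS log format or the procedure of [7].

```python
def apply_change(region, line, delta):
    """Shift a (start, end) line range through one edit.

    delta > 0 inserts `delta` lines before `line`;
    delta < 0 deletes `-delta` lines starting at `line`.
    Returns None when every line of the region is deleted.
    """
    start, end = region
    if delta >= 0:
        if line <= start:                 # insertion above the region
            return (start + delta, end + delta)
        if line <= end:                   # insertion inside the region
            return (start, end + delta)
        return region                     # insertion below the region
    last = line - delta - 1               # last deleted line number
    before = max(0, min(start, last + 1) - line)            # deleted above
    inside = max(0, min(end, last) - max(start, line) + 1)  # deleted inside
    length = (end - start + 1) - inside
    if length <= 0:
        return None
    new_start = start - before
    return (new_start, new_start + length - 1)


def track(origin_regions, version_logs):
    """Follow the clones of the origin version through each later version."""
    history = [dict(origin_regions)]
    for edits in version_logs:
        current = {}
        for name, region in history[-1].items():
            for line, delta in edits:
                if region is None:        # clone already deleted
                    break
                region = apply_change(region, line, delta)
            current[name] = region
        history.append(current)
    return history
```

Only the regions passed in as origin_regions are ever followed, so a clone that first appears in a later version never enters the history. This mirrors the limitation noted above.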
Saha et al. [8] map functions between versions according to function name and file path,
and then map clones using the detection results. Although this approach reduces running
time, it is easily affected by changes in the position of a clone.
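Mapping by function name and file path can be sketched as follows. The sketch assumes function-level clones, with each version's detection result given as a dictionary from a clone group identifier to a set of (file path, function name) pairs, and with each function belonging to at most one group per version. The data layout and the function name are hypothetical, not those of [8].

```python
from collections import defaultdict


def map_clone_groups(old_clones, new_clones):
    """Map old clone groups to new ones via functions with equal path and name.

    old_clones, new_clones: dict of group_id -> set of (file_path, name).
    Returns dict of old group_id -> set of matching new group_ids.
    """
    # Index the new version: which group owns each function key.
    owner = {}
    for gid, funcs in new_clones.items():
        for key in funcs:
            owner[key] = gid

    mapping = defaultdict(set)
    for gid, funcs in old_clones.items():
        for key in funcs:
            if key in owner:              # same path and name in both versions
                mapping[gid].add(owner[key])
    # Groups with no surviving function map to an empty set.
    return {gid: mapping.get(gid, set()) for gid in old_clones}
```

Each lookup is a dictionary access, which keeps the mapping fast. A function that is moved to another file or renamed no longer matches its old key, so its clone group is left unmapped, which illustrates the sensitivity to position changes described above.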