894 T. Stützle, H.H. Hoos / Future Generation Computer Systems 16 (2000) 889–914
lation can be captured by the correlation coefficient,
which is defined as:
ρ(F,D) =
Cov(F, D)
√
Var (F )
√
Var (D)
, (5)
where Cov(F, D) is the covariance between the ran-
dom variables F and D which probabilistically de-
scribe the fitness and the distance of local optima to a
global optimum, while Var denotes the variance. This
correlation coefficient can be empirically estimated by
substituting the covariance and the variance values by
the respective empirically measured ones. The FDC
analysis has shown to be very useful in the context of
studying the effectiveness of adaptive algorithms and
their design [3,4,31]. Note that for minimization prob-
lems, a high, positive correlation between the solution
cost and the distance to the global optimum indicates
that the smaller the solution cost, the closer are the so-
lutions — on average — to a global optimum. Hence,
if a problem shows a high FDC, algorithms combin-
ing adaptive solution generation and local search may
be expected to perform well. For ACO algorithms,
this is the case because the most important guidance
mechanism of ACO algorithms is the solution quality
of the solutions constructed by the ants — the better
a solution, the more its solution components will be
reinforced. Yet, if no such correlation exists, or even
worse, if cost and distance are negatively correlated,
the fitness gives only little or no guidance towards
better solutions and on such problems ACO algorithm
may perform poorly.
3.2. FDC analysis for the TSP
The symmetric TSP is one of the most widely
studied problems in terms of search space analysis
[3,4,24,34]. A good distance measure between two
tours s and s
0
is given by the number of different arcs,
that is, d(s,s
0
) = n−|{(i, j) : (i, j) ∈ s ∧(i, j ) ∈ s
0
}|
(n is the number of cities). A first study of the corre-
lation between the solution quality and the distance
to the global optimum has been done in [3]. Addi-
tionally, plots of the solution cost versus the distance
to the closest global optimum have shown to be a
very illustrative tool for the graphical presentation of
the fitness–distance relationship [3,23,31]. Here, we
exemplify results on the FDC analysis using some in-
stances which are larger than previously studied ones.
For our investigation we use a 3-opt local
search algorithm [25]. This local search algorithm
proceeds by systematically testing whether the cur-
rent tour can be improved by replacing at most
three arcs. Straightforward 3-opt implementations
require O(n
3
) exchanges to be examined. Since
this is too time-consuming in practice, we use a
number of standard speed-up techniques [1,22,29]
which achieve a subquadratical growth of the lo-
cal search time with instance size. In particular, we
restrict the set of examined moves to a candidate
list of a fixed number of nearest neighbors; here,
as a default we set this number to 40. Addition-
ally, we apply a fixed radius nearest neighbor search
[1,22].
For some instances several globally optimal solu-
tions exist. To partially address this issue, for each
of these problems we generated a number of glob-
ally optimal solutions, then eliminated doubles and
in our FDC analysis we use the distance to the clos-
est of these global optima. (Note that the number in
the instance name gives the number of cities.) Fig.
2 gives plots of the percentage deviation from the
optimum versus the distance to the closest global
optimum. All plots show a strong positive corre-
lation between solution cost and the distance from
the closest optimal solution — better local min-
ima tend to be closer to the global optimum. Some
summary results for the FDC analysis are given in
Table 1, in particular, the average distance between
local minima and the average distance to global op-
tima, the respective ratios to the maximum possible
distance, and the correlation coefficients are given.
Interestingly, the ratio between the average distance
of tours and the distance size (column avg
d-ls
/n in
Table 1) is very small; this fact indicates that locally
optimal tours in the TSP are concentrated around a
small region of the whole search space (this partic-
ularity has also been observed before [3,4,24,34]).
Also, the correlation coefficients are all statistically
significant. Note that the generality of these results
does not depend on the particular 3-opt algorithm
used. When using 2-opt or the more powerful
Lin–Kernighan heuristic (LK) for the local search,
again a strongly significant FDC, which was slightly
weaker when using 2-opt and stronger when using
LK, has been observed for the instances examined in
[3].