in which \(\|x_i - x_j\|\) is the Euclidean distance between the high-dimensional datapoints \(x_i\) and \(x_j\), and \(\|y_i - y_j\|\) is the Euclidean distance between the low-dimensional datapoints \(y_i\) and \(y_j\). The Sammon cost function is given by
\[
\phi(Y) = \frac{1}{\sum_{ij} \|x_i - x_j\|} \sum_{ij} \frac{\left( \|x_i - x_j\| - \|y_i - y_j\| \right)^2}{\|x_i - x_j\|} \qquad (7)
\]
The Sammon cost function differs from the raw stress function in that it puts more emphasis on retaining distances
that were originally small. The minimization of the stress function can be performed using various methods, such
as the eigendecomposition of a pairwise dissimilarity matrix, the conjugate gradient method, or a pseudo-Newton
method [19].
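As an illustration of Equation 7, the Sammon cost can be evaluated directly from the two sets of pairwise distances. The sketch below is a minimal NumPy implementation; the helper names are illustrative and not part of the original text.

```python
import numpy as np

def pairwise_dists(Z):
    """Euclidean distance matrix for the rows of Z."""
    diff = Z[:, None, :] - Z[None, :, :]
    return np.sqrt((diff ** 2).sum(-1))

def sammon_cost(X, Y, eps=1e-12):
    """Sammon stress (Eq. 7): weights each squared residual by
    1 / ||x_i - x_j||, emphasizing originally small distances.

    X: (n, D) high-dimensional datapoints; Y: (n, d) embedding.
    """
    dX = pairwise_dists(X)
    dY = pairwise_dists(Y)
    iu = np.triu_indices_from(dX, k=1)   # count each pair (i, j) once
    dx, dy = dX[iu], dY[iu]
    # (1 / sum ||x_i - x_j||) * sum (||x_i - x_j|| - ||y_i - y_j||)^2 / ||x_i - x_j||
    return ((dx - dy) ** 2 / (dx + eps)).sum() / dx.sum()

# A distance-preserving embedding (here trivially Y = X) gives zero stress:
X = np.random.RandomState(0).rand(20, 3)
print(sammon_cost(X, X))  # -> 0.0
```

Dropping coordinates (e.g., `sammon_cost(X, X[:, :2])`) shrinks the embedded distances and yields a strictly positive stress, as expected.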
MDS is widely used for the visualization of data, e.g., in fMRI analysis [77] and in molecular modelling [88]. The
popularity of MDS has led to the proposal of variants such as SPE [2] and FastMap [27].
3.2.2 SPE
Stochastic Proximity Embedding (SPE) is an iterative algorithm that minimizes the MDS raw stress function. SPE
differs from MDS in the efficient rule it employs to update the current estimate of the low-dimensional data representation. In addition, SPE can readily be applied to retain only the distances in a neighborhood graph defined on the data, leading to behavior that is comparable to, e.g., Isomap (see subsubsection 3.2.3).
SPE minimizes the MDS raw stress function that was given in Equation 6. For convenience, we rewrite the raw stress
function as
\[
\phi(Y) = \sum_{ij} (d_{ij} - r_{ij})^2 \qquad (8)
\]
where \(r_{ij}\) is the proximity between the high-dimensional datapoints \(x_i\) and \(x_j\) (computed as \(r_{ij} = \frac{\|x_i - x_j\|}{\max R}\)), and \(d_{ij}\) is the Euclidean distance between their low-dimensional counterparts \(y_i\) and \(y_j\) in the current approximation of the embedded space. SPE can readily be restricted to retain only the distances in a neighborhood graph \(G\) defined on the data, by setting \(d_{ij}\) and \(r_{ij}\) to 0 if \((i, j) \notin G\). With this setting, SPE behaves comparably to techniques based on neighborhood graphs. Nonlinear dimensionality reduction techniques based on neighborhood graphs are discussed in more detail later.
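Restricting the stress to a neighborhood graph amounts to masking the proximity matrix. A minimal sketch using a symmetrized k-nearest-neighbor graph (the helper names and the choice of k are illustrative, not from the original text):

```python
import numpy as np

def knn_mask(X, k):
    """Boolean matrix G with G[i, j] = True iff j is among the k nearest
    neighbors of i, or vice versa (symmetrized), with i != j."""
    diff = X[:, None, :] - X[None, :, :]
    D = np.sqrt((diff ** 2).sum(-1))
    np.fill_diagonal(D, np.inf)              # a point is not its own neighbor
    nn = np.argsort(D, axis=1)[:, :k]        # k nearest neighbors per point
    G = np.zeros(D.shape, dtype=bool)
    rows = np.repeat(np.arange(len(X)), k)
    G[rows, nn.ravel()] = True
    return G | G.T                           # symmetrize the graph

X = np.random.RandomState(1).rand(30, 3)
G = knn_mask(X, k=5)
R = np.sqrt(((X[:, None] - X[None]) ** 2).sum(-1))
R[~G] = 0.0   # r_ij = 0 for (i, j) not in G, as described in the text
```

With the masked proximities, only distances between neighboring points contribute to the stress, which is what makes the behavior comparable to graph-based techniques.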
SPE uses an iterative algorithm to minimize the raw stress function defined above. The initial positions of the points \(y_i\) are selected randomly from \([0, 1]\). An update of the embedding coordinates \(y_i\) is performed by randomly selecting \(s\) pairs of points \((y_i, y_j)\). For each pair of points, the Euclidean distance in the low-dimensional data representation \(Y\) is computed. Subsequently, the coordinates of \(y_i\) and \(y_j\) are updated in order to decrease the difference between the distance \(r_{ij}\) in the original space and the distance \(d_{ij}\) in the embedded space. The updating is performed using the following update rules
\[
y_i = y_i + \lambda \, \frac{r_{ij} - d_{ij}}{2 d_{ij} + \epsilon} \, (y_i - y_j) \qquad (9)
\]
\[
y_j = y_j + \lambda \, \frac{r_{ij} - d_{ij}}{2 d_{ij} + \epsilon} \, (y_j - y_i) \qquad (10)
\]
where \(\lambda\) is a learning parameter that decreases with the number of iterations, and \(\epsilon\) is a regularization parameter that prevents division by zero. The updating of the embedded coordinates is performed for a large number of iterations (e.g., \(10^5\) iterations). The high number of iterations is feasible because of the low computational cost of the update procedure.
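Putting Equations 8 through 10 together, the SPE iteration can be sketched as follows. This is a minimal illustration, not the reference implementation; the pair count s, the linearly decaying learning-rate schedule, and the iteration count are placeholder choices.

```python
import numpy as np

def spe(X, d=2, n_iter=10_000, s=1, lam0=1.0, eps=1e-8, seed=0):
    """Stochastic Proximity Embedding: iteratively shrink the gap between
    high-dimensional proximities r_ij and embedded distances d_ij."""
    rng = np.random.RandomState(seed)
    n = len(X)
    R = np.sqrt(((X[:, None] - X[None]) ** 2).sum(-1))
    R /= R.max()                     # proximities r_ij = ||x_i - x_j|| / max R
    Y = rng.rand(n, d)               # initial coordinates drawn from [0, 1]
    for t in range(n_iter):
        lam = lam0 * (1.0 - t / n_iter)        # learning rate decays over time
        for _ in range(s):                     # s randomly selected pairs
            i, j = rng.choice(n, size=2, replace=False)
            dij = np.linalg.norm(Y[i] - Y[j])
            # Update rules (9) and (10); both use the pre-update coordinates.
            step = lam * (R[i, j] - dij) / (2 * dij + eps)
            Y[i], Y[j] = Y[i] + step * (Y[i] - Y[j]), Y[j] + step * (Y[j] - Y[i])
    return Y

Y = spe(np.random.RandomState(2).rand(50, 5))
print(Y.shape)  # -> (50, 2)
```

Each update touches only two points, which is why a large number of iterations remains cheap; restricting to a neighborhood graph would simply skip (or zero out) pairs outside the graph.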
3.2.3 Isomap
Multidimensional scaling has proven to be successful in many applications, but it suffers from the fact that it is based
on Euclidean distances, and does not take into account the distribution of the neighboring datapoints. If the high-
dimensional data lies on or near a curved manifold, such as in the Swiss roll dataset [79], MDS might consider two
datapoints as near points, whereas their distance over the manifold is much larger than the typical interpoint distance.
Isomap [79] is a technique that resolves this problem by attempting to preserve pairwise geodesic (or curvilinear)