rule, on the other hand, was originally set up only for theoretical
reasons and to facilitate a comparison with the other self-
organizing models. Moreover, stepwise learning cannot be used
with general metrics, but we will see that batch learning also
solves this problem.
More detailed descriptions of the SOM algorithms will be given
below.
Several commercial software packages, as well as plenty of
freeware for the SOM, are available. This author strongly encourages
the use of well-justified public-domain software packages. For
instance, there exist two freeware packages developed by us,
namely, SOM_PAK (Kohonen, Hynninen, Kangas, & Laaksonen,
1996; SOM_PAK Team, 1990) and the SOM Toolbox (SOM Toolbox
Team, 1999; Vesanto, Alhoniemi, Himberg, Kiviluoto, & Parviainen,
1999; Vesanto, Himberg, Alhoniemi, & Parhankangas, 1999), both
downloadable from the Internet. Both packages contain auxiliary
analytical procedures, and especially the SOM Toolbox, which
makes use of MATLAB functions, is provided with versatile
graphics tools.
Unlike in most biologically inspired map models, the topo-
graphic order in the SOM can always be materialized globally over
the whole map.
The spatial order in the display facilitates a convenient and
quick visual inspection of the similarity relationships of the input
data as well as their clustering tendency, and comes in handy in the
verification and validation of data samples. Moreover, with proper
calibration of the models, the clustering and classification of the
data become explicit.
The rest of this article concentrates on the SOM principles and
applications. The SOM has been used extensively as a visualization
tool in exploratory data analysis. It has had plenty of practical
applications ranging from industrial process control and finance
analyses to the management of very large document collections.
New, promising applications exist in bioinformatics. The largest
applications so far have been in the management and retrieval of
textual documents, of which this paper contains two examples.
Many versions of the SOM algorithms have been suggested
over the years. They are too numerous to be reviewed here; cf.
the extensive bibliographies (Kaski, Kangas, & Kohonen, 1998;
Oja, Kaski, & Kohonen, 2003; Pöllä, Honkela, & Kohonen, 2009).
See also the Discussion in Section 7.
3.2. Calibration of the SOM
If the input items fall in a finite number of classes, the different
models can be made to correspond to these classes and to
be provided with corresponding symbolic labels. This kind of
calibration of the models can be made in two ways: 1. If the
number of input items is sufficiently large, one can first study the
distribution of matches that all of the input data items make with
the various models. A particular model is then labeled according to
the class that occurs in the majority of the input samples that match
this model. In the case of a tie, one may carry out, e.g., a majority voting
over a larger neighborhood of the model. 2. If only a smaller
number of input data items is available, so that the above majority
voting makes no sense (e.g., there are too many ties, or there are no
hits at some of the models), one can apply the so-called k-nearest-
neighbors (kNN) method. For each model, those k input data items
that are closest to it (in the metric applied in the construction of
the SOM) are searched, and a majority voting over them is carried
out to determine the most probable classification of the node. In
the case of a tie, the value of k is increased until the tie is resolved.
Usually k is selected to be on the order of half a dozen to a hundred,
depending on the number of input data items and the size of the
SOM array.
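For illustration, the two labeling schemes can be sketched in Python/NumPy as follows. This is only a minimal sketch, not code from any of the packages mentioned above; the names `models` (the array of model vectors), `data` (the matrix of input samples), and `labels` (their class symbols) are hypothetical, and Euclidean matching is assumed.

```python
import numpy as np
from collections import Counter

def calibrate_by_majority(models, data, labels):
    """Label each model with the class occurring in the majority of
    the input samples whose best match it is; nodes without hits
    keep the label None."""
    # Distance from every sample to every model; best-matching model per sample.
    d = np.linalg.norm(data[:, None, :] - models[None, :, :], axis=2)
    bmu = np.argmin(d, axis=1)
    node_labels = [None] * len(models)
    for i in range(len(models)):
        hits = [labels[j] for j in np.flatnonzero(bmu == i)]
        if hits:
            node_labels[i] = Counter(hits).most_common(1)[0][0]
    return node_labels

def calibrate_by_knn(models, data, labels, k=10):
    """Label each model by a majority vote over the k input samples
    closest to it; in the case of a tie, k is increased until the
    tie is resolved (or the data are exhausted)."""
    node_labels = []
    for m in models:
        order = np.argsort(np.linalg.norm(data - m, axis=1))
        kk = min(k, len(data))
        while True:
            votes = Counter(labels[j] for j in order[:kk]).most_common()
            if len(votes) < 2 or votes[0][1] > votes[1][1] or kk == len(data):
                break
            kk += 1  # tie: take one more neighbor into the vote
        node_labels.append(votes[0][0])
    return node_labels
```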
When a new, unknown input item is compared with all of the
models, it will be identified with the best-matching model. The
classification of the input item is then understood as that of the
best-matching model.
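Continuing the same hypothetical sketch, the identification of a new item with its best-matching model, and hence its classification, reduces to a single nearest-model search:

```python
def classify(models, node_labels, x):
    """Identify x with its best-matching model and return that model's label."""
    i = int(np.argmin(np.linalg.norm(models - x, axis=1)))
    return node_labels[i]

# Usage, e.g.:
# node_labels = calibrate_by_majority(models, data, labels)
# predicted = classify(models, node_labels, new_item)
```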
3.3. On ‘‘matching by similarity’’
There exist many versions of the SOM, which apply different
definitions of ‘‘similarity’’. This property first deserves a short
discussion. ‘‘Similarity’’ and ‘‘distance’’ are usually opposite concepts.
The cognitive meaning of similarity is a very vague one. For
instance, one may talk of the similarity of two persons or two
historical eras, although such a comparison is usually based on a
subjective opinion.
If the same comparison is to be implemented automatically,
it can only be based on some very restricted analytical, say,
statistical attributes. The situation is much clearer if we deal with
concrete objects in science or technology, since we can then base
the definition of dissimilarity on basic mathematical concepts of,
say, distance measures between attribute vectors. The statistical
figures are usually also expressed as real vectors, consisting of
numerical results or other statistical indicators. Various kinds
of spectra and other transformations can also be regarded as
multidimensional vectors of their components.
The first problem in trying to compare such vectors is usually
different scaling of their elements. For metric comparison, a simple
remedy is to normalize the scales so that either the variances of the
variables in the different dimensions, or their maxima and minima,
respectively, become the same. After that, some standard distance
measure, such as the Euclidean or, more generally, the Minkowski
distance, can be tried, the choice depending on the nature
of the data. It has turned out that the Euclidean distance, with
normalization, is already applicable to most practical studies, since
the SOM is able to reveal even complex interdependencies of the
variables in its display.
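As a sketch of this preprocessing step (again illustrative only; `X` is a hypothetical data matrix with one variable per column), the two normalizations and the Minkowski distance could be written as:

```python
import numpy as np

def normalize_columns(X, method="variance"):
    """Rescale each variable (column of X) so that either the variances
    or the min-max ranges become the same in all dimensions."""
    if method == "variance":
        s = X.std(axis=0)
        s[s == 0.0] = 1.0                     # guard against constant columns
        return (X - X.mean(axis=0)) / s
    lo, hi = X.min(axis=0), X.max(axis=0)
    return (X - lo) / np.where(hi > lo, hi - lo, 1.0)   # scale to [0, 1]

def minkowski(a, b, p=2.0):
    """Minkowski distance between vectors a and b; p = 2 gives
    the Euclidean distance."""
    return float(np.sum(np.abs(a - b) ** p) ** (1.0 / p))
```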
A natural measure of the similarity of vectorial items is in
general some inner product. In the SOM research, the dot product
is commonly used. This measure also complies better with the
biological neural models than the Euclidean distance. However,
the model vectors m_i, for their comparison with the input x, must
be kept normalized to constant length all the time. If the vector
dimensionality is high, and also the input vectors are normalized
to constant length, the difference between SOMs based on the
Euclidean distances and the dot products is insignificant. (For the
construction of Euclidean and dot-product SOMs, cf. Sections 4.1
and 4.5, respectively.) On the other hand, if there are plenty of
zero elements in the vectors, the computation of dot products is
correspondingly faster. This property can be utilized effectively
especially in the fast computation of document maps discussed at
the end of this article.
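A minimal sketch of such dot-product matching, under the assumptions stated above (unit-length model vectors; hypothetical names throughout), could look like this; the sparse variant shows why zero elements make the computation faster:

```python
import numpy as np

def unit_normalize(V):
    """Scale every row of V to unit Euclidean length."""
    n = np.linalg.norm(V, axis=1, keepdims=True)
    return V / np.where(n > 0.0, n, 1.0)

def bmu_dot(models, x):
    """Best match by the largest dot product; with unit-length models
    (and a unit-length x) this agrees closely with the Euclidean match."""
    return int(np.argmax(models @ x))

def bmu_dot_sparse(models, nz_idx, nz_val):
    """The same match when x is given only by the indices and values of
    its nonzero elements; the zeros contribute nothing to the products."""
    return int(np.argmax(models[:, nz_idx] @ nz_val))
```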
Before proceeding further, it will be necessary to emphasize a
basic fact. An image, often given as a set of pixels or other structural
elements, will usually not be applicable, as such, as an input vector.
The natural variations in the images, such as translations, rotations,
variations of size, etc., as well as variations due to different lighting
conditions are usually so wide that a direct comparison of the
objects on the basis of their appearances does not make any
sense. Instead, the classification of natural items should be based
on the extraction and classification of their characteristic features
which must be as invariant as possible. Features of this type may
consist of color spectrograms, expansions of the images in Fourier
transforms, wavelets, principal components, or eigenvectors of
some image operators, etc. If one can describe the input objects by
a restricted set of invariant features, the dimensionality of the input
representations and the computing load are reduced drastically.
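As one illustrative example of such an invariant feature (an assumption of this sketch, not a prescription of the paper): the magnitude of the two-dimensional discrete Fourier transform is unchanged by circular translations of an image, so a small low-frequency block of it yields a compact, translation-invariant input vector.

```python
import numpy as np

def translation_invariant_features(image, side=8):
    """The magnitude of the 2-D DFT does not change under circular
    translations of the image; a small central (low-frequency) block
    of it gives a compact, translation-invariant feature vector."""
    f = np.fft.fftshift(np.abs(np.fft.fft2(image)))   # low frequencies centered
    cy, cx = f.shape[0] // 2, f.shape[1] // 2
    r = side // 2
    v = f[cy - r: cy + r, cx - r: cx + r].ravel()     # side*side features
    n = np.linalg.norm(v)
    return v / n if n > 0.0 else v                    # unit length for matching
```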
A special kind of dissimilarity or distance measure is applied
in an SOM that is called the Adaptive-Subspace SOM (ASSOM), cf.
Kohonen (1995, 1996, 2001) and Kohonen, Kaski, and Lappalainen
(1997). In it, certain elementary systems are associated with the
nodes, and these systems develop into specific filters that respond
invariantly to some class (e.g., translation-invariant, rotation-
invariant, or scale-invariant) of local features. Their parameters