tions in a network, it is possible for two networks to have the same number of connections but dif
-
ferent clustering coefficients(see Fig. 2).Finally, notethat because the definitionsforT
i
and k
i
are
independent of whether the connections are based on edges or arcs, the clustering coefficients for
a directed network and the corresponding undirected network are equal.
The degree distribution P(k) represents the probability that a randomly chosen node will
have degree k (i.e., will have k neighbors). For directed networks, we will concentrate on the
distribution of in-degrees, although one can also look at the out-degree distribution. We esti
-
mate these distributions based on the relative frequencies of node degrees found throughout the
network. The most straightforward feature of P(k) is the expected value <k> under P(k). This
quantity, estimated by simply averaging the degree of all nodes over the network, ranges be
-
tween 0 and n (for a fully connected network of n nodes) and represents the mean density of
connections in the network. More information can be gleaned by plotting the full distribution
P(k) as a function of k, using either a bar histogram (for small networks, as in Fig. 2), a binned
scatter plot (for large networks, as in Fig. 5), or a smooth curve (for theoretical models, as later
illustrated in Fig. 3). As we explain in the next section, the shapes of these plots provide char
-
acteristic signatures for different kinds of network structure and different processes of network
growth.
Fig. 2 shows these statistical measures for two different networks with 10 nodes and 15
edges. These examples illustrate how networks equal in size and density of connections may
differ significantly in their other structural features. Fig. 2 also illustrates two general proper-
ties of random graphs—graphs that are generated by placing an edge between any two nodes
with some constant probability p independent of the existence of any other edge (a model intro-
duced by Erdös and Réyni, 1960). First, for fixed n and <k>, high values of C tend to imply
high values of L and D. Second, the degree distribution P(k) is approximately bell shaped, with
an exponential tail for high values of k.
4
Although these two properties hold reliably for ran-
dom graphs, they do not hold for many important natural networks, including semantic net-
works in natural language. We next turn to a detailed discussion of the small-world and
scale-free structures that do characterize natural semantic networks. Both of these structures
can be thought of in terms of how they contrast with random graphs: Small-world structures
are essentially defined by the combination of high values of C together with low values of L
and D, whereas scale-free structures are characterized by non-bell-shaped degree distributions,
with power-law (rather than exponential) tails.
3. Small-world and scale-free network structures
Interest in the small-world phenomenon originated with the classic experiments of Milgram
(1967) on social networks. Milgram’s results suggested that any two people in the United
States were, on average, separated by only a small number of acquaintances or friends (popu
-
larly known as “six degrees of separation”). Although the finding of very short distances be
-
tween random pairs of nodes in a large, sparsely connected network may seem surprising, it
does not necessarily indicate any interesting structure. This phenomenon occurs even in the
random graphs described previously, where each pair of nodes is joined by an edge with proba
-
bility p. When p is sufficiently high, the whole network becomes connected, and the average
distance L grows logarithmically with n, the size of the network (Erdös & Réyni, 1960).
M. Steyvers, J. B. Tenenbaum/Cognitive Science 29 (2005) 47