In one way, there is little apparent difference between conventional statistical approaches and
network approaches. Univariate, bivariate, and even many multivariate descriptive statistical
tools are commonly used in describing, exploring, and modeling social network data. Social
network data are, as we have pointed out, easily represented as arrays of numbers -- just like
other types of sociological data. As a result, the same kinds of operations can be performed on
network data as on other types of data. Algorithms from statistics are commonly used to
describe characteristics of individual observations (e.g. the median tie strength of actor X with
all other actors in the network) and the network as a whole (e.g. the mean of all tie strengths
among all actors in the network). Statistical algorithms are very heavily used in assessing the
degree of similarity among actors, and in finding patterns in network data (e.g. factor analysis,
cluster analysis, multi-dimensional scaling). Even the tools of predictive modeling are
commonly applied to network data (e.g. correlation and regression).
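The two descriptive examples above (the median tie strength of one actor, and the mean tie strength of the whole network) can be sketched in a few lines. This is a minimal illustration, not a prescribed method: the actor labels, the tie values, and the helper names `out_ties` and `all_ties` are invented for the example, and the network is assumed to be valued and directed, with the diagonal (self-ties) ignored.

```python
from statistics import mean, median

# Hypothetical valued, directed network (illustrative data only):
# ties[i][j] is the strength of the tie sent from actor i to actor j.
actors = ["A", "B", "C", "D"]
ties = [
    [0, 3, 1, 0],
    [2, 0, 4, 1],
    [1, 5, 0, 2],
    [0, 1, 3, 0],
]

def out_ties(matrix, i):
    """Tie strengths from actor i to every other actor (self-tie excluded)."""
    return [value for j, value in enumerate(matrix[i]) if j != i]

def all_ties(matrix):
    """Every off-diagonal tie strength in the network."""
    return [
        value
        for i, row in enumerate(matrix)
        for j, value in enumerate(row)
        if i != j
    ]

# Describing an individual observation: median tie strength of actor B.
median_b = median(out_ties(ties, actors.index("B")))

# Describing the network as a whole: mean of all tie strengths.
network_mean = mean(all_ties(ties))

print(f"median tie strength of B: {median_b}")
print(f"mean tie strength in network: {network_mean:.3f}")
```

Because the network is stored as an ordinary array of numbers, nothing here is specific to network analysis; the same summary functions would apply to any other table of scores.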
Descriptive statistical tools are really just algorithms for summarizing characteristics of the
distributions of scores. That is, they are mathematical operations. Where statistics really
become "statistical" is on the inferential side; that is, when our attention turns to assessing the
reproducibility or likelihood of the pattern that we have described. Inferential statistics can be,
and are, applied to the analysis of network data. But there are some quite important
differences between the flavors of inferential statistics used with network data, and those that
are most commonly taught in basic courses in statistical analysis in sociology.
Probably the most common emphasis in the application of inferential statistics to social science
data is to answer questions about the stability, reproducibility, or generalizability of results
observed in a single sample. The main question is: if I repeated the study on a different sample
(drawn by the same method), how likely is it that I would get the same answer about what is
going on in the whole population from which I drew both samples? This is a really important
question -- because it helps us to assess the confidence (or lack of it) that we ought to have in
assessing our theories and giving advice.
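The repeated-sampling question can be made concrete with a small simulation. This is only a sketch under stated assumptions: the population of 10,000 normally distributed scores, the sample size of 100, the 500 repetitions, and the function name `sample_means` are all invented for illustration and do not come from any real study.

```python
import random
from statistics import mean, stdev

random.seed(0)  # fixed seed so the sketch gives the same result on each run

# Hypothetical population of scores (an assumption made for illustration).
population = [random.gauss(50, 10) for _ in range(10_000)]

def sample_means(pop, sample_size, repetitions):
    """Draw repeated simple random samples and return the mean of each one."""
    return [mean(random.sample(pop, sample_size)) for _ in range(repetitions)]

# "Repeat the study" 500 times, drawing each sample by the same method.
means = sample_means(population, sample_size=100, repetitions=500)

# The spread of the sample means shows how much the answer would vary
# from one sample to the next.
print(f"population mean: {mean(population):.2f}")
print(f"spread (std. dev.) of sample means: {stdev(means):.2f}")
```

The smaller the spread of the sample means, the more confidence a single sample's result deserves as a statement about the population.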
To the extent the observations used in a network analysis are drawn by probability sampling
methods from some identifiable population of actors and/or ties, the same kind of question
about the generalizability of sample results applies. Often this type of inferential question is of
little interest to social network researchers. In many cases, they are studying a particular
network or set of networks, and have no interest in generalizing to a larger population of such
networks (either because there isn't any such population, or because they don't care about generalizing
to it in any probabilistic way). In some other cases we may have an interest in generalizing, but
our sample was not drawn by probability methods. Network analysis often relies on artifacts,
direct observation, laboratory experiments, and documents as data sources -- and usually
there are no plausible ways of identifying populations and drawing samples by probability
methods.
The other major use of inferential statistics in the social sciences is for testing hypotheses. In