Visualizing Multivariate Spatial Correlation
“map” is but one of several linked views.[4] Similar visions underlie several other recent
efforts to develop open and modular software frameworks for the visualization of
high-dimensional (spatial) data.[5]
In addition to being freestanding, DynESDA2 also includes a number of other advances
over its predecessors, such as the capability to handle both point and polygon coverages,
“true” brushing of maps, simultaneous linking of multiple maps with multiple statistical
graphics, and interactive LISA maps. It also extends the visualization of spatial correlation
to a multivariate setting. We turn to this first.
3 Multivariate Spatial Correlation
The visualization and exploration of multivariate association is a core functionality of
current exploratory data analysis (EDA), knowledge discovery and data mining tools (Buja
et al. 1996, Han and Kamber 2001, Gahegan et al. 2002). The incorporation of “spatial”
association in this framework is still in its infancy, however. Most suggested approaches
pertain to geostatistical analysis, where data are represented as points and the measure of
spatial correlation is derived from the variogram (see, e.g., Cook et al. 1996, Majure and
Cressie 1997). Similar progress has not been made for the analysis of multivariate spatial
correlation for lattice data, i.e., spatial objects represented as discrete points or polygons.[6]
We develop a visualization device for multivariate spatial correlation in lattice data by
building on some of the ideas originally advanced in Wartenberg (1985). There, a
multivariate coefficient of spatial autocorrelation between two standardized random
variables $z_k$ and $z_l$ is defined as:

$$m_{kl} = z_k' W_s z_l, \qquad (1)$$
where $z_k = [x_k - \bar{x}_k]/\sigma_k$ and $z_l = [x_l - \bar{x}_l]/\sigma_l$ have been
standardized such that the mean is zero and the standard deviation equals one, and $W_s$ is
a doubly standardized (or stochastic) spatial weights matrix. The weights matrix defines the
“neighbor set” for each observation (with non-zero elements for neighbors, zero for others)
and has zeros on the diagonal by convention.
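
As an illustration of equation (1), the coefficient can be sketched in a few lines of NumPy. This is a minimal sketch under stated assumptions, not the DynESDA2 implementation: "doubly standardized" is read here as scaling the weights so that all elements sum to one, the standard deviation is computed with divisor n, and the function names and the four-location example data are hypothetical.

```python
import numpy as np


def standardize(x):
    """Return z = (x - mean) / std, using the population std (divisor n; an assumption)."""
    x = np.asarray(x, dtype=float)
    return (x - x.mean()) / x.std()


def doubly_standardize(w):
    """Scale the weights so that all elements sum to one.

    Assumption: this is one common reading of "doubly standardized";
    a symmetric input matrix stays symmetric under this scaling.
    """
    w = np.asarray(w, dtype=float)
    return w / w.sum()


def wartenberg_m(xk, xl, w):
    """Cross-product m_kl = z_k' W_s z_l from equation (1)."""
    zk = standardize(xk)
    zl = standardize(xl)
    ws = doubly_standardize(w)
    return float(zk @ ws @ zl)


# Hypothetical data: four locations along a line, adjacent ones are neighbors.
w = np.array(
    [
        [0, 1, 0, 0],
        [1, 0, 1, 0],
        [0, 1, 0, 1],
        [0, 0, 1, 0],
    ],
    dtype=float,
)
xk = np.array([1.0, 2.0, 3.0, 4.0])
xl = np.array([2.0, 1.0, 4.0, 3.0])

m_kl = wartenberg_m(xk, xl, w)
m_lk = wartenberg_m(xl, xk, w)
print(m_kl, m_lk)
```

Because the example weights are symmetric, the two printed values coincide, which is the symmetry property that the doubly standardized form is meant to preserve.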
This concept of multivariate spatial correlation thus centers on the extent to which
values for one variable ($z_k$) observed at a given location show a systematic (more than
would be expected under spatial randomness) association with another variable ($z_l$)
observed at the “neighboring” locations. Note that this multivariate spatial correlation can
be considered in addition to or instead of the usual (non-spatial) correlation between the
two variables at the same location. Wartenberg (1985) used this statistic to develop a notion
of spatial principal components, for which the double standardization of the weights matrix
(and the implied symmetry) was necessary.
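
Writing the cross-product out element by element may make the "own location versus neighboring locations" reading more concrete. The element notation is an assumption introduced here for illustration: $w_{ij}$ denotes the $(i,j)$ element of $W_s$, $z_k^i$ the value of $z_k$ at location $i$, and $n$ the number of observations.

$$m_{kl} = \sum_{i=1}^{n} \sum_{j=1}^{n} w_{ij}\, z_k^i\, z_l^j = \sum_{i=1}^{n} z_k^i\, [W_s z_l]_i$$

Each term pairs the value of $z_k$ at location $i$ with a weighted sum of $z_l$ over the neighbors of $i$. Since $w_{ii} = 0$, the value of $z_l$ at location $i$ itself never enters. By contrast, if the variables are standardized with divisor $n$, the usual non-spatial correlation is $z_k' z_l / n = \frac{1}{n} \sum_{i} z_k^i z_l^i$, which uses only same-location pairs.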
For the purposes of visualization, our focus is on the linear association between a
variable $z_k$ at a location $i$, $z_k^i$, and the corresponding “spatial lag” for the other
variable, $[W z_l]_i$.[7]
In this context, the usual singly-standardized (row-standardized) form of the spatial weights
matrix can be used, which yields an interpretation of the spatial lag as an “average” of
neighboring values. Also, the cross-product statistic can be re-scaled by dividing by the
[4] See Unwin (1996) and Wilhelm and Steck (1998) for recent examples. Similar ideas are behind the Tcl/Tk
based cdv toolkit of Dykes (Dykes 1997, 1998) as well as Brundson’s exploration of local spatial association
using a dynamically linked “map” constructed with tools available in Xlispstat (Brundson 1998).
[5] See, for example, MacEachren et al. (1999), Sutherland et al. (2000) and Gahegan et al. (2002).
[6] Note that the points used in geostatistical analysis are sample points from a continuous surface. In contrast,
for lattice data the points are not a “sample,” but fixed locations at which a spatial pattern for a random variable
can be observed.
[7] The notation indicates that the spatial lag for location $i$ is the $i$-th element of the vector $W z_l$. See Anselin
(1988) for an extensive treatment of the notion of a spatial lag.