SaTScan User Guide v10.1 13
In the standard normal model⁹, it is assumed that each observation is measured with the same variance.
That may not always be the case. For example, if an observation is based on a larger sample in one location
and a smaller sample in another, then the variance of the uncertainty in the estimates will be larger for the
smaller sample. If the reliability of the estimates differs, one should instead use the weighted normal scan
statistic¹⁰ that takes these unequal variances into account. The weighted version is obtained in SaTScan by
simply specifying a weight for each observation as an extra column in the input file. This weight may for
example be proportional to the sample size used for each estimate or it may be the inverse of the variance
of the observation.
If all values are multiplied by the same constant, or if the same constant is added to all of them, the statistical inference will not change,
meaning that the same clusters with the same log likelihoods and p-values will be found. Only the estimated
means and variances will differ. If the weight is the same for all observations, then the weighted normal
scan statistic will produce the same results as the standard normal version. If all the weights are multiplied
by the same constant, the results will not change.
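The role of the weights can be illustrated with a small Python sketch. It does not reproduce SaTScan's likelihood computation; it only shows, for a weighted mean and weighted variance, why multiplying all weights by the same constant has no effect and why equal weights give the unweighted result. The function name and the sample numbers are hypothetical.

```python
import math


def weighted_mean_and_variance(values, weights):
    """Return the weighted mean and weighted variance of the values.

    Both quantities divide by the total weight, so multiplying every
    weight by the same constant cancels out.
    """
    total_weight = sum(weights)
    mean = sum(w * x for x, w in zip(values, weights)) / total_weight
    variance = sum(w * (x - mean) ** 2 for x, w in zip(values, weights)) / total_weight
    return mean, variance


# Hypothetical observations; weights could be sample sizes or inverse variances.
values = [2.0, 4.0, 9.0]
weights = [1.0, 3.0, 2.0]

base = weighted_mean_and_variance(values, weights)
scaled = weighted_mean_and_variance(values, [10.0 * w for w in weights])
equal = weighted_mean_and_variance(values, [1.0, 1.0, 1.0])

# Rescaling the weights leaves both the mean and the variance unchanged.
print(math.isclose(base[0], scaled[0]), math.isclose(base[1], scaled[1]))
# Equal weights give the ordinary (unweighted) mean.
print(equal[0])
```

Adding a constant to all values would shift the weighted mean by that constant and leave the weighted variance unchanged, which matches the statement that only the estimated means and variances differ.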
Related Topics: Analysis Tab, Likelihood Ratio Test, Methodological Papers, Probability Model
Comparison.
Continuous Poisson Model
All the models described above are based on data observed at discrete locations that are considered to be
non-random, as defined by a regular or irregular lattice of location points. That is, the locations of the
observations are considered to be fixed, and we evaluate the spatial randomness of the observation
conditioning on the lattice. Hence, those are all versions of what are called discrete scan statistics¹⁷⁴. In a
continuous scan statistic, observations may be located anywhere within a study area, such as a square or
rectangle. The stochastic aspect of the data consists of these random spatial locations, and we are interested
to see if there are any clusters that are unlikely to occur if the observations were independently and
randomly distributed across the study area. Under the null hypothesis, the observations follow a
homogeneous spatial Poisson process with constant intensity throughout the study area, with no
observations falling outside the study area.
Example: The data may consist of the location of bird nests in a square kilometer area of a forest. The
interest may be to see whether the bird nests are randomly distributed spatially, or in other words, whether
there are clusters of bird nests or whether they are located independently of each other.
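To make the null hypothesis concrete, the following Python sketch generates one random data set for a rectangular study area. It assumes the number of observations is held fixed, in which case a homogeneous Poisson process places each point independently and uniformly over the area. The function name, the seed, and the figure of 50 nests are hypothetical, and this is not SaTScan's own randomization code.

```python
import random


def simulate_null_points(n_points, width, height, rng):
    """Place n_points independently and uniformly in a width x height rectangle."""
    return [(rng.uniform(0.0, width), rng.uniform(0.0, height))
            for _ in range(n_points)]


# Hypothetical example: 50 bird nests in a 1 km x 1 km square of forest.
rng = random.Random(42)  # fixed seed so the sketch is reproducible
nests = simulate_null_points(50, 1.0, 1.0, rng)

print(len(nests))
print(all(0.0 <= x <= 1.0 and 0.0 <= y <= 1.0 for x, y in nests))
```

Repeating this simulation many times gives a reference distribution of point patterns with no clustering, against which an observed pattern can be compared.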
In SaTScan, the study area can be any collection of convex polygons, which are convex regions bounded
by any number of straight lines. Triangles, squares, rectangles, rhombuses, pentagons and hexagons are all
examples of convex polygons. In the simplest case, there is only one polygon, but the study area can also
be the union of multiple convex polygons. If the study area is not convex, divide it into multiple convex
polygons and define each one separately. The study area does not need to be contiguous and may for
example consist of five different islands.
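The idea of describing a non-convex study area as a union of convex polygons can be sketched in Python as follows. This is a generic geometric test, not SaTScan's input format or internal method: each polygon is assumed to be given as a list of vertices in counter-clockwise order, and the function names and the L-shaped example are hypothetical.

```python
def inside_convex_polygon(point, vertices):
    """Return True if point lies inside or on the border of a convex polygon.

    Assumes the vertices are listed in counter-clockwise order, so an
    interior point is never strictly to the right of any edge.
    """
    px, py = point
    n = len(vertices)
    for i in range(n):
        x1, y1 = vertices[i]
        x2, y2 = vertices[(i + 1) % n]
        # Cross product of the edge vector and the vector from the edge start to the point.
        cross = (x2 - x1) * (py - y1) - (y2 - y1) * (px - x1)
        if cross < 0:
            return False
    return True


def inside_study_area(point, polygons):
    """The study area is the union of the polygons: one hit is enough."""
    return any(inside_convex_polygon(point, poly) for poly in polygons)


# Hypothetical L-shaped (non-convex) area, split into two convex rectangles.
study_area = [
    [(0, 0), (2, 0), (2, 1), (0, 1)],  # bottom bar of the L
    [(0, 1), (1, 1), (1, 2), (0, 2)],  # upper-left block of the L
]

print(inside_study_area((1.5, 0.5), study_area))  # inside the bottom bar
print(inside_study_area((0.5, 1.5), study_area))  # inside the upper-left block
print(inside_study_area((1.5, 1.5), study_area))  # in the notch of the L
```

Because membership is tested polygon by polygon, the same approach covers study areas that are not contiguous, such as a group of islands.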
The analysis is conditioned on the total number of observations in the data set. Hence, the scan statistic
simply evaluates the spatial distribution of the observations, but not the number of observations.
The likelihood function used as the test statistic is the same as for the Poisson model for the discrete scan
statistic, where the expected number of cases is equal to the total number of observations, times
the size of the scanning window, divided by the size of the total study area. As such, it is a special case of
the variable window size scan statistic described by Kulldorff (1997)¹. When the scanning window extends
outside the study area, the expected count is still based on the full size of the circle, ignoring the fact that
some parts of the circle have zero expected counts. This is to avoid strange non-circular clusters at the
border of the study area. Since the analysis is based on Monte Carlo randomizations, the p-values are
automatically adjusted for these boundary effects. The reported expected counts are based on the full circle