outside the window. I() is an indicator function. When SaTScan is set to scan only for clusters with high
rates, I() is equal to 1 when the window has more cases than expected under the null-hypothesis, and 0
otherwise. The opposite is true when SaTScan is set to scan only for clusters with low rates. When the
program scans for clusters with either high or low rates, then I()=1 for all windows.
The space-time permutation model uses the same function as the Poisson model. Due to the conditioning
on the marginals, the observed number of cases is only approximately Poisson distributed. Hence, it is no
longer a formal likelihood ratio test, but it serves the same purpose as the test statistic.
For the Bernoulli model the likelihood function is [1,2]:

$$\left(\frac{c}{n}\right)^{c} \left(\frac{n-c}{n}\right)^{n-c} \left(\frac{C-c}{N-n}\right)^{C-c} \left(\frac{(N-n)-(C-c)}{N-n}\right)^{(N-n)-(C-c)} I()$$

where c and C are defined as above, n is the total number of cases and controls within the window, while
N is the combined total number of cases and controls in the data set.
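
As an illustration, the Bernoulli likelihood above can be evaluated on the log scale for a single window. This is a minimal sketch, not SaTScan's own code: the function names and the `scan` argument are hypothetical, and it assumes that the number of cases expected in the window under the null hypothesis is n*C/N.

```python
import math


def xlogy(x, y):
    """Return x * log(y), using the convention 0 * log(0) = 0."""
    return 0.0 if x == 0 else x * math.log(y)


def bernoulli_log_likelihood(c, n, C, N, scan="high"):
    """Log of the Bernoulli likelihood for one window.

    c, n: cases and cases-plus-controls inside the window.
    C, N: cases and cases-plus-controls in the whole data set.
    scan: "high", "low" or "both" (hypothetical names for the I() setting).
    Returns -inf when the indicator I() is 0.
    """
    if not (0 < n < N and 0 <= c <= n and 0 <= C - c <= N - n):
        raise ValueError("inconsistent window counts")
    if scan not in ("high", "low", "both"):
        raise ValueError("scan must be 'high', 'low' or 'both'")

    # Assumption: expected cases in the window under the null hypothesis.
    expected = n * C / N
    if scan == "high" and c <= expected:
        return -math.inf  # I() = 0: not more cases than expected
    if scan == "low" and c >= expected:
        return -math.inf  # I() = 0: not fewer cases than expected

    out_n = N - n  # cases and controls outside the window
    out_c = C - c  # cases outside the window
    inside = xlogy(c, c / n) + xlogy(n - c, (n - c) / n)
    outside = (xlogy(out_c, out_c / out_n)
               + xlogy(out_n - out_c, (out_n - out_c) / out_n))
    return inside + outside


if __name__ == "__main__":
    # Window with 10 cases among 20 individuals; 30 cases among 100 overall.
    print(bernoulli_log_likelihood(10, 20, 30, 100, scan="high"))
```

Working on the log scale avoids numerical underflow when the counts are large, and returning minus infinity for I() = 0 ensures that such a window can never be selected as the maximum.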
The likelihood functions for the multinomial, ordinal, exponential, and normal models are more complex,
due to the more complex nature of the data. We refer to papers by Jung, Kulldorff and Richards [6]; Jung,
Kulldorff and Klassen [7]; Huang, Kulldorff and Gregorio [8]; Kulldorff et al. [9]; and Huang et al. [10] for the
likelihood functions for these models. The likelihood function for the spatial variation in temporal trends
scan statistic is also more complex, as it involves the maximum likelihood estimation of several different
trend functions.
The likelihood function is maximized over all window locations and sizes, and the one with the maximum
likelihood constitutes the most likely cluster. This is the cluster that is least likely to have occurred by
chance. The likelihood ratio for this window constitutes the maximum likelihood ratio test statistic. Its
distribution under the null-hypothesis is obtained by repeating the same analytic exercise on a large
number of random replications of the data set generated under the null hypothesis. The p-value is
obtained through Monte Carlo hypothesis testing [14], by comparing the rank of the maximum likelihood
from the real data set with the maximum likelihoods from the random data sets. If this rank is R, then p =
R / (1 + #simulations). In order for p to be a ‘nice looking’ number, the number of simulations is restricted
to 999 or some other number ending in 999 such as 1999, 9999 or 99999. That way it is always clear
whether to reject or not reject the null hypothesis for typical cut-off values such as 0.05, 0.01 and 0.001.
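
The rank-based p-value described above can be sketched in a few lines. This is an illustrative example only: the function name is hypothetical, and counting ties against the real data set is an assumption made here, since the text does not say how ties are ranked.

```python
def monte_carlo_p_value(observed, simulated):
    """Return p = R / (1 + #simulations) for a test statistic.

    observed:  maximum likelihood ratio from the real data set.
    simulated: maximum likelihood ratios from the random data sets.
    R is the rank of the observed value among all values, with rank 1
    for the largest. Ties are counted against the real data set
    (an assumption of this sketch).
    """
    rank = 1 + sum(1 for s in simulated if s >= observed)
    return rank / (1 + len(simulated))


if __name__ == "__main__":
    # 999 hypothetical simulated statistics: 0, 1, ..., 998.
    simulated = list(range(999))
    print(monte_carlo_p_value(998.5, simulated))  # larger than every simulated value
    print(monte_carlo_p_value(949.5, simulated))  # 49 simulated values are larger
```

With 999 simulations the denominator is 1000, so every possible p-value is an exact multiple of 0.001 and the comparison with cut-offs such as 0.05 is never ambiguous.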
The SaTScan program scans for areas with high rates (clusters), for areas with low rates, or
simultaneously for areas with either high or low rates. The latter should be used rather than running two
separate tests for high and low rates respectively, in order to make correct statistical inference. The most
common analysis is to scan for areas with high rates, that is, for clusters.
Non-Compactness Penalty Function
When the elliptic window shape is used, there is an option to use a non-compactness (eccentricity)
penalty to favor more compact clusters [12]. The main reason for this is that the elliptic scan statistic will
under the null hypothesis typically generate an elliptic most likely cluster since there are more elliptic
than circular clusters evaluated, and it will often be a long and narrow ellipse, since there are more of
those. At the same time, the concept of clustering is based on a compactness criterion in the sense that the
cases in the cluster should be close to each other, so we are more interested in compact clusters. When the
non-compactness penalty is used, the pure likelihood ratio is no longer used as the test statistic. Rather,
the test statistic is defined as the log likelihood ratio multiplied by a non-compactness penalty of the
form $[4s/(s+1)^2]^a$, where s is the elliptic window shape defined as the ratio of the length of the longest to
the shortest axis of the ellipse. For the circle, s=1. The parameter a is a penalty tuning parameter. With
a=0, the penalty function is always 1 irrespective of s, so that there is never a penalty. When a goes to
SaTScan User Guide v9.2 15