where
$$\ell_i(\beta; b) \propto \int_{y_i}^{\mu_i} \frac{y_i - u}{a_i\,\nu(u)}\, du \qquad (4)$$
defines the conditional log quasi-likelihood of the fixed effects given the random effects $b$ (Breslow and Clayton, 1993). The likelihood (3) is then maximized with respect to $\beta$ and $\theta$ in order to obtain the MLEs $\hat{\beta}$ and $\hat{\theta}$.
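As a concrete illustration (a standard worked case, not taken from the paper), suppose $a_i = 1$ and the variance function is $\nu(u) = u$, as for Poisson responses. The integral in (4) then has a closed form:
$$\int_{y_i}^{\mu_i} \frac{y_i - u}{u}\, du = y_i \log \mu_i - \mu_i - (y_i \log y_i - y_i),$$
so, up to terms free of $\mu_i$, $\ell_i(\beta; b)$ reduces to the familiar Poisson log-likelihood contribution $y_i \log \mu_i - \mu_i$.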
However, the likelihood $L(\beta, \theta)$ in (3) in general involves analytically intractable integrals. When the dimension $q$ of the random effects $b$ is low (e.g., $q < 5$), the GHQ approach is useful for approximating the likelihood (Pan and Thompson, 2003). As mentioned in Section 1, this technique is, however, no longer helpful for high-dimensional random effects. The Laplace approximation (Breslow and Clayton, 1993) and the EM algorithm (Booth and Hobert, 1999) were proposed to calculate the MLEs of the parameters $\beta$ and $\theta$ in the GLMM. MCMC was also used (Karim and Zeger, 1992).
3. Quasi-Monte Carlo integration
We propose to use the QMC approach to approximate the integrated quasi-likelihood $L(\beta, \theta)$ in (3). To gain insight into the QMC approach, we first briefly review the traditional Monte Carlo (MC) approximation. Suppose $f(\cdot)$ is an integrable function on the $q$-dimensional unit cube $C^q = [0, 1)^q$. Consider the integral
$$I(f) = \int_{C^q} f(x)\, dx. \qquad (5)$$
MC integration draws a random sample $P_K = \{x_k : 1 \le k \le K\}$ from the uniform distribution on $C^q$ and then approximates the integral in (5) through
$$\hat{I}_K(f, P_K) = \frac{1}{K} \sum_{k=1}^{K} f(x_k). \qquad (6)$$
By the strong law of large numbers, the estimate $\hat{I}_K(f, P_K)$ converges to $I(f)$ with probability one as $K \to \infty$. Moreover, the central limit theorem guarantees that $\hat{I}_K(f, P_K)$ is asymptotically normally distributed when the sample size $K$ is large enough. The convergence rate of MC integration is of order $O(K^{-1/2})$, regardless of the dimensionality $q$ of the integral. On the other hand, this convergence statement is a probabilistic one: the MC approximation may behave well in general, but for a particular random sample it may yield a very poor approximation. Hence, it is necessary to make multiple draws of random samples and take the average of all the approximations, which may be computationally expensive.
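The following sketch (ours, in Python with NumPy; the integrand and all names are illustrative choices, not from the paper) implements the plain MC estimator (6) together with the repeated-draws averaging just described.

```python
import numpy as np

def mc_integrate(f, q, K, rng):
    """Plain MC estimate of I(f) over C^q = [0,1)^q, as in Eq. (6)."""
    x = rng.random((K, q))        # K uniform points on the unit cube
    return f(x).mean()

# Test integrand with known answer: f(x) = prod_j 2*x_j, so I(f) = 1.
f = lambda x: np.prod(2.0 * x, axis=1)

rng = np.random.default_rng(0)
q, K = 5, 10_000
# Average several independent draws, as the text recommends; the spread
# across draws reflects the O(K^{-1/2}) Monte Carlo error.
estimates = [mc_integrate(f, q, K, rng) for _ in range(20)]
print(np.mean(estimates), np.std(estimates))
```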
The QMC approach aims to improve on the MC approximation through a faster convergence rate and a lighter computational load. The key idea is to replace the MC random samples with integration nodes that are scattered uniformly over $C^q$. The reason lies in the famous Koksma–Hlawka inequality:
$$|I(f) - \hat{I}_K(f, P_K)| \le V(f)\, D(P_K), \qquad (7)$$
where $V(f)$ is the total variation of $f(\cdot)$ over $C^q$ in the sense of Hardy and Krause, assumed bounded (Fang and Wang, 1994, p. 64). The quantity $D(P_K)$ measures the evenness of the spread of the point set $P_K$ and is defined by
$$D(P_K) = \sup_{x \in C^q} |U_K(x) - U(x)|, \qquad (8)$$
where $U(x)$ is the uniform distribution function on $C^q$ and $U_K(x)$ is the empirical distribution function of $P_K$. The quantity $D(P_K)$ in (8) is in fact the Kolmogorov–Smirnov statistic, but it is known as the discrepancy of $P_K$ in analytic number theory.
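To make (8) concrete, here is a rough grid-search approximation of the discrepancy (our illustration, not the paper's; exact computation is expensive, and a finite grid only lower-bounds the supremum):

```python
import numpy as np
from itertools import product
from scipy.stats import qmc

def discrepancy_grid(points, m=50):
    """Rough approximation of D(P_K) in Eq. (8): compare the empirical
    distribution U_K(x) of the points with U(x) = prod_j x_j on a grid."""
    K, q = points.shape
    grid_1d = np.linspace(0.0, 1.0, m + 1)[1:]       # corners x of boxes [0, x)
    worst = 0.0
    for corner in product(grid_1d, repeat=q):
        x = np.array(corner)
        u_emp = np.mean(np.all(points < x, axis=1))  # U_K(x)
        u_unif = np.prod(x)                          # U(x), volume of [0, x)
        worst = max(worst, abs(u_emp - u_unif))
    return worst

# q = 2 keeps the grid small; random points vs. a Halton low-discrepancy set.
rng = np.random.default_rng(0)
print(discrepancy_grid(rng.random((256, 2))))
print(discrepancy_grid(qmc.Halton(d=2, scramble=False).random(256)))  # smaller
```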
Inequality (7) implies that the absolute error of the integration approximation is bounded and dominated by $D(P_K)$, since $V(f)$ is a constant once $f(\cdot)$ is given. Therefore, the point set $P_K$ with the smallest discrepancy provides the best integration nodes in this sense. It can be shown that the smallest achievable discrepancy is of order $O((\log K)^{q-1}/K)$ (Fang and Wang, 1994). Accordingly, QMC integration has a faster convergence rate than the MC approximation. Unlike the MC approximation, the QMC integration nodes are deterministic, so multiple draws are not necessary.
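As a hedged illustration of the deterministic-nodes idea (ours; the paper's own node construction is not shown here, and this sketch substitutes a Sobol' sequence from SciPy's scipy.stats.qmc module):

```python
import numpy as np
from scipy.stats import qmc

q = 5
K = 2**13                                   # Sobol' sets balance at powers of 2
f = lambda x: np.prod(2.0 * x, axis=1)      # same test integrand; I(f) = 1

# Deterministic low-discrepancy nodes: no repeated draws are needed.
nodes = qmc.Sobol(d=q, scramble=False).random(K)
qmc_est = f(nodes).mean()

# Plain MC with the same budget K, for comparison.
mc_est = f(np.random.default_rng(0).random((K, q))).mean()

print(abs(qmc_est - 1.0), abs(mc_est - 1.0))  # QMC error is typically smaller
```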