http://www.paper.edu.cn
Because empirical likelihood can avoid explicit variance estimation, it is attractive
to extend the approach to case-cohort analysis with semiparametric survival models. In this
paper, we develop an empirical likelihood method for making inferences about the regression
parameters in the Cox model under the case-cohort design. The basic idea is introduced through
case-cohort analysis based on the approach of Prentice [1]. An empirical likelihood function with
suitable constraints is constructed. The resulting log-empirical likelihood ratio is shown to
follow the Wilks theorem, i.e., it converges in distribution to a chi-squared distribution. Hence, when
carrying out tests or constructing confidence regions, there is no need to estimate any variance-covariance
matrix or to solve any estimating equation. Instead, an optimization algorithm is needed
in the computation. The algorithm developed for the standard empirical likelihood method can
be applied here without any difficulty. Moreover, our idea can be extended to the case where
the subcohort is drawn by Bernoulli sampling with a known selection proportion. Compared
with the existing Wald-type inferential procedures, the proposed empirical likelihood based procedure
is easier to implement and thus provides an attractive alternative.
The rest of the paper is organized as follows. In Section 1, we introduce the notation and
describe the case-cohort analysis of Prentice [1] as well as the case-cohort analysis with Bernoulli
sampling under the Cox model. In Section 2, the basic idea of the proposed empirical likelihood
method is illustrated and the Wilks theorem for the log-empirical likelihood ratio test statistic
is established. In Section 3, we present some Monte Carlo simulation results and illustrate
the method with a real example. Section 4 concludes. All technical details are collected in
the Appendix.
1 Case-cohort analysis under Cox model
Let T be the failure time, Z a p-dimensional covariate vector and C the censoring time. T and
C are assumed to be conditionally independent given Z. In the Cox model, the
conditional hazard function of T given Z = z is assumed to be
$$\lambda(t \mid z) = \lambda_0(t) \exp\left(\beta_0^{\top} z\right),$$
where $\lambda_0(t)$ is the baseline hazard and $\beta_0$ is the p-dimensional regression parameter vector of
primary interest. Define $Y = \min\{T, C\}$ and $\delta = I\{T \le C\}$, where $I\{\cdot\}$ represents the indicator
function. The entire cohort consists of n individuals and they can be viewed as n independently
and identically distributed (i.i.d.) copies of $(Y, \delta, Z)$, denoted by $\{(Y_i, \delta_i, Z_i),\ i = 1, 2, \ldots, n\}$.
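As an illustration of the notation just introduced, the following minimal sketch draws a full cohort of i.i.d. copies of (Y, δ, Z) from a Cox model. Several choices are assumptions made only for this example and are not taken from the paper: the baseline hazard is constant, the covariates are standard normal, the censoring time is exponential and independent of the failure time, and the name `simulate_cohort` and its parameters are hypothetical.

```python
import math
import random


def simulate_cohort(n, beta0, lambda0=1.0, censor_rate=0.5, seed=0):
    """Draw n i.i.d. copies of (Y, delta, Z) from a Cox model.

    Illustrative assumptions (not from the paper): constant baseline
    hazard lambda0, standard normal covariates, and exponential
    censoring that is independent of T given Z.
    """
    rng = random.Random(seed)
    cohort = []
    for _ in range(n):
        z = [rng.gauss(0.0, 1.0) for _ in beta0]
        # Conditional hazard lambda(t|z) = lambda0 * exp(beta0' z).
        hazard = lambda0 * math.exp(sum(b * zj for b, zj in zip(beta0, z)))
        # With a constant hazard, T is exponential with that rate.
        t = rng.expovariate(hazard)       # failure time T
        c = rng.expovariate(censor_rate)  # censoring time C
        y = min(t, c)                     # Y = min{T, C}
        delta = 1 if t <= c else 0        # delta = I{T <= C}
        cohort.append((y, delta, z))
    return cohort


cohort = simulate_cohort(n=1000, beta0=[0.5, -0.3])
n_cases = sum(delta for _, delta, _ in cohort)
print(f"{n_cases} cases among {len(cohort)} individuals")
```

Here every individual carries a covariate vector; the sketch describes the full cohort only and does not model any restriction on which covariates are observed.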
Under case-cohort sampling, covariates are available only for the cases and for a simple
random sample from the entire cohort, i.e., the subcohort. Let $\xi_i$ be a binary indicator that
takes the value 1 if individual i is selected into the subcohort and 0 otherwise. For each
i, $Z_i$ is observable only when $\xi_i = 1$ or $\delta_i = 1$. Let $\tilde{n} = \sum_{i=1}^{n} \xi_i$ be the size of the subcohort,