Analysis of Longitudinal Data (not the book of the same name)
Updated 2023-05-26
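Longitudinal data, a special form of data, arise widely in medicine, sociology, and related fields. They consist of observations taken on each individual at different time points. Parametric mixed models (also called random-effects models) are a powerful tool for analyzing longitudinal data. A central difficulty in longitudinal studies is how to account for within-subject correlation; linear and nonlinear mixed-effects models handle this well, which is why they are widely used in longitudinal data analysis.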
11 Analysis of Longitudinal Data
In health-related studies, researchers often collect data from the same unit
(or subject) repeatedly over time. Measurements may be taken at different
times for different subjects. These are called longitudinal studies. Diggle,
Liang, and Zeger (1994) offer an excellent exposition of the issues related
to the design of such studies and the analysis of longitudinal data. They
also provide many interesting examples of data. We refer to their book
for a thorough treatment of the topic. The purpose of this chapter is to
introduce the methods based on recursive partitioning and to compare the
analyses of longitudinal data using different approaches.
11.1 Infant Growth Curves
The data for this example were collected from a retrospective study by Dr.
John Leventhal and his colleagues at Yale University School of Medicine,
New Haven, Connecticut. Their primary aim was to study the risk factors
during pregnancy that may lead to the maltreatment of infants after birth
such as physical and sexual abuse. The investigators recruited 298 children
born at Yale–New Haven Hospital after reviewing the medical records for
all women who had deliveries from September 1, 1989, through September
30, 1990. Detailed eligibility criteria have been reported elsewhere; see
Wasserman and Leventhal (1993) and Stier et al. (1993).
The major concern underlying the sample selection was the ascertainment
of cocaine exposure. Two groups of infants were included: those whose
mothers were regular cocaine users and those whose mothers were clearly
not cocaine users. The group membership was classified from the infants’
log of toxicology screens and their mothers’ obstetric records. In addition,
efforts have been made to match the unexposed newborns with the exposed
ones for date of birth, medical insurance, mother’s parity, age, and timing
of the first prenatal visit. The question of our concern is whether a mother’s
cocaine use has a significant effect on the growth of her infant.
After birth, the infants were brought back to see their pediatricians.
At each visit, body weight, height, and head circumference were recorded.
Figure 11.1 shows the growth curves of body weights for 20 randomly chosen
children.
FIGURE 11.1. Growth curves of body weights for 20 representative infants
© Springer Science+Business Media, LLC 2010. H. Zhang and B.H. Singer, Recursive Partitioning and Applications, Springer Series in Statistics, DOI 10.1007/978-1-4419-6824-1_11
Figure 11.1 suggests that the variability of weight increases as children
grow. Thus, we need to deal with this accelerating variability while modeling
the growth curves. In Section 11.5.5 we will explain the actual process
of fitting these data. At the moment, we go directly to the result of analysis
reported in Zhang (1999) and put on the table what the adaptive spline
model can offer in analyzing longitudinal data.
Using mother’s cocaine use, infant’s gender, gestational age, and race
(White or Black) as covariates, Zhang (1999) identified the following model,
\[
\begin{aligned}
\hat{f}(x) = {} & 0.744 + 0.029d - 0.0092(d - 120)^+ - 0.0059(d - 200)^+ \\
& + (g_a - 28)^+ \{0.2 + 0.0005d - 0.0007(d - 60)^+ - 0.0009(d - 490)^+\} \\
& + s\{-0.0026d + 0.0022(d - 120)^+\},
\end{aligned}
\tag{11.1}
\]
where d stands for infant’s age in days and $g_a$ for gestational age in weeks.
The variable s is the indicator for gender: 1 for girls and 0 for boys. The
absence of mother’s cocaine use in model (11.1) is a sign against its
prominence. Nonetheless, we will reexamine this factor later.
According to model (11.1), the velocity of growth lessens as a child
matures. Beyond this common-sense knowledge, model (11.1) defines several
interesting phases among which the velocity varies. Note that the knots for
age are 60, 120, 200, and 490 days, which are about 2, 4, 8, and 16 months.
In other words, as the velocity decreases, its duration doubles. This insight
cannot be readily revealed by traditional methods. Furthermore, girls grow
more slowly soon after birth, but start to catch up after four months.
Gestational age affects birth weight, as ample evidence has shown. It also
influences the growth dynamics. In particular, a more mature newborn tends
to grow faster at first, but later experiences slower growth than a less
mature newborn. Finally, it is appealing that model (11.1) mathematically
characterizes the infant growth pattern even without imposing any prior
knowledge. This characterization can provide an empirical basis for further
refinement of the growth pattern with expert knowledge as well as
assessment of other factors of interest.
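To make the phases of model (11.1) concrete, the fitted function can be evaluated directly. The following is a minimal Python sketch that simply transcribes the coefficients of (11.1), reading $(u)^+$ as max(u, 0), the usual truncated-linear convention. The function names `pos`, `predicted_weight`, and `velocity` are ours, and the text does not state the unit of weight, so none is assumed here.

```python
def pos(u):
    """Truncated linear basis (u)^+ = max(u, 0)."""
    return u if u > 0 else 0.0


def predicted_weight(d, ga, s):
    """Evaluate model (11.1).

    d  : infant's age in days
    ga : gestational age in weeks
    s  : 1 for girls, 0 for boys
    """
    age_part = 0.744 + 0.029 * d - 0.0092 * pos(d - 120) - 0.0059 * pos(d - 200)
    gest_part = pos(ga - 28) * (
        0.2 + 0.0005 * d - 0.0007 * pos(d - 60) - 0.0009 * pos(d - 490)
    )
    sex_part = s * (-0.0026 * d + 0.0022 * pos(d - 120))
    return age_part + gest_part + sex_part


def velocity(d, ga, s, h=1.0):
    """Average growth per day over [d, d + h] (finite difference)."""
    return (predicted_weight(d + h, ga, s) - predicted_weight(d, ga, s)) / h


if __name__ == "__main__":
    # Growth velocity for a boy born at 40 weeks, one probe per age phase
    for day in (30, 90, 150, 300, 600):
        print(day, round(velocity(day, 40, 0), 4))
```

With these coefficients, the daily gain for a boy born at 40 weeks is 0.035 before day 60 and drops to about 0.0266 between days 60 and 120, which illustrates how the knots separate phases of decreasing velocity.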
11.2 The Notation and a General Model
To analyze longitudinal data, first we need to formulate them into a general
statistical framework. To this end, some notation is inevitable. Suppose that
we have recruited n subjects into a longitudinal study. Measurements are
repeatedly taken for every subject over a number of occasions (sometimes
referred to as visits or examinations).
Table 11.1 provides an abstract representation of the data such as those
plotted in Figure 11.1. To simplify the presentation, we restrict all subjects
to have the same number of occasions q in the table. For subject i at
occasion j, $x_{k,ij}$ and $Y_{ij}$ are respectively the measurement of the kth
covariate $x_k$ (k = 1, ..., p) and the observed value of the response Y at
measurement time $t_{ij}$ (j = 1, ..., q, i = 1, ..., n).

TABLE 11.1. Longitudinal Data Configuration

                        Occasion (visit or examination)
Subject   1                                                  ···   q
1         $t_{11}, x_{1,11}, \ldots, x_{p,11}, Y_{11}$       ···   $t_{1q}, x_{1,1q}, \ldots, x_{p,1q}, Y_{1q}$
⋮         ⋮                                                        ⋮
i         $t_{i1}, x_{1,i1}, \ldots, x_{p,i1}, Y_{i1}$       ···   $t_{iq}, x_{1,iq}, \ldots, x_{p,iq}, Y_{iq}$
⋮         ⋮                                                        ⋮
n         $t_{n1}, x_{1,n1}, \ldots, x_{p,n1}, Y_{n1}$       ···   $t_{nq}, x_{1,nq}, \ldots, x_{p,nq}, Y_{nq}$

Reproduced from Table 1 of Zhang (1997)

In the growth curve data, we have four
(p = 4) covariates in addition to age (measurement time t) of visits. Body
weight is the outcome variable Y.
The problem of interest is to model the relationship of Y to the
measurement time, t, and the p covariates, $x_1$ to $x_p$, namely, to establish
the relationship
\[
Y_{ij} = f(t_{ij}, x_{1,ij}, \ldots, x_{p,ij}) + e_{ij}, \tag{11.2}
\]
where f is an unknown function, $e_{ij}$ is the error term, j = 1, ..., q, and
i = 1, ..., n. Estimating model (11.2), as in the derivation of model (11.1),
is imperative for addressing the scientific questions for which the data are
collected.
Model (11.2) differs from a usual multivariate regression model, e.g.,
(10.1), in that $e_{ij}$ (j = 1, ..., q) has an autocorrelation structure
$\Sigma_i$ within the same subject i. As will be defined below, the
specification of $\Sigma_i$ varies from a parametric approach to a
nonparametric one.
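The configuration in Table 11.1 maps naturally onto a "long" record layout with one record per subject and occasion; grouping by subject then recovers the blocks within which the errors are correlated. The sketch below is only an illustration: the record type `Visit`, the helper `by_subject`, and all numbers are hypothetical and are not taken from the study.

```python
from collections import defaultdict
from typing import NamedTuple


class Visit(NamedTuple):
    subject: int    # i = 1, ..., n
    occasion: int   # j = 1, ..., q
    t: float        # measurement time t_ij
    x: tuple        # covariates (x_1,ij, ..., x_p,ij)
    y: float        # response Y_ij


def by_subject(visits):
    """Group records by subject and sort each group by occasion."""
    groups = defaultdict(list)
    for v in visits:
        groups[v.subject].append(v)
    return {i: sorted(vs, key=lambda v: v.occasion) for i, vs in groups.items()}


# Made-up toy data: n = 2 subjects, q = 3 occasions, p = 2 covariates
visits = [
    Visit(1, 1, 0.0, (40.0, 0.0), 3.2),
    Visit(1, 2, 60.0, (40.0, 0.0), 5.1),
    Visit(1, 3, 120.0, (40.0, 0.0), 6.4),
    Visit(2, 1, 0.0, (36.0, 1.0), 2.7),
    Visit(2, 2, 65.0, (36.0, 1.0), 4.6),
    Visit(2, 3, 130.0, (36.0, 1.0), 5.9),
]

grouped = by_subject(visits)
# Responses (Y_i1, ..., Y_iq) for subject 1
y_1 = [v.y for v in grouped[1]]
# Rows (t_ij, x_1,ij, ..., x_p,ij) for subject 1
rows_1 = [(v.t,) + v.x for v in grouped[1]]
```

Note that the measurement times differ between the two toy subjects, which the notation $t_{ij}$ allows.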
11.3 Mixed-Effects Models
Mixed-effects models are commonly used to analyze longitudinal data; see,
e.g., Crowder and Hand (1990, Ch. 6) and Laird and Ware (1982). They
assume that
\[
Y_{ij} = \sum_{k=0}^{p} \beta_k x_{k,ij} + \sum_{k=0}^{r} \nu_{ki} z_{k,ij} + \epsilon_{ij}, \tag{11.3}
\]
where the β’s are unknown parameters, $\nu_i = (\nu_{0i}, \ldots, \nu_{ri})'$
is an (r + 1)-dimensional random vector, $\epsilon_i = (\epsilon_{i1}, \ldots, \epsilon_{iq})'$
is a q-dimensional vector of measurement errors, and for convenience,
$t_{ij}$ is replaced with $x_{0,ij}$. The vector $\nu_i$ reflects the random
fluctuation of subject i around the population, and it is referred to as
random coefficients, coupled with random effects $z_0$ to $z_r$. The
specification of random effect factors has to be decided on a case-by-case
basis.
Model (11.3) is called a mixed-effects model or simply a mixed model
because it contains both fixed-effect parameters β and random-effect
parameters $\nu_i$. Sometimes, model (11.3) is also referred to as a
two-stage linear model because of the hierarchical assumptions
delineated below.
The first stage describes the distribution for $\epsilon_i$ within the same
individual, and the second stage takes into account the across-individual
variations expressed through $\nu_i$. Specifically, we assume, in the first
stage, that
\[
\epsilon_i \sim N(0, R_i), \quad i = 1, \ldots, n, \tag{11.4}
\]
and in the second stage that
\[
\nu_i \sim N(0, G), \quad i = 1, \ldots, n, \tag{11.5}
\]
and $\nu_i$ and $\nu_j$, respectively $\epsilon_i$ and $\epsilon_j$, are
independent for $i \neq j$, i, j = 1, ..., n. Moreover, we assume that
quantities belonging to two different individuals are independent of each
other.
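We can regard model (11.3) as a specific case of model (11.2) in the sense
that
\[
f(t_{ij}, x_{1,ij}, \ldots, x_{p,ij}) = \sum_{k=0}^{p} \beta_k x_{k,ij}
\]
and
\[
e_{ij} = \sum_{k=0}^{p} \nu_{ki} x_{k,ij} + \epsilon_{ij},
\]
j = 1, ..., q, i = 1, ..., n, where the random-effect covariates are
evidently taken to coincide with the x’s (that is, $z_{k,ij} = x_{k,ij}$ and
r = p). Let $y_i = (Y_{i1}, \ldots, Y_{iq})'$. Then assumptions (11.4)
and (11.5) imply that
\[
\mathbb{E}\{y_i\} = X_i \beta, \qquad \Sigma_i = \operatorname{Cov}\{y_i\} = X_i G X_i' + R_i,
\]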
where $X_i$ is the design matrix for individual i, i.e.,
\[
X_i =
\begin{pmatrix}
x_{0,i1} & \cdots & x_{p,i1} \\
\vdots & & \vdots \\
x_{0,ij} & \cdots & x_{p,ij} \\
\vdots & & \vdots \\
x_{0,iq} & \cdots & x_{p,iq}
\end{pmatrix},
\quad i = 1, \ldots, n.
\]
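As a numerical check of $\Sigma_i = X_i G X_i' + R_i$, the sketch below builds the marginal covariance for one subject in plain Python. The design matrix, G, and $R_i$ are made-up illustrative values (an intercept column plus a time column, both with random coefficients), and the helper names are ours. The sketch also takes $\nu_i$ and $\epsilon_i$ to be uncorrelated, which the formula needs.

```python
def transpose(a):
    """Transpose a matrix stored as a list of rows."""
    return [list(col) for col in zip(*a)]


def matmul(a, b):
    """Matrix product of two list-of-rows matrices."""
    bt = transpose(b)
    return [[sum(x * y for x, y in zip(row, col)) for col in bt] for row in a]


def add(a, b):
    """Elementwise sum of two matrices of the same shape."""
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def marginal_cov(X, G, R):
    """Sigma_i = X_i G X_i' + R_i."""
    return add(matmul(matmul(X, G), transpose(X)), R)


# Hypothetical subject with q = 3 occasions; columns: intercept, time
X = [[1.0, 0.0],
     [1.0, 1.0],
     [1.0, 2.0]]
# Hypothetical covariance of the random coefficients
G = [[4.0, 0.0],   # variance of the random intercept
     [0.0, 1.0]]   # variance of the random slope
# Hypothetical within-subject error covariance: 0.5 times the identity
R = [[0.5 if j == l else 0.0 for l in range(3)] for j in range(3)]

Sigma = marginal_cov(X, G, R)
for row in Sigma:
    print(row)
```

In this toy setting the diagonal of `Sigma` grows with time (4.5, 5.5, 8.5), so a random slope is one way a model of this form can accommodate variability that increases with age.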
What are the essential steps in applying model (11.3) for the analysis of
longitudinal data? A convenient approach to carry out the computation is
the use of PROC MIXED in the SAS package. One tricky step is the specification
of random effects. It requires knowledge of the particular study design
and objective. See Kleinbaum et al. (1988, Ch. 17) for general guidelines.
Following this step, it remains for us to specify the classes of covariance
structures $R_i$ in (11.4) and G in (11.5).
In practice, $R_i$ is commonly chosen as a diagonal matrix; for example,
$R_i = \sigma_1^2 I$; here I is an identity matrix. The resulting model is
referred to as a conditional independence model. In other words, the
measurements within the same individual are independent after removing the
random-effect components. In applications, a usual choice for G is
$\sigma_2^2 I$. The dimension of this identity matrix is omitted, and
obviously it should conform with G. The subscripts of σ remind us of the
stage in which these covariance matrices take part.
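To see what the conditional independence model implies marginally, one can substitute $R_i = \sigma_1^2 I$ and $G = \sigma_2^2 I$ into $\Sigma_i = X_i G X_i' + R_i$. The short derivation below is our own worked step rather than part of the original text, with $\delta_{jl}$ equal to 1 when j = l and 0 otherwise:

```latex
\begin{aligned}
\Sigma_i &= \sigma_2^2\, X_i X_i' + \sigma_1^2\, I, \\
\operatorname{Cov}(Y_{ij}, Y_{il}) &= \sigma_2^2 \sum_{k=0}^{p} x_{k,ij}\, x_{k,il} + \sigma_1^2\, \delta_{jl}.
\end{aligned}
```

Hence measurements on the same subject generally remain correlated marginally (the first term need not vanish for j ≠ l), even though the errors $\epsilon_{ij}$ are independent; the within-subject correlation comes entirely from the shared random coefficients.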
There are many other choices for covariance matrices. A thorough list
of options is available from SAS Online Help. Diggle et al. (1991) devote