where $X_i$ is a predictor variable² taking on the value of 0 or 1 depending on whether item $i$ is of type A or B respectively, and $e_{si} \sim N(0, \sigma^2)$ indicates that the trial-level error is normally distributed with mean 0 and variance $\sigma^2$. In the population, participants respond to items of type B 40 ms faster
than items of type A. Under this first model, we assume that
each of the 16 observations provides the same evidence for
or against the treatment effect regardless of whether or not
any other observations have already been taken into ac-
count. Performing an unpaired t-test on these data would
implicitly assume this (incorrect) generative model.
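The generative process behind Model (1) can be sketched in a few lines of Python; the parameter values below are illustrative assumptions, not values from the article's hypothetical dataset:

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative (assumed) parameter values, in ms
beta_0 = 800.0   # intercept: mean latency for type-A items
beta_1 = -40.0   # type-B items are responded to 40 ms faster
sigma = 50.0     # trial-level noise SD

# 16 observations: X_i = 0 (type A) or 1 (type B)
X = np.repeat([0, 1], 8)
Y = beta_0 + beta_1 * X + rng.normal(0.0, sigma, size=X.size)

# Because Model (1) treats every trial as independent, the difference
# of condition means estimates beta_1 directly
effect_hat = Y[X == 1].mean() - Y[X == 0].mean()
```

An unpaired t-test on `Y` would be licensed only under this independence assumption, which fails as soon as the same subjects contribute multiple trials.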
Model (1) is not a mixed-effects model because we have
not defined any sources of clustering in our data; all obser-
vations are conditionally independent given a choice of
intercept, treatment effect, and noise level. But experience
tells us that different subjects are likely to have different
overall response latencies, breaking conditional indepen-
dence between trials for a given subject. We can expand
our model to account for this by including a new offset
term $S_{0s}$, the deviation from $\beta_0$ for subject $s$. The expanded model is now
$$Y_{si} = \beta_0 + S_{0s} + \beta_1 X_i + e_{si}, \qquad S_{0s} \sim N(0, \tau_{00}^2), \qquad e_{si} \sim N(0, \sigma^2) \qquad (2)$$
These offsets increase the model’s expressivity by allowing
predictions for each subject to shift upward or downward
by a fixed amount (Fig. 1b). Our use of Latin letters for this
term is a reminder that $S_{0s}$ is a special type of effect which is different from the $\beta$s; indeed, we now have a ‘‘mixed-effects’’ model: parameters $\beta_0$ and $\beta_1$ are fixed effects (effects
that are assumed to be constant from one experiment to
another), while the specific composition of subject levels
for a given experiment is assumed to be a random subset
of the levels in the underlying populations (another instan-
tiation of the same experiment would have a different
composition of subjects, and therefore different realizations of the $S_{0s}$ effects). The $S_{0s}$ effects are therefore random effects; specifically, they are random intercepts, as they allow the intercept term to vary across subjects. Our primary
goal is to produce a model which will generalize to the
population from which these subjects are randomly drawn,
rather than describing the specific $S_{0s}$ values for this sample. Therefore, instead of estimating the individual $S_{0s}$ effects, the model-fitting algorithm estimates the population distribution from which the $S_{0s}$ effects were
drawn. This requires assumptions about this distribution;
we follow the common assumption that it is normal, with
a mean of 0 and a variance of $\tau_{00}^2$; here $\tau_{00}^2$ is a random effect parameter, and is denoted by a Greek symbol because, like the $\beta$s, it refers to the population and not to the sample.
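As a sketch of this generative story, the following Python fragment first draws the by-subject offsets from their assumed normal population and then generates trial-level data under Model (2); all values (including 4 subjects with 4 trials each) are assumptions for illustration:

```python
import numpy as np

rng = np.random.default_rng(1)

# Illustrative (assumed) parameter values, in ms
beta_0, beta_1 = 800.0, -40.0  # fixed effects
tau_00 = 100.0                 # SD of the S_0s population
sigma = 50.0                   # trial-level noise SD
n_subj, n_trials = 4, 4        # 4 subjects x 4 trials = 16 observations

# One intercept offset per subject, drawn from N(0, tau_00^2)
S0 = rng.normal(0.0, tau_00, size=n_subj)

subj = np.repeat(np.arange(n_subj), n_trials)          # subject index per trial
X = np.tile(np.repeat([0, 1], n_trials // 2), n_subj)  # item type per trial

# Y_si = beta_0 + S_0s + beta_1 * X_i + e_si
Y = beta_0 + S0[subj] + beta_1 * X + rng.normal(0.0, sigma, size=subj.size)
```

Note that all trials from the same subject share the single value `S0[subj]`; this shared offset is exactly the within-subject clustering that Model (1) ignored.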
Note that the variation on the intercepts is not con-
founded with our effect of primary theoretical interest
($\beta_1$): for each subject, it moves the means for both conditions up or down by a fixed amount. Accounting for this variation will typically decrease the residual error and thus increase the sensitivity of the test of $\beta_1$. Fitting Model (2) is
thus analogous to analyzing the raw, unaggregated response data using a repeated-measures ANOVA with $SS_{\mathrm{subjects}}$ subtracted from the residual $SS_{\mathrm{error}}$ term. One could see that this analysis is wrong by observing that the denominator degrees of freedom for the $F$ statistic (i.e., corresponding to $MS_{\mathrm{error}}$) would be greater than the number of subjects (see Online Appendix for further discussion and
demonstration).
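To make the degrees-of-freedom point concrete, here is the bookkeeping for a hypothetical design of 4 subjects contributing 16 raw observations across 2 conditions (the specific partition of the sums of squares is assumed for illustration):

```python
# Hypothetical design: 4 subjects, 2 conditions, 16 raw observations
n_obs, n_subj, n_cond = 16, 4, 2

df_condition = n_cond - 1                            # 1
df_subjects = n_subj - 1                             # 3
df_error = (n_obs - 1) - df_condition - df_subjects  # 11 residual df

# The F denominator df (11) exceeds the number of subjects (4):
# the analysis implicitly treats trials, not subjects, as the
# independent units of evidence.
```
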
Although Model (2) is clearly preferable to Model (1), it
does not capture all the possible by-subject dependencies
in the sample; experience also tells us that subjects often
vary not only in their overall response latencies but also
in the nature of their response to word type. In the present
hypothetical case, Subject 3 shows a total effect of
134 ms, which is 94 ms larger than the average effect in
the population of 40 ms. We have multiple observations
per combination of subject and word type, so this variability in the population will also create clustering in the sample. The $S_{0s}$ do not capture this variability because they only allow subjects to vary around $\beta_0$. What we need in addition are random slopes to allow subjects to vary with respect to $\beta_1$, our treatment effect. To account for this variation, we introduce a random slope term $S_{1s}$ with variance $\tau_{11}^2$, yielding
$$Y_{si} = \beta_0 + S_{0s} + (\beta_1 + S_{1s}) X_i + e_{si},$$
$$(S_{0s}, S_{1s}) \sim N\!\left( 0, \begin{bmatrix} \tau_{00}^2 & \rho\,\tau_{00}\tau_{11} \\ \rho\,\tau_{00}\tau_{11} & \tau_{11}^2 \end{bmatrix} \right), \qquad e_{si} \sim N(0, \sigma^2). \qquad (3)$$
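A generative sketch of Model (3) in Python: the pair $(S_{0s}, S_{1s})$ for each subject is drawn jointly from a bivariate normal with the stated covariance structure. All parameter values, including the negative intercept–slope correlation, are assumptions for illustration:

```python
import numpy as np

rng = np.random.default_rng(2)

# Illustrative (assumed) parameter values, in ms
beta_0, beta_1 = 800.0, -40.0
tau_00, tau_11 = 100.0, 40.0   # SDs of random intercepts and slopes
rho = -0.6                     # intercept-slope correlation (assumed negative)
sigma = 50.0
n_subj, n_trials = 4, 4

# Covariance matrix of (S_0s, S_1s), as in Eq. (3)
cov = np.array([[tau_00**2,             rho * tau_00 * tau_11],
                [rho * tau_00 * tau_11, tau_11**2]])
S = rng.multivariate_normal([0.0, 0.0], cov, size=n_subj)  # shape (n_subj, 2)

subj = np.repeat(np.arange(n_subj), n_trials)
X = np.tile(np.repeat([0, 1], n_trials // 2), n_subj)

# Y_si = beta_0 + S_0s + (beta_1 + S_1s) * X_i + e_si
Y = (beta_0 + S[subj, 0]
     + (beta_1 + S[subj, 1]) * X
     + rng.normal(0.0, sigma, size=subj.size))
```

Drawing the two offsets jointly, rather than from two independent univariate normals, is what lets the model express a correlation between a subject's overall speed and the size of that subject's treatment effect.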
This is now a mixed-effects model with by-subject random
intercepts and random slopes. Note that the inclusion of the
by-subject random slope causes the predictions for condi-
tion B to shift by a fixed amount for each subject (Fig. 1c),
improving predictions for words of type B. The slope offset
$S_{1s}$ captures how much Subject $s$'s effect deviates from the population treatment effect $\beta_1$. Again, we do not want our
analysis to commit to particular $S_{1s}$ effects, and so, rather than estimating these values directly, we estimate $\tau_{11}^2$, the by-subject variance in treatment effect. But note that now we have two random effects for each subject $s$, and these two effects can exhibit a correlation (expressed by $\rho$). For example, subjects who do not read carefully might
not only respond faster than the typical subject (and have a
negative $S_{0s}$), but might also show less sensitivity to the word type manipulation (and have a more positive $S_{1s}$). Indeed, such a negative correlation, where we would have $\rho < 0$, is suggested in our hypothetical data (Fig. 1): S1
and S3 are slow responders who show clear treatment ef-
fects, whereas S2 and S4 are fast responders who are
hardly susceptible to the word type manipulation. In the
most general case, we should not treat these effects as
coming from independent univariate distributions, but in-
stead should treat $S_{0s}$ and $S_{1s}$ as being jointly drawn from a
² For expository purposes, we use a treatment coding scheme (0 or 1) for the predictor variable. Alternatively, the models in this section could be expressed in the style more common to traditional ANOVA pedagogy, where fixed and random effects represent deviations from a grand mean. This model can be fit by using ‘‘deviation coding’’ for the predictor variable (−.5 and .5 instead of 0 and 1). For higher-order designs, treatment and deviation coding schemes will lead to different interpretations for lower-order effects (simple effects for treatment coding and main effects for deviation coding).
D.J. Barr et al. / Journal of Memory and Language 68 (2013) 255–278