A Practical Guide to Linear Mixed Models: Software Applications in Detail


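Linear Mixed Models: A Practical Guide Using Statistical Software is an authoritative text by Brady T. West, Kathleen B. Welch, and Andrzej T. Gałecki, published in 2007 by Taylor & Francis Group under the Chapman & Hall/CRC imprint. The book aims to give readers a practical framework for understanding and applying mixed models in statistical analysis, with particular attention to working in mainstream statistical software such as SAS, SPSS, and R.

A linear mixed model (LMM) combines fixed effects and random effects, which makes it suited to data with differences between groups, as commonly found in the social sciences, medical research, education, psychology, and other fields. These differences may stem from heterogeneity among individuals or from inherent correlation among observational units. Mixed models account for both, which improves the precision of estimates and the power of significance tests.

The book explains in detail how to build and fit mixed models in real settings, covering model specification, parameter estimation, hypothesis testing, and model diagnostics. The authors go beyond theory and supply many worked examples and case studies, so that readers can learn each software package through hands-on use. The book also emphasizes comparisons across software, a valuable resource for analysts who work in more than one environment.

The copyright notice mentions U.S. Government works and states that reprinted material is quoted with permission. Although the authors and publisher have made reasonable efforts to provide accurate data, readers must judge the applicability and accuracy of the information for themselves and bear the corresponding responsibility.

The book is a useful reference for professionals who want to understand and apply mixed models in data analysis, from beginners to experienced statisticians. Readers can expect to strengthen their statistical analysis skills and to use a range of statistical software to solve complex mixed-model problems in practice.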

2 Linear Mixed Models: A Practical Guide Using Statistical Software

With this book, we illustrate (1) a heuristic development of LMMs based on both general and hierarchical model specifications, (2) the step-by-step development of the model-building process, and (3) the estimation, testing, and interpretation of both fixed-effect parameters and covariance parameters associated with random effects. We work through examples of analyses of real data sets, using procedures designed specifically for the fitting of LMMs in SAS, SPSS, R, Stata, and HLM. We compare output from fitted models across the software procedures, address the similarities and differences, and give an overview of the options and features available in each procedure.

1.1.1 Models with Random Effects for Clustered Data

Clustered data arise when observations are made on subjects within the same randomly selected group. For example, data might be collected from students within the same classroom, patients in the same clinic, or rat pups in the same litter. These designs involve units of analysis nested within clusters. If the clusters can be considered to have been sampled from a larger population of clusters, their effects can be modeled as random effects in an LMM. In a designed experiment with blocking, such as a randomized block design, the blocks are crossed with treatments, meaning that each treatment occurs once in each block. Block effects are usually considered to be random. We could also think of blocks as clusters, with treatment as a within-cluster covariate.

LMMs allow for the inclusion of both individual-level covariates (such as age and sex) and cluster-level covariates (such as cluster size), while adjusting for random effects associated with each cluster. Although individual cluster-specific coefficients are not explicitly estimated, most LMM software produces cluster-specific “predictions” (EBLUPs, or empirical best linear unbiased predictors) of the random cluster-specific effects. Estimates of the variability of the random effects associated with clusters can then be obtained, and inferences about the variability of these random effects in a greater population of clusters can be made.
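As a sketch in generic notation (an assumption for illustration, not necessarily the notation the book adopts in Chapter 2), a two-level model with one individual-level covariate, one cluster-level covariate, and a random cluster-specific intercept might be written as follows. Here i indexes units within cluster j, x is the individual-level covariate, w is the cluster-level covariate, and u is the random cluster effect, assumed independent of the residual.

```latex
% Two-level random-intercept model: unit i nested within cluster j
y_{ij} = \beta_0 + \beta_1 x_{ij} + \beta_2 w_j + u_j + \varepsilon_{ij},
\qquad u_j \sim N(0, \sigma_u^2), \quad \varepsilon_{ij} \sim N(0, \sigma^2)
```

In this form the betas are the fixed-effect parameters, while the two variances are the covariance parameters; the EBLUPs mentioned above are predictions of the individual u_j values.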

Note that traditional approaches to analysis of variance (ANOVA) models with both fixed and random effects used expected mean squares to determine the appropriate denominator for each F-test. Readers who learned mixed models under the expected mean squares system will begin the study of LMMs with valuable intuition about model building, although expected mean squares per se are now rarely mentioned.

We examine a two-level model with random cluster-specific intercepts for a two-level clustered data set in Chapter 3 (the Rat Pup data). We then consider a three-level model for data from a study with students nested within classrooms and classrooms nested within schools in Chapter 4 (the Classroom data).

1.1.2 Models for Longitudinal or Repeated-Measures Data

Longitudinal data arise when multiple observations are made on the same subject or unit of analysis over time. Repeated-measures data may involve measurements made on the same unit over time, or under changing experimental or observational conditions. Measurements made on the same variable for the same subject are likely to be correlated (e.g., measurements of body weight for a given subject will tend to be similar over time). Models fitted to longitudinal or repeated-measures data involve the estimation of covariance parameters to capture this correlation.

The software procedures (e.g., the GLM procedures in SAS and SPSS) that were available for fitting models to longitudinal and repeated-measures data prior to the advent of software for fitting LMMs accommodated only a limited range of models. These traditional repeated-measures ANOVA models assumed a multivariate normal (MVN) distribution of the repeated measures and required either estimation of all covariance parameters of the MVN distribution or an assumption of “sphericity” of the covariance matrix (with corrections such as those proposed by Geisser and Greenhouse (1958) or Huynh and Feldt (1976) to provide approximate adjustments to the test statistics to correct for violations of this assumption). In contrast, LMM software, although assuming the MVN distribution of the repeated measures, allows users to fit models with a broad selection of parsimonious covariance structures, offering greater efficiency than estimating the full variance-covariance structure of the MVN model, and more flexibility than models assuming sphericity. Some of these covariance structures may satisfy sphericity (e.g., independence or compound symmetry), and other structures may not (e.g., autoregressive or various types of heterogeneous covariance structures). The LMM software procedures considered in this book allow varying degrees of flexibility in fitting and testing covariance structures for repeated-measures or longitudinal data.
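To make the distinction between structures concrete, the following minimal sketch builds a compound symmetry matrix and a first-order autoregressive matrix and checks sphericity through the usual characterization that every pairwise difference of repeated measures has the same variance. The function names and the numeric values are hypothetical choices for illustration and are not taken from the book.

```python
def compound_symmetry(n_times, variance, covariance):
    """Equal variances on the diagonal, one common covariance elsewhere."""
    return [[variance if i == j else covariance for j in range(n_times)]
            for i in range(n_times)]


def ar1(n_times, variance, rho):
    """First-order autoregressive: covariance decays as rho ** |i - j|."""
    return [[variance * rho ** abs(i - j) for j in range(n_times)]
            for i in range(n_times)]


def difference_variances(cov):
    """Var(Y_i - Y_j) = s_ii + s_jj - 2 * s_ij for every pair i < j."""
    n = len(cov)
    return [cov[i][i] + cov[j][j] - 2 * cov[i][j]
            for i in range(n) for j in range(i + 1, n)]


def satisfies_sphericity(cov, tol=1e-12):
    """Sphericity holds when all pairwise difference variances are equal."""
    diffs = difference_variances(cov)
    return max(diffs) - min(diffs) <= tol


# Hypothetical parameter values for four time points.
cs = compound_symmetry(4, variance=2.0, covariance=0.8)
ar = ar1(4, variance=2.0, rho=0.5)

print(satisfies_sphericity(cs))  # compound symmetry
print(satisfies_sphericity(ar))  # AR(1)
```

Both structures here use only two parameters, whereas an unstructured matrix for four time points would need ten, which is the sense in which such structures are parsimonious.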

Software for LMMs has other advantages over software procedures capable of fitting traditional repeated-measures ANOVA models. First, LMM software procedures allow subjects to have missing time points. In contrast, software for traditional repeated-measures ANOVA drops an entire subject from the analysis if the subject has missing data for a single time point (known as complete-case analysis; see Little and Rubin, 2002). Second, LMMs allow for the inclusion of time-varying covariates in the model (in addition to a covariate representing time), whereas software for traditional repeated-measures ANOVA does not. Finally, LMMs provide tools for the situation in which the trajectory of the outcome varies over time from one subject to another. Examples of such models include growth curve models, which can be used to make inference about the variability of growth curves in the larger population of subjects. Growth curve models are examples of random coefficient models (or Laird–Ware models), which will be discussed when considering the longitudinal data in Chapter 6 (the Autism data).
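The first advantage can be illustrated with a small sketch that contrasts complete-case filtering of wide-format data with a long-format layout in which each observed measurement is its own row. The data values, subject labels, and function names below are invented for illustration; the sketch only counts what each layout retains and does not fit any model.

```python
# Hypothetical wide-format data: one entry per subject, one value per time point.
# None marks a missed visit.
wide = {
    "s1": [10.2, 11.0, 11.9],
    "s2": [9.8, None, 10.7],
    "s3": [10.5, 10.9, None],
}


def complete_cases(wide_data):
    """Keep only subjects observed at every time point (complete-case analysis)."""
    return {sid: ys for sid, ys in wide_data.items() if None not in ys}


def to_long(wide_data):
    """One row per observed (subject, time, value); missed visits are simply absent."""
    return [(sid, t, y)
            for sid, ys in wide_data.items()
            for t, y in enumerate(ys)
            if y is not None]


cc = complete_cases(wide)
long_rows = to_long(wide)

# Subjects and observations retained under each approach.
print(len(cc), sum(len(ys) for ys in cc.values()))
print(len({sid for sid, _, _ in long_rows}), len(long_rows))
```

In the long layout, a time-varying covariate would simply be one more value stored on each row, which is why this layout also accommodates the second advantage described above.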

In Chapter 5, we consider LMMs for a small repeated-measures data set with two within-subject factors (the Rat Brain data). We consider models for a data set with features of both clustered and longitudinal data in Chapter 7 (the Dental Veneer data).

1.1.3 The Purpose of this Book

This book is designed to help applied researchers and statisticians use LMMs appropriately for their data analysis problems, employing procedures available in the SAS, SPSS, Stata, R, and HLM software packages. It has been our experience that examples are the best teachers when learning about LMMs. By illustrating analyses of real data sets using the different software procedures, we demonstrate the practice of fitting LMMs and highlight the similarities and differences in the software procedures.

We present a heuristic treatment of the basic concepts underlying LMMs in Chapter 2. We believe that a clear understanding of these concepts is fundamental to formulating an appropriate analysis strategy. We assume that readers have a general familiarity with ordinary linear regression and ANOVA models, both of which fall under the heading of general (or standard) linear models. We also assume that readers have a basic working knowledge of matrix algebra, particularly for the presentation in Chapter 2.

Nonlinear mixed models and generalized LMMs (in which the dependent variable may be a binary, ordinal, or count variable) are beyond the scope of this book. For a discussion of nonlinear mixed models, see Davidian and Giltinan (1995), and for references on generalized LMMs, see Diggle et al. (2002) or Molenberghs and Verbeke (2005). We also do not consider spatial correlation structures; for more information on spatial data analysis, see Gregoire et al. (1997).

This book should not be substituted for the manuals of any of the software packages discussed. Although we present aspects of the LMM procedures available in each of the five software packages, we do not present an exhaustive coverage of all available options.

1.1.4 Outline of Book Contents

Chapter 2 presents the notation and basic concepts behind LMMs and is strongly recommended for readers whose aim is to understand these models. The remaining chapters are dedicated to case studies, illustrating some of the more common types of LMM analyses with real data sets, most of which we have encountered in our work as statistical consultants. Each chapter presenting a case study describes how to perform the analysis using each software procedure, highlighting features in one of the statistical software packages in particular.

In Chapter 3, we begin with an illustration of fitting an LMM to a simple two-level clustered data set and emphasize the SAS software. Chapter 3 presents the most detailed coverage of setting up the analyses in each software procedure; subsequent chapters do not provide as much detail when discussing the syntax and options for each procedure. Chapter 4 introduces models for three-level data sets and illustrates the estimation of variance components associated with nested random effects. We focus on the HLM software in Chapter 4. Chapter 5 illustrates an LMM for repeated-measures data arising from a randomized block design, focusing on the SPSS software. Examples in this book were constructed using SPSS Version 13.0, and all SPSS syntax presented also works in SPSS Version 14.0.

Chapter 6 illustrates the fitting of a random coefficient model (specifically, a growth curve model), and emphasizes the R software. Regarding the R software, the examples have been constructed using the lme() function, which is available in the nlme package. Recent developments have resulted in the availability of the lmer() function in the lme4 package, which is considered by the developers to be an improvement over the lme() function. Relative to the lme() function, the lmer() function offers improved estimation of LMMs with crossed random effects and also allows for fitting generalized LMMs to non-normal outcomes. We do not consider examples of these types, but the analyses presented have been duplicated as much as possible using the lmer() function on the book Web page (see Appendix A). Finally, Chapter 7 combines many of the concepts introduced in the earlier chapters by introducing a model with both random effects and correlated residuals, and highlights the Stata software.

The analyses of examples in Chapter 3, Chapter 5, and Chapter 7 all consider alternative, heterogeneous covariance structures for the residuals, which is a very important feature of LMMs that makes them much more flexible than alternative linear modeling tools. At the end of each chapter presenting a case study, we consider the similarities and differences in the results generated by the software procedures. We discuss reasons for any discrepancies, and make recommendations for use of the various procedures in different settings.

Appendix A presents several statistical software resources. Information on the background and availability of the statistical software packages SAS (Version 9.1), SPSS (Version 13.0.1), Stata (Release 9), R (Version 2.2.1), and HLM (Version 6) is provided in addition to links to other useful mixed modeling resources, including Web sites for important materials from this book. Appendix B revisits the Rat Brain analysis from Chapter 5 to illustrate the calculation of the marginal variance-covariance matrix implied by one of the LMMs considered in that chapter. This appendix is designed to provide readers with a detailed idea of how one models the covariance of dependent observations in clustered or longitudinal data sets. Finally, Appendix C presents some commonly used abbreviations and acronyms associated with LMMs.
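For orientation, the marginal variance-covariance matrix mentioned in connection with Appendix B can be sketched in a widely used matrix notation. The symbols below are assumptions chosen for illustration and may differ from those the book introduces in Chapter 2: X and Z are the design matrices for the fixed and random effects of subject or cluster i, D is the covariance matrix of the random effects, and R is the covariance matrix of the residuals.

```latex
% LMM for the response vector of subject (or cluster) i
\mathbf{Y}_i = \mathbf{X}_i \boldsymbol{\beta} + \mathbf{Z}_i \mathbf{u}_i + \boldsymbol{\varepsilon}_i,
\qquad \mathbf{u}_i \sim N(\mathbf{0}, \mathbf{D}), \quad
\boldsymbol{\varepsilon}_i \sim N(\mathbf{0}, \mathbf{R}_i)

% Implied marginal variance-covariance matrix of Y_i
\operatorname{Var}(\mathbf{Y}_i) = \mathbf{V}_i
  = \mathbf{Z}_i \mathbf{D} \mathbf{Z}_i^{\top} + \mathbf{R}_i

% Special case: a single random intercept with independent residuals
\mathbf{V}_i = \sigma_u^2 \, \mathbf{1}\mathbf{1}^{\top} + \sigma^2 \mathbf{I}
```

In the special case, every diagonal element equals the sum of the two variances and every off-diagonal element equals the random-intercept variance, so a random intercept alone implies a compound symmetry structure for the observations within a subject or cluster.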

1.2 A Brief History of LMMs

Some historical perspective on this topic is useful. At the very least, when LMMs seem difficult to grasp, it is comforting to know that scores of people have spent over a hundred years sorting it all out. The following subsections highlight many (but not nearly all) of the important historical developments that have led to the widespread use of LMMs today. We divide the key historical developments into two categories: theory and software. Some of the terms and concepts introduced in this timeline will be discussed in more detail later in the book.

1.2.1 Key Theoretical Developments

The following timeline presents the evolution of the theoretical basis of LMMs:

1861: The first known formulation of a one-way random-effects model (an LMM with one random factor and no fixed factors) is that by Airy, which was further clarified by Scheffé in 1956. Airy made several telescopic observations on the same night (clustered data) for several different nights and analyzed the data separating the variance of the random night effects from the random within-night residuals.

1863: Chauvenet calculated variances of random effects in a simple random-effects model.

1925: Fisher’s book Statistical Methods for Research Workers outlined the general method for estimating variance components, or partitioning random variation into components from different sources, for balanced data.

1927: Yule assumed explicit dependence of the current residual on a limited number of the preceding residuals in building pure serial correlation models.

1931: Tippett extended Fisher’s work into the linear model framework, modeling quantities as a linear function of random variations due to multiple random factors. He also clarified an ANOVA method of estimating the variances of random effects.

1935: Neyman, Iwaszkiewicz, and Kolodziejczyk examined the comparative efficiency of randomized blocks and Latin squares designs and made extensive use of LMMs in their work.

1938: The seventh edition of Fisher’s 1925 work discusses estimation of the intraclass correlation coefficient (ICC).

1939: Jackson assumed normality for random effects and residuals in his description of an LMM with one random factor and one fixed factor. This work introduced the term effect in the context of LMMs. Cochran presented a one-way random-effects model for unbalanced data.

1940: Winsor and Clarke, and also Yates, focused on estimating variances of random effects in the case of unbalanced data. Wald considered confidence intervals for ratios of variance components. At this point, estimates of variance components were still not unique.

1941: Ganguli applied ANOVA estimation of variance components associated with random effects to nested mixed models.

1946: Crump applied ANOVA estimation to mixed models with interactions. Ganguli and Crump were the first to mention the problem that ANOVA estimation can produce negative estimates of variance components associated with random effects. Satterthwaite worked with approximate sampling distributions of variance component estimates and defined a procedure for calculating approximate degrees of freedom for approximate F-statistics in mixed models.

1947: Eisenhart introduced the “mixed model” terminology and formally distinguished between fixed- and random-effects models.

1950: Henderson provided the equations to which the BLUPs of random effects and fixed effects were the solutions, known as the mixed model equations (MMEs).

1952: Anderson and Bancroft published Statistical Theory in Research, a book providing a thorough coverage of the estimation of variance components from balanced data and introducing the analysis of unbalanced data in nested random-effects models.

1953: Henderson produced the seminal paper “Estimation of Variance and Covariance Components” in Biometrics, focusing on the use of one of three sums of squares methods in the estimation of variance components from unbalanced data in mixed models (the Type III method is frequently used, being based on a linear model, but all types are available in statistical software packages). Various other papers in the late 1950s and 1960s built on these three methods for different mixed models.

1965: Rao was responsible for the systematic development of the growth curve model, a model with a common linear time trend for all units and unit-specific random intercepts and random slopes.

1967: Hartley and Rao showed that unique estimates of variance components could be obtained using maximum likelihood methods, using the equations resulting from the matrix representation of a mixed model (Searle et al., 1992). However, the estimates of the variance components were biased downward because this method assumes that fixed effects are known and not estimated from data.

1968: Townsend was the first to look at finding minimum variance quadratic unbiased estimators of variance components.

1971: Restricted maximum likelihood (REML) estimation was introduced by Patterson and Thompson as a method of estimating variance components (without assuming that fixed effects are known) in a general linear model with unbalanced data. Likelihood-based methods developed slowly because they were computationally intensive. Searle described confidence intervals for estimated variance components in an LMM with one random factor.

1972: Gabriel developed the terminology of ante-dependence of order p to describe a model in which the conditional distribution of the current residual, given its predecessors, depends only on its p predecessors. This leads to the development of the first-order autoregressive [AR(1)] process (appropriate for equally spaced measurements on an individual over time), in which the current residual depends stochastically on the previous residual. Rao completed work on minimum-norm quadratic unbiased estimation (MINQUE), whose estimators demand no distributional form for the random effects or residual terms. Lindley and Smith introduced HLMs.
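Several entries in this timeline refer to ANOVA estimation of variance components, the intraclass correlation coefficient, and the possibility of negative estimates. A minimal sketch of the textbook method-of-moments estimator for a balanced one-way random-effects model follows, assuming equal cluster sizes. The function names and data are hypothetical, and this is not presented as the procedure any of the cited authors or software packages used.

```python
def anova_variance_components(groups):
    """Method-of-moments estimates for a balanced one-way random-effects model.

    groups: list of equal-length lists, one list of observations per cluster.
    Returns (between_cluster_variance, within_cluster_variance).
    """
    k = len(groups)        # number of clusters
    n = len(groups[0])     # observations per cluster
    if any(len(g) != n for g in groups):
        raise ValueError("this sketch handles balanced data only")

    grand_mean = sum(sum(g) for g in groups) / (k * n)
    means = [sum(g) / n for g in groups]

    # Mean squares between and within clusters.
    ms_between = n * sum((m - grand_mean) ** 2 for m in means) / (k - 1)
    ms_within = sum((y - m) ** 2
                    for g, m in zip(groups, means) for y in g) / (k * (n - 1))

    # E[MS_between] = sigma_w^2 + n * sigma_b^2 and E[MS_within] = sigma_w^2.
    return (ms_between - ms_within) / n, ms_within


def intraclass_correlation(between, within):
    """Share of total variance attributable to clusters."""
    return between / (between + within)


# Hypothetical data: three clusters that differ clearly from one another.
separated = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0], [7.0, 8.0, 9.0]]
b, w = anova_variance_components(separated)
print(b, w, intraclass_correlation(b, w))

# Hypothetical data: identical cluster means, so the between estimate is negative.
overlapping = [[1.0, 5.0], [2.0, 4.0], [3.0, 3.0]]
b_neg, w_neg = anova_variance_components(overlapping)
print(b_neg, w_neg)
```

The second data set shows why the negative-estimate problem arises: when the cluster means vary less than the within-cluster noise would predict, subtracting the within mean square pushes the between-cluster estimate below zero.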
