Regression Analysis in Practice: A Powerful Research Tool

Updated 2024-07-22 · 6.16 MB PDF

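"Applied Regression Analysis: A Research Tool" by John O. Rawlings, Sastry G. Pantula, and David A. Dickey is a book on regression analysis that is particularly well suited to self-study. The second edition strengthens the theoretical foundation and provides many examples to help readers understand and apply regression analysis.

Regression analysis is a core statistical method for studying relationships among variables. It builds a mathematical model that describes how a dependent variable (the response) is related to one or more independent variables (the explanatory variables). The fitted model shows how the independent variables affect the dependent variable and how large that effect is. The book likely covers the following key topics:

1. **Linear regression**: the most basic form of regression, describing a linear relationship between the independent variables and the dependent variable. It includes simple linear regression (one independent variable) and multiple linear regression (several independent variables).
2. **Model assumptions**: regression analysis usually rests on assumptions about the error term, such as normality, constant variance (homoscedasticity), and independence. These assumptions determine which statistical tests are appropriate and how the results can be interpreted.
3. **Parameter estimation**: the regression coefficients are most commonly estimated by ordinary least squares (OLS).
4. **Statistical inference**: significance tests on the coefficients, such as t-tests and F-tests, are used to decide whether the independent variables have a significant effect on the dependent variable.
5. **Residual analysis**: examining the residuals to check the quality of the fit and to detect problems such as outliers and heteroscedasticity (error variance that changes with the independent variables); together with other diagnostics it also helps reveal multicollinearity (strong correlation among the independent variables).
6. **Prediction and decision-making**: using the fitted model to predict future or unobserved values and basing decisions on those predictions.
7. **Nonlinear relationships**: when the relationship is not a straight line, the variables may be transformed or curved models may be used, such as polynomial, exponential, and logarithmic regression. Polynomial and logarithmic models are still linear in the parameters and can be fit by OLS; exponential models are nonlinear in the parameters, although some can be linearized by a transformation.
8. **Regression diagnostics**: checking whether the model assumptions hold and identifying and resolving problems such as multicollinearity, heteroscedasticity, and autocorrelation.
9. **Ridge and lasso regression**: when multicollinearity is a problem, regularization techniques such as ridge regression and lasso regression can improve the stability and predictive performance of the model.
10. **Nonparametric regression**: methods that do not assume a particular functional form, such as locally weighted scatterplot smoothing (LOWESS) and kernel smoothing.

As part of a statistics series, the book may also touch on related topics such as time series analysis, multivariate statistics, and probability theory, which underpin in-depth research and practical applications. After working through the book, readers should be able to apply regression analysis to a wide range of research and practical problems.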
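Item 9 can be made concrete with a minimal NumPy sketch of ridge regression on two nearly collinear predictors. It is an illustration under stated assumptions, not the book's procedure: the data are simulated, the penalty `lam=1.0` is arbitrary, the predictors are centered so the intercept is not penalized, and `ridge_fit` is a hypothetical helper that uses the closed form $(X_c^\top X_c + \lambda I)^{-1} X_c^\top y_c$.

```python
import numpy as np

def ridge_fit(X, y, lam):
    """Hypothetical helper: ridge estimates via (Xc'Xc + lam*I)^{-1} Xc'yc.

    X and y are centered first so the intercept is not penalized;
    the intercept is then recovered from the means.
    """
    x_mean = X.mean(axis=0)
    y_mean = y.mean()
    Xc = X - x_mean
    yc = y - y_mean
    p = X.shape[1]
    beta = np.linalg.solve(Xc.T @ Xc + lam * np.eye(p), Xc.T @ yc)
    intercept = y_mean - x_mean @ beta
    return intercept, beta

# Simulated data (an assumption for illustration): x2 is almost a copy of x1.
rng = np.random.default_rng(0)
n = 50
x1 = rng.normal(size=n)
x2 = x1 + rng.normal(scale=0.01, size=n)
X = np.column_stack([x1, x2])
y = 1.0 + 2.0 * x1 + 2.0 * x2 + rng.normal(size=n)

b0_ols, b_ols = ridge_fit(X, y, lam=0.0)      # lam = 0 reduces to ordinary least squares
b0_ridge, b_ridge = ridge_fit(X, y, lam=1.0)  # a modest, arbitrarily chosen ridge penalty

print("OLS coefficients:  ", b_ols)
print("Ridge coefficients:", b_ridge)
```

With `lam=0.0` the helper reproduces ordinary least squares. Because the second predictor nearly duplicates the first, the OLS coefficients are unstable, while the ridge penalty shrinks the coefficient vector and typically pulls the two estimates toward each other.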

CONTENTS

12.5.2 Generalized Least Squares
12.6 Summary
12.7 Exercises

13 COLLINEARITY
13.1 Understanding the Structure of the X-Space
13.2 Biased Regression
13.2.1 Explanation
13.2.2 Principal Component Regression
13.3 General Comments on Collinearity
13.4 Summary
13.5 Exercises

14 CASE STUDY: COLLINEARITY PROBLEMS
14.1 The Problem
14.2 Multiple Regression: Ordinary Least Squares
14.3 Analysis of the Correlational Structure
14.4 Principal Component Regression
14.5 Summary
14.6 Exercises

15 MODELS NONLINEAR IN THE PARAMETERS
15.1 Examples of Nonlinear Models
15.2 Fitting Models Nonlinear in the Parameters
15.3 Inference in Nonlinear Models
15.4 Violation of Assumptions
15.4.1 Heteroscedastic Errors
15.4.2 Correlated Errors
15.5 Logistic Regression
15.6 Exercises

16 CASE STUDY: RESPONSE CURVE MODELING
16.1 The Ozone–Sulfur Dioxide Response Surface (1981)
16.1.1 Polynomial Response Model
16.1.2 Nonlinear Weibull Response Model
16.2 Analysis of the Combined Soybean Data
16.3 Exercises

17 ANALYSIS OF UNBALANCED DATA
17.1 Sources of Imbalance
17.2 Effects of Imbalance
17.3 Analysis of Cell Means
17.4 Linear Models for Unbalanced Data
17.4.1 Estimable Functions with Balanced Data
17.4.2 Estimable Functions with Unbalanced Data

1 REVIEW OF SIMPLE REGRESSION

… constants. In addition to the Xs, all models involve unknown constants, called parameters, which control the behavior of the model. These parameters are denoted by Greek letters and are to be estimated from the data.

The mathematical complexity of the model and the degree to which it is a realistic model depend on how much is known about the process being studied and on the purpose of the modeling exercise. In preliminary studies of a process or in cases where prediction is the primary objective, the models usually fall into the class of models that are linear in the parameters. That is, the parameters enter the model as simple coefficients on the independent variables or functions of the independent variables. Such models are referred to loosely as linear models. The more realistic models, on the other hand, are often nonlinear in the parameters. Most growth models, for example, are nonlinear models. Nonlinear models fall into two categories: intrinsically linear models, which can be linearized by an appropriate transformation on the dependent variable, and those that cannot be so transformed. Most of the discussion is devoted to the linear class of models and to those nonlinear models that are intrinsically linear. Nonlinear models are discussed in Section 12.2 and Chapter 15.
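An exponential growth curve illustrates an intrinsically linear model. The sketch below assumes a multiplicative lognormal error, $Y = \alpha e^{\beta X} u$, so that taking logarithms of the dependent variable gives $\ln Y = \ln\alpha + \beta X + \ln u$, which is linear in the parameters $\ln\alpha$ and $\beta$. The parameter values and simulated data are hypothetical.

```python
import numpy as np

# Hypothetical growth data from Y = alpha * exp(beta * X) * u, where the
# multiplicative error u is lognormal; alpha and beta are illustrative values.
rng = np.random.default_rng(1)
alpha, beta = 2.0, 0.3
x = np.linspace(0.0, 10.0, 40)
y = alpha * np.exp(beta * x) * np.exp(rng.normal(scale=0.05, size=x.size))

# Taking logs gives ln Y = ln(alpha) + beta * X + ln(u), a model that is
# linear in the parameters ln(alpha) and beta, so ordinary least squares applies.
slope, intercept = np.polyfit(x, np.log(y), deg=1)  # polyfit returns highest degree first
alpha_hat = np.exp(intercept)
beta_hat = slope

print(f"alpha_hat = {alpha_hat:.3f}, beta_hat = {beta_hat:.3f}")
```

If the error were additive instead, $Y = \alpha e^{\beta X} + \epsilon$, the log transformation would not produce a model that is linear in the parameters with additive error, and the model would have to be fit as a nonlinear model.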

1.1 The Linear Model and Assumptions

The simplest linear model involves only one independent variable and states that the true mean of the dependent variable changes at a constant rate as the value of the independent variable increases or decreases. Thus, the functional relationship between the true mean of $Y_i$, denoted by $E(Y_i)$, and $X_i$ is the equation of a straight line:

$$E(Y_i) = \beta_0 + \beta_1 X_i. \tag{1.1}$$

$\beta_0$ is the intercept, the value of $E(Y_i)$ when $X = 0$, and $\beta_1$ is the slope of the line, the rate of change in $E(Y_i)$ per unit change in $X$.

The observations on the dependent variable $Y_i$ are assumed to be random observations from populations of random variables with the mean of each population given by $E(Y_i)$. The deviation of an observation $Y_i$ from its population mean $E(Y_i)$ is taken into account by adding a random error $\epsilon_i$ to give the statistical model

$$Y_i = \beta_0 + \beta_1 X_i + \epsilon_i. \tag{1.2}$$

The subscript $i$ indicates the particular observational unit, $i = 1, 2, \ldots, n$. The $X_i$ are the $n$ observations on the independent variable and are assumed to be measured without error. That is, the observed values of $X$ are assumed to be a set of known constants. The $Y_i$ and $X_i$ are paired observations; both are measured on every observational unit.


The random errors $\epsilon_i$ have zero mean and are assumed to have common variance $\sigma^2$ and to be pairwise independent. Since the only random element in the model is $\epsilon_i$, these assumptions imply that the $Y_i$ also have common variance $\sigma^2$ and are pairwise independent. For purposes of making tests of significance, the random errors are assumed to be normally distributed, which implies that the $Y_i$ are also normally distributed. The random error assumptions are frequently stated as

$$\epsilon_i \sim \text{NID}(0, \sigma^2), \tag{1.3}$$

where NID stands for “normally and independently distributed.” The quantities in parentheses denote the mean and the variance, respectively, of the normal distribution.
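A short simulation can make models (1.2) and (1.3) concrete. The sketch below treats the $X_i$ as fixed constants, draws $\epsilon_i \sim \text{NID}(0, \sigma^2)$ many times, and checks that each $Y_i$ then has mean $E(Y_i) = \beta_0 + \beta_1 X_i$, common variance $\sigma^2$, and negligible correlation with the other $Y_i$. The values $\beta_0 = 10$, $\beta_1 = 2$, $\sigma = 1.5$ and the $X$ values are arbitrary assumptions.

```python
import numpy as np

# Illustrative parameter values (assumptions, not from the text).
beta0, beta1, sigma = 10.0, 2.0, 1.5

# The X_i are fixed, known constants, measured without error.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
mean_y = beta0 + beta1 * x            # E(Y_i), equation (1.1)

rng = np.random.default_rng(2)
reps = 100_000
# Each row is one simulated data set: epsilon_i ~ NID(0, sigma^2), equation (1.3).
eps = rng.normal(loc=0.0, scale=sigma, size=(reps, x.size))
y = mean_y + eps                      # Y_i = beta0 + beta1 * X_i + epsilon_i, equation (1.2)

# Across replicates, each Y_i should have mean E(Y_i) and variance sigma^2,
# and different Y_i should be (nearly) uncorrelated.
print("sample means:    ", y.mean(axis=0).round(2))
print("true means:      ", mean_y)
print("sample variances:", y.var(axis=0, ddof=1).round(2))
print("corr(Y_1, Y_2):  ", np.corrcoef(y[:, 0], y[:, 1])[0, 1].round(3))
```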

1.2 Least Squares Estimation

The simple linear model has two parameters $\beta_0$ and $\beta_1$, which are to be estimated from the data. If there were no random error in $Y_i$, any two data points could be used to solve explicitly for the values of the parameters. The random variation in $Y$, however, causes each pair of observed data points to give different results. (All estimates would be identical only if the observed data fell exactly on the straight line.) A method is needed that will combine all the information to give one solution which is “best” by some criterion.
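The point that each pair of observations gives a different line can be checked directly. This sketch computes the exact line through every pair of points for a small hypothetical data set, then repeats the calculation for data lying exactly on $Y = 1 + 2X$; `pairwise_lines` is a hypothetical helper and the data are invented for illustration.

```python
import itertools
import numpy as np

def pairwise_lines(x, y):
    """Hypothetical helper: exact (intercept, slope) of the line through every pair of points."""
    lines = []
    for i, j in itertools.combinations(range(len(x)), 2):
        slope = (y[j] - y[i]) / (x[j] - x[i])   # assumes distinct X values
        intercept = y[i] - slope * x[i]
        lines.append((intercept, slope))
    return lines

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])

# Hypothetical data with random variation: different pairs give different lines.
y_noisy = np.array([2.1, 3.9, 6.2, 7.8, 10.1])
slopes_noisy = [s for _, s in pairwise_lines(x, y_noisy)]
print("slopes from noisy data:", np.round(slopes_noisy, 3))

# Data lying exactly on Y = 1 + 2X: every pair gives the same line.
y_exact = 1.0 + 2.0 * x
slopes_exact = [s for _, s in pairwise_lines(x, y_exact)]
print("slopes from exact data:", np.round(slopes_exact, 3))
```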

The least squares estimation procedure uses the criterion that the solution must give the smallest possible sum of squared deviations of the observed $Y_i$ from the estimates of their true means provided by the solution. Let $\hat{\beta}_0$ and $\hat{\beta}_1$ be numerical estimates of the parameters $\beta_0$ and $\beta_1$, respectively, and let

$$\hat{Y}_i = \hat{\beta}_0 + \hat{\beta}_1 X_i \tag{1.4}$$

be the estimated mean of $Y$ for each $X_i$, $i = 1, \ldots, n$. Note that $\hat{Y}_i$ is obtained by substituting the estimates for the parameters in the functional form of the model relating $E(Y_i)$ to $X_i$, equation 1.1. The least squares principle chooses $\hat{\beta}_0$ and $\hat{\beta}_1$ that minimize the sum of squares of the residuals, SS(Res):

$$\text{SS(Res)} = \sum_{i=1}^{n} (Y_i - \hat{Y}_i)^2 = \sum_{i=1}^{n} e_i^2, \tag{1.5}$$

where $e_i = (Y_i - \hat{Y}_i)$ is the observed residual for the $i$th observation. The summation indicated by $\sum$ is over all observations in the data set, as indicated by the limits $i = 1$ to $n$.
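For the simple linear model, the values that minimize SS(Res) in equation (1.5) have the standard closed form $\hat{\beta}_1 = \sum (X_i - \bar{X})(Y_i - \bar{Y}) / \sum (X_i - \bar{X})^2$ and $\hat{\beta}_0 = \bar{Y} - \hat{\beta}_1 \bar{X}$, obtained by setting the partial derivatives of SS(Res) to zero. The sketch below applies these formulas to a small hypothetical data set and evaluates SS(Res); the helper names `least_squares_line` and `ss_res` are illustrative, not from the book.

```python
import numpy as np

def least_squares_line(x, y):
    """Hypothetical helper: closed-form least squares estimates for Y_i = beta0 + beta1 * X_i + eps_i."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    x_bar, y_bar = x.mean(), y.mean()
    beta1_hat = np.sum((x - x_bar) * (y - y_bar)) / np.sum((x - x_bar) ** 2)
    beta0_hat = y_bar - beta1_hat * x_bar
    return beta0_hat, beta1_hat

def ss_res(x, y, beta0, beta1):
    """Residual sum of squares, equation (1.5): sum of e_i^2 with e_i = Y_i - Yhat_i."""
    y_hat = beta0 + beta1 * np.asarray(x, dtype=float)   # fitted means, equation (1.4)
    e = np.asarray(y, dtype=float) - y_hat
    return np.sum(e ** 2)

# Hypothetical data for illustration.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])

b0, b1 = least_squares_line(x, y)
print(f"beta0_hat = {b0:.3f}, beta1_hat = {b1:.3f}, SS(Res) = {ss_res(x, y, b0, b1):.4f}")

# Any other line gives a larger SS(Res); for example, nudge the slope.
print(ss_res(x, y, b0, b1 + 0.1) > ss_res(x, y, b0, b1))   # True
```

Because SS(Res) is a quadratic function of the two estimates with a unique minimum whenever the $X_i$ are not all equal, any other choice of intercept or slope increases it.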
