Applications of the Bootstrap and Resampling in Statistical Analysis


"R语言中的Bootstrap方法教程" Bootstrap是一种在统计学中用于估计样本分布特性的重采样方法，尤其适用于处理小样本或复杂数据结构的情况。它利用计算机生成大量从原始数据集复制（或“抽样”）的子样本，通过对这些子样本进行分析来推断总体参数的分布特性，进而估计标准误差、校正偏差和构建置信区间。 1. **介绍**： Bootstrap方法的动机在于传统的统计理论往往依赖于大样本和特定的分布假设，而Bootstrap提供了一种更加灵活的处理方式。它通过从原始数据中随机抽取样本（可以有放回地抽取），构建出多个“伪总体”，以此模拟不同的抽样情况。 2. **参数、分布与插值原则**：在统计中，参数是对总体特征的度量，如均值、方差等。Bootstrap方法可以帮助我们估计这些参数的分布，而不是仅仅得到一个点估计。插值原则是指用估计量代替未知的参数，Bootstrap就是这一原则的实际应用。 3. **估计标准误差**： Bootstrap算法通过计算不同子样本的统计量分布来估计标准误差。例如，在对法律学院数据进行分析时，Bootstrap可以用来估计某个估计量的标准误差，并确定其稳定性。 4. **Bootstrap样本数量的选择**：确定需要抽取多少个Bootstrap样本是个关键问题。通常，样本数量越多，估计结果越准确，但计算成本也越高。实践中，几百到几千个样本通常是足够的。 5. **参数化Bootstrap**：当总体分布已知或可以近似为特定形式时，可以使用参数化Bootstrap。这种方法涉及到在已知分布下重新抽样，以便更准确地模拟总体。 6. **失败的情况**： Bootstrap方法并非在所有情况下都适用，当原始数据存在依赖性、异方差性或非正态性时，Bootstrap可能无法给出准确的估计。 7. **更复杂的数据结构**：对于具有嵌套结构（如分类变量）的数据或时间序列数据，Bootstrap需要相应的适应性策略。例如，在回归分析中，可以Bootstrap配对数据或残差；对于时间序列，可以使用移动块Bootstrap来处理序列相关性。 8. **偏倚估计**： Bootstrap不仅可以估计标准误差，还可以帮助识别和校正估计量的偏倚。通过比较原始估计和Bootstrap样本的平均估计，可以估计偏倚并寻找修正方法。 9. **杰克knife法**：杰克knife是另一种减少偏倚的方法，它通过删除一个观测值然后进行分析，重复此过程以得到一系列估计，从而估计偏倚和方差。 10. **置信区间**： Bootstrap可以生成精确的置信区间，包括基于参数估计的标准误差的正常近似以及Bootstrap-t方法。Bootstrap-t方法特别适用于小样本或非正态分布的情况，它考虑了估计标准误差的分布，提供了更稳健的区间估计。 Bootstrap方法在R语言中有着丰富的实现，例如`boot`包提供了多种Bootstrap方法的实现，使得研究者和数据分析师能够方便地应用Bootstrap来解决实际问题。在R中，Bootstrap不仅限于以上介绍的基本概念，还包括了对各种复杂模型和数据类型的扩展应用。通过熟练掌握Bootstrap，可以在面对不确定性时做出更准确、更稳健的统计推断。

Resampling — 3 ESTIMATING STANDARD ERRORS [ 22nd November 2010, 21:37 ] — 4

Example: the law school data

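library(bootstrap)  # assumption: law and law82 come from the bootstrap package
par(mar = c(4, 4, 0, 0))  # no top/right margin
with(law82, plot(100 * GPA ~ LSAT, ylab = "GPA"))  # population, GPA rescaled to match law
points(law, pch = 3)  # sample, drawn as plus signs
legend("bottomright", c("population", "sample"), pch = c(1, 3))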

[Figure: scatter plot of GPA (×100, axis from about 260 to 340) against LSAT (axis from about 500 to 700). Circles mark the population, plus signs mark the sample.]

Our population consists of 82 law schools (data set law82). Our sample (law) contains only 15 observations.

We are interested in the correlation between GPA (Grade Point Average, the undergraduate score) and LSAT (Law School Admission Test score), i.e. the θ and the t(·) we are talking about are the correlation.

The true value (the correlation in the population):

with(law82, cor(GPA, LSAT))

[1] 0.7599979

The plug-in estimate:

with(law, cor(GPA, LSAT))

[1] 0.7763745

Convergence

Why does it make sense to use a plug-in estimator?

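Glivenko-Cantelli theorem (1933):

$$\sup_x \left| \hat F(x) - F(x) \right| \xrightarrow{n \to \infty} 0 \quad \text{almost surely}$$

If t(·) is continuous, then

$$t(\hat F) \xrightarrow{n \to \infty} t(F) = \theta \quad \text{almost surely}$$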
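The convergence stated by the theorem can be checked numerically. The sketch below is only an illustration under assumptions not in the slides: it uses standard normal samples, a hypothetical helper `sup_distance`, and an arbitrary seed. It computes the sup-distance between the empirical and the true distribution function for growing n.

```r
# Sup-distance between the empirical CDF of a sample and a given true CDF.
# The empirical CDF is a step function, so the supremum is attained at a
# jump: it is enough to compare just before and just after each order statistic.
sup_distance <- function(x, cdf) {
  n  <- length(x)
  Fx <- cdf(sort(x))
  max(pmax(abs((1:n) / n - Fx), abs((0:(n - 1)) / n - Fx)))
}

set.seed(1)  # arbitrary seed, for reproducibility only
sizes <- c(10, 100, 1000, 10000)
dist  <- sapply(sizes, function(n) sup_distance(rnorm(n), pnorm))

print(data.frame(n = sizes, sup_distance = dist))
```

The printed distances should shrink as n grows, which is what makes replacing F by the empirical distribution reasonable for large samples.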

We can use the bootstrap to measure ...

• ... the bias of the plug-in estimate

• ... the standard error of the plug-in estimate

3 Estimating standard errors

3.1 The bootstrap algorithm for standard errors

• $x_1, x_2, \ldots, x_n$ are n observed values.

• $\hat F$ is the empirical distribution (with probability 1/n on each of $x_1, x_2, \ldots, x_n$).

• $x^* = (x_1^*, x_2^*, \ldots, x_n^*)$ is a bootstrap sample, drawn from $\hat F$: $x^* = (x_1^*, x_2^*, \ldots, x_n^*)$ is a random sample of size n drawn with replacement from the observed values $x_1, x_2, \ldots, x_n$.

• $\hat\theta^* = s(x^*)$ is a bootstrap replication of $\hat\theta$.

• $se_F(\hat\theta)$ is the standard error of $\hat\theta$ (we usually cannot calculate $se_F(\hat\theta)$).

• $se_{\hat F}(\hat\theta^*)$ is the (ideal) bootstrap estimate of $se_F(\hat\theta)$.

$se_{\hat F}(\hat\theta^*)$ can be hard to calculate. We can, however, get a good approximation of $se_{\hat F}(\hat\theta^*)$.

The bootstrap algorithm for standard errors

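1. Draw B independent bootstrap samples $x^{*1}, x^{*2}, \ldots, x^{*B}$. Each sample has size n and is drawn with replacement from $x_1, x_2, \ldots, x_n$.

2. For bootstrap sample b = 1, 2, ..., B determine $\hat\theta^{*b} = s(x^{*b})$.

3. Estimate the standard error by the sample standard deviation of the B replications:

$$\widehat{se}_{BS} = se_{b=1}^{B}\left(\hat\theta^{*b}\right) = \sqrt{\frac{1}{B-1} \sum_{b=1}^{B} \left( \hat\theta^{*b} - \frac{1}{B} \sum_{b=1}^{B} \hat\theta^{*b} \right)^{2}}$$

$$\lim_{B \to \infty} \widehat{se}_{BS} = \underbrace{se_{\hat F}(\hat\theta^*)}_{\text{ideal bootstrap}}$$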

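$$
\hat F \;\to\;
\left.
\begin{array}{l}
x^{*1} \to s(x^{*1}) = \hat\theta^{*1} \\
x^{*2} \to s(x^{*2}) = \hat\theta^{*2} \\
\quad\vdots \\
x^{*b} \to s(x^{*b}) = \hat\theta^{*b} \\
\quad\vdots \\
x^{*B} \to s(x^{*B}) = \hat\theta^{*B}
\end{array}
\right\}
\;\to\; se_{b=1}^{B}\left(\hat\theta^{*b}\right) = \widehat{se}_{BS}
$$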
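The algorithm can be sketched in a few lines of base R. This is a minimal illustration, not the slides' own code: the data frame is typed in by hand and assumed to match the `law` sample of the `bootstrap` package, and the names `s`, `boot_se`, the seed, and the values of B are hypothetical choices made for this example.

```r
# Law school sample (15 schools); values transcribed by hand and assumed to
# match the `law` data set of the `bootstrap` package.
law <- data.frame(
  LSAT = c(576, 635, 558, 578, 666, 580, 555, 661, 651, 605, 653, 575, 545, 572, 594),
  GPA  = c(339, 330, 281, 303, 344, 307, 300, 343, 336, 313, 312, 274, 276, 288, 296)
)

s <- function(d) cor(d$LSAT, d$GPA)  # the statistic s(.)
theta_hat <- s(law)                  # plug-in estimate of the correlation

# Bootstrap estimate of the standard error of statistic(data), using B resamples.
boot_se <- function(data, statistic, B) {
  n <- nrow(data)
  theta_star <- replicate(B, {
    idx <- sample.int(n, size = n, replace = TRUE)  # draw n rows with replacement
    statistic(data[idx, , drop = FALSE])            # bootstrap replication
  })
  sd(theta_star)  # sample standard deviation, i.e. the 1/(B - 1) formula
}

set.seed(1)  # arbitrary seed, for reproducibility only
B_values <- c(25, 100, 400, 1600)
se_by_B  <- sapply(B_values, function(B) boot_se(law, s, B))

print(theta_hat)
print(data.frame(B = B_values, se = se_by_B))
```

Rows are resampled as pairs (LSAT, GPA), since the statistic is a correlation between the two columns. Running the estimate for several values of B gives a feel for how quickly it settles towards the ideal bootstrap value.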
