GOAL
Perform a Poisson regression to predict the number of people in a household based on the age of the head of the household.

DATA
The Philippine Statistics Authority (PSA) spearheads the Family Income and Expenditure Survey (FIES) nationwide. The survey, which is undertaken every three years, is aimed at providing data on family income and expenditure, including levels of consumption by item of expenditure. The data, from the 2015 FIES, is a subset of 1500 of the 40,000 observations (Philippine Statistics Authority 2015). The data set focuses on five regions: Central Luzon, Metro Manila, Ilocos, Davao, and Visayas. The data is in the file fHH1.csv. Each row is a household, and the following variables are recorded:
• location: where the house is located (Central Luzon, Davao Region, Ilocos Region, Metro Manila, or Visayas)
• age: the age of the head of household
• total: the number of people in the household other than the head
• numLT5: the number in the household under 5 years of age
• roof: the type of roof in the household (either Predominantly Light/Salvaged Material, or Predominantly Strong Material)

STEPS
1. Read in the dataset.
2. Produce a bar chart of total.
3. Produce a scatter plot of total against age and add a smoothing line.
4. Fit the Poisson regression total ∼ age.
5. Interpret the coefficient of age.
6. Obtain the Pearson residuals. Plot these against age. Is the model adequate?
7. Fit the Poisson regression total ∼ age + age^2.
8. Repeat the residual plots for the new model.
9. Compare the models using a likelihood ratio test and AIC.
10. Calculate the predicted values for model M2. What is the age of the head of the household associated with the largest fitted value?

Use R.
Posted: 2024-01-09 11:04:13
1. Read in the dataset
```R
data <- read.csv("fHH1.csv")
```
2. Produce a bar chart of total
```R
library(ggplot2)
ggplot(data, aes(x = total)) + geom_bar()
```
3. Produce a scatter plot of total against age and add a smoothing line
```R
# method = "loess" draws a smooth curve; method = "lm" would force a straight line
ggplot(data, aes(x = age, y = total)) + geom_point() + geom_smooth(method = "loess", se = FALSE)
```
4. Fit the Poisson regression total ∼ age
```R
model <- glm(total ~ age, data = data, family = "poisson")
summary(model)
```
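5. Interpret the coefficient of age
Poisson regression uses a log link, so the age coefficient acts multiplicatively: each additional year of age of the household head multiplies the expected number of household members by exp(coefficient). Taking the estimate as 0.018, that factor is exp(0.018) ≈ 1.018, an increase of about 1.8% per year. Confirm the value and its sign against the output of summary(model); a negative estimate would mean the expected count falls as age rises.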
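The multiplicative reading follows directly from the log link. A short derivation, where β0 and β1 are simply notation for the intercept and the age coefficient of the fitted model and λ is the expected value of total:

```latex
\begin{align*}
\log \lambda(\text{age}) &= \beta_0 + \beta_1\,\text{age} \\
\frac{\lambda(\text{age}+1)}{\lambda(\text{age})}
  &= \frac{e^{\beta_0 + \beta_1(\text{age}+1)}}{e^{\beta_0 + \beta_1\,\text{age}}}
   = e^{\beta_1}
\end{align*}
```

In R, exp(coef(model)) returns these factors for every coefficient in the model.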
6. Obtain the Pearson residuals and plot them against age. Is the model adequate?
```R
data$pearson1 <- resid(model, type = "pearson")  # store in the data frame; avoids masking residuals()
ggplot(data, aes(x = age, y = pearson1)) + geom_point() + geom_hline(yintercept = 0, linetype = "dashed")
```
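The model is adequate only if the Pearson residuals scatter around zero with no systematic trend in age, and most of them lie within about ±2. Curvature in the plot, for example residuals that are mostly negative at young and old ages and positive in between, means that a linear term in age misses part of the relationship. Because the quadratic model turns out to be preferred in step 9, the linear-in-age model should not be treated as fully adequate.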
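Besides the visual check, a common numerical check of adequacy is the dispersion ratio: the sum of squared Pearson residuals divided by the residual degrees of freedom. The sketch below is not part of the original answer; the helper name dispersion_ratio and the simulated data frame sim are assumptions introduced so the example runs without fHH1.csv.

```R
# Hypothetical helper (an assumption, not from the original answer):
# sum of squared Pearson residuals over the residual degrees of freedom.
dispersion_ratio <- function(fit) {
  sum(resid(fit, type = "pearson")^2) / df.residual(fit)
}

# Simulated stand-in data so the sketch runs without fHH1.csv.
set.seed(1)
sim <- data.frame(age = round(runif(500, 18, 90)))
sim$total <- rpois(500, lambda = exp(1.5 - 0.005 * sim$age))

fit <- glm(total ~ age, data = sim, family = poisson)
ratio <- dispersion_ratio(fit)
print(ratio)  # values near 1 are consistent with the Poisson assumption
```

On the real data, dispersion_ratio(model) can be called on the fitted model directly. A ratio well above 1 points to overdispersion or a misspecified mean.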
7. Fit the Poisson regression total ∼ age + age^2
```R
data$age2 <- data$age^2
model2 <- glm(total ~ age + age2, data = data, family = "poisson")
summary(model2)
```
8. Repeat the residual plot for the new model
```R
data$pearson2 <- resid(model2, type = "pearson")  # Pearson residuals of the quadratic model
ggplot(data, aes(x = age, y = pearson2)) + geom_point() + geom_hline(yintercept = 0, linetype = "dashed")
```
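If the residuals of the new model scatter around zero with no remaining trend in age, the quadratic term has captured the curvature and the mean structure of the model is adequate.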
9. Compare the models using a likelihood ratio test and AIC
```R
# Likelihood ratio test (model is nested in model2)
library(lmtest)
lrtest(model, model2)
# AIC comparison (lower is better)
AIC(model, model2)
```
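Both criteria favor model 2 (total ∼ age + age^2) over model 1 (total ∼ age): the likelihood ratio test rejects the simpler model when its p-value is small, and the model with the lower AIC is preferred.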
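To make clear what the two criteria measure, here is a minimal sketch that computes them by hand in base R. It runs on simulated stand-in data; the data frame sim, the models m1 and m2, and the coefficients used to generate the data are all assumptions for illustration, not results from fHH1.csv.

```R
# Simulated stand-in data (assumption): log-mean that is quadratic in age.
set.seed(2)
sim <- data.frame(age = round(runif(500, 18, 90)))
sim$total <- rpois(500, lambda = exp(-0.3 + 0.07 * sim$age - 0.0007 * sim$age^2))

m1 <- glm(total ~ age, data = sim, family = poisson)
m2 <- glm(total ~ age + I(age^2), data = sim, family = poisson)

# Likelihood ratio statistic: drop in residual deviance between nested models.
lr_stat <- deviance(m1) - deviance(m2)
lr_df   <- df.residual(m1) - df.residual(m2)  # one extra parameter in m2
p_value <- pchisq(lr_stat, df = lr_df, lower.tail = FALSE)

# AIC = -2 * log-likelihood + 2 * (number of parameters); lower is better.
aic_by_hand <- -2 * as.numeric(logLik(m2)) + 2 * length(coef(m2))

print(c(lr_stat = lr_stat, lr_df = lr_df, p_value = p_value))
print(c(by_hand = aic_by_hand, builtin = AIC(m2)))
```

The same test is available without extra packages as anova(m1, m2, test = "Chisq").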
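10. Calculate the predicted values for model M2. What is the age of the head of the household associated with the largest fitted value?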
```R
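# Predict over the observed range of ages rather than a hard-coded 20-80 grid
newdata <- data.frame(age = seq(min(data$age), max(data$age), by = 1))
newdata$age2 <- newdata$age^2
pred <- predict(model2, newdata, type = "response")  # expected counts, not log counts
max_age <- newdata[which.max(pred), "age"]
cat("Largest fitted value:", max(pred), "\n")
cat("Age of the head of household at that value:", max_age, "\n")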
```
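The age found by the grid search can be cross-checked analytically, because the log of the fitted mean is a quadratic in age. In the derivation below, β0, β1 and β2 are notation for the intercept, age and age2 coefficients of model2, and a is the age of the head of household:

```latex
\begin{align*}
\log \lambda(a) &= \beta_0 + \beta_1 a + \beta_2 a^2 \\
\frac{d}{da}\log \lambda(a) &= \beta_1 + 2\beta_2 a = 0
  \quad\Longrightarrow\quad a^{*} = -\frac{\beta_1}{2\beta_2}
\end{align*}
```

This stationary point is a maximum only when β2 < 0. In R it can be computed as -coef(model2)["age"] / (2 * coef(model2)["age2"]), and it should agree with max_age up to the 1-year grid spacing, provided it falls inside the observed age range.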