Error in glm.fit(x = c(1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, : NA/NaN/Inf in 'y' In addition: Warning messages: 1: In Ops.factor(y, mu) : ‘-’ not meaningful for factors 2: In Ops.factor(eta, offset) : ‘-’ not meaningful for factors 3: In Ops.factor(y, mu) : ‘-’ not meaningful for factors
Posted: 2024-04-17 15:27:26
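This error occurs because the response variable (y) passed to glm is not a clean numeric vector. The warnings "‘-’ not meaningful for factors" show that y is a factor: arithmetic on a factor returns NA, and those NA values then trigger "NA/NaN/Inf in 'y'". The same error can also appear when a numeric y contains NA, NaN, or Inf. First check the data type of y and whether it contains non-finite values, then handle them accordingly: for example, use na.omit() to drop rows containing NA, or convert the factor to a numeric variable with as.numeric(as.character(y)). Calling as.numeric() directly on a factor returns the internal level codes, not the original numbers.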
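The checks described above can be sketched as follows. This is a minimal illustration on made-up data: the data frame `df` and the columns `x`, `y`, and `y_num` are hypothetical and do not come from the original question.

```R
# Hypothetical data: the response was read in as text and became a factor
df = data.frame(
  x = c(1, 2, 3, 4, 5, 6),
  y = factor(c("2.1", "3.9", "6.2", "8.1", "9.8", "12.2"))
)

# Diagnose: is the response a factor?
is.factor(df$y)

# Convert via character; as.numeric() on the factor itself would return
# the internal level codes rather than the original numbers
df$y_num = as.numeric(as.character(df$y))

# Count non-finite values (NA, NaN, Inf) in the converted response
sum(!is.finite(df$y_num))

# Drop incomplete rows, then fit with a numeric response
clean = na.omit(df[, c("x", "y_num")])
fit = glm(y_num ~ x, data = clean)
coef(fit)
```

If y is a genuine two-class outcome rather than a mis-typed number, keeping it as a factor and fitting with `family = binomial` is likely the more appropriate fix.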
Related questions
Refer to the following R code: # Lab5: Cross-Validation and the Bootstrap # The Validation Set Approach install.packages("ISLR") library(ISLR) set.seed(1) train=sample(392,196) lm.fit=lm(mpg~horsepower,data=Auto,subset=train) attach(Auto) mean((mpg-predict(lm.fit,Auto))[-train]^2) lm.fit2=lm(mpg~poly(horsepower,2),data=Auto,subset=train) mean((mpg-predict(lm.fit2,Auto))[-train]^2) lm.fit3=lm(mpg~poly(horsepower,3),data=Auto,subset=train) mean((mpg-predict(lm.fit3,Auto))[-train]^2) set.seed(2) train=sample(392,196) lm.fit=lm(mpg~horsepower,subset=train) mean((mpg-predict(lm.fit,Auto))[-train]^2) lm.fit2=lm(mpg~poly(horsepower,2),data=Auto,subset=train) mean((mpg-predict(lm.fit2,Auto))[-train]^2) lm.fit3=lm(mpg~poly(horsepower,3),data=Auto,subset=train) mean((mpg-predict(lm.fit3,Auto))[-train]^2) # Leave-One-Out Cross-Validation glm.fit=glm(mpg~horsepower,data=Auto) coef(glm.fit) lm.fit=lm(mpg~horsepower,data=Auto) coef(lm.fit) library(boot) glm.fit=glm(mpg~horsepower,data=Auto) cv.err=cv.glm(Auto,glm.fit) cv.err$delta cv.error=rep(0,5) for (i in 1:5){ glm.fit=glm(mpg~poly(horsepower,i),data=Auto) cv.error[i]=cv.glm(Auto,glm.fit)$delta[1] } cv.error. Based on the reference code, write complete R code that meets the following requirements: (1) a 50*30 data set with 30 variables; (2) the true model is linear; (3) give three different sets of true model coefficients; (4) compute the CV values; (5) plot the CV error and the prediction error; (6) give the upper bound of the parameter value based on the one-standard-error rule.
Based on the requirements, the complete R code is as follows:
```R
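# 50 x 30: 50 observations, 30 variables
set.seed(1)
X = matrix(rnorm(1500), nrow = 50, ncol = 30)
colnames(X) = paste0("X", 1:30)
Y = rnorm(50)
data = data.frame(Y, X)
# Linear models: three fits with 1, 15 and 30 predictors, giving three sets of coefficients.
# cv.glm() needs glm objects whose formulas refer to columns of the data frame.
glm.fit1 = glm(Y ~ X1, data = data)
glm.fit2 = glm(Y ~ ., data = data[, 1:16])
glm.fit3 = glm(Y ~ ., data = data)
# Compute the CV values (cv.glm() does leave-one-out CV by default)
library(boot)
cv.error1 = cv.glm(data, glm.fit1)$delta[1]
cv.error2 = cv.glm(data[, 1:16], glm.fit2)$delta[1]
cv.error3 = cv.glm(data, glm.fit3)$delta[1]
cv.error = c(cv.error1, cv.error2, cv.error3)
# Plot the CV error and the prediction error
par(mfrow = c(1, 2))
plot(1:3, cv.error, type = "b", xlab = "Model Complexity", ylab = "CV Error")
pred.error1 = sqrt(mean((Y - predict(glm.fit1, data))^2))
pred.error2 = sqrt(mean((Y - predict(glm.fit2, data))^2))
pred.error3 = sqrt(mean((Y - predict(glm.fit3, data))^2))
plot(1:3, c(pred.error1, pred.error2, pred.error3), type = "b", xlab = "Model Complexity", ylab = "Prediction Error")
# One-standard-error rule: standard error of the leave-one-out squared errors,
# obtained for a linear model from the residuals and the hat values
loo.se = function(fit) {
  e = (residuals(fit, type = "response") / (1 - hatvalues(fit)))^2
  sd(e) / sqrt(length(e))
}
se = c(loo.se(glm.fit1), loo.se(glm.fit2), loo.se(glm.fit3))
best = which.min(cv.error)
# Upper bound on the CV error: minimum CV error plus one standard error
upper.bound = cv.error[best] + se[best]
# Simplest model whose CV error does not exceed the upper bound
chosen = min(which(cv.error <= upper.bound))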
```
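Note that the prediction error here is computed on the same data used to fit the models, and the CV error is estimated by resampling that same data. To evaluate the models properly, these error metrics should be computed on a separate test set.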
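A test-set evaluation along these lines might look like the sketch below. It is only an illustration under stated assumptions: the coefficient vector `beta`, the helper `simulate()`, the test-set size of 1000, and the model sizes in `sizes` are all hypothetical choices, not part of the original answer.

```R
set.seed(1)
n = 50
p = 30

# Hypothetical true coefficients: only the first 5 predictors matter
beta = c(rep(1, 5), rep(0, p - 5))

# Simulate a data set from the linear model Y = X beta + noise
simulate = function(n) {
  X = matrix(rnorm(n * p), nrow = n, ncol = p)
  colnames(X) = paste0("X", 1:p)
  data.frame(Y = as.vector(X %*% beta + rnorm(n)), X)
}

train = simulate(n)     # data used for fitting
test = simulate(1000)   # independent data used only for evaluation

# Fit models using the first k predictors and score them on the test set
sizes = c(1, 5, 15, 30)
test.error = sapply(sizes, function(k) {
  fit = lm(Y ~ ., data = train[, 1:(k + 1)])
  mean((test$Y - predict(fit, newdata = test))^2)
})
names(test.error) = sizes
test.error
```

Because the test observations play no part in fitting, `test.error` measures how each model generalizes, whereas the training error can only decrease as predictors are added.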
When running the following code: # 50*30, 30 variables set.seed(1) X = matrix(rnorm(1500), nrow = 50, ncol = 30) Y = rnorm(50) data = data.frame(Y, X) # The true model is linear; give three different sets of true model coefficients lm.fit1 = lm(Y ~ X[,1], data = data) lm.fit2 = lm(Y ~ X[,1:15], data = data) lm.fit3 = lm(Y ~ X, data = data) # Compute the CV values library(boot) cv.error1 = cv.glm(data[,c("Y", "X[,1]")], lm.fit1)$delta[1] cv.error2 = cv.glm(data[,c("Y", paste0("X[,",1:15,"]"))], lm.fit2)$delta[1] cv.error3 = cv.glm(data[,c("Y", paste0("X[,",1:30,"]"))], lm.fit3)$delta[1]. The following error occurred: Error in [.data.frame(data, , c("Y", "X[,1]")) : undefined columns selected. Please modify the original code.
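The error occurs because data.frame(Y, X) names the columns Y, X1, ..., X30, so the data frame has no column called "X[,1]". Select the columns by their actual names. In addition, fit the models with glm() on those columns, because cv.glm() refits the model on subsets of the data frame it is given, and a formula such as Y ~ X[,1] refers to the matrix X rather than to the data frame. The modified code is as follows:
```
library(boot)
# data.frame(Y, X) names the columns Y, X1, ..., X30
glm.fit1 = glm(Y ~ X1, data = data)
glm.fit2 = glm(Y ~ ., data = data[, c("Y", paste0("X", 1:15))])
glm.fit3 = glm(Y ~ ., data = data)
cv.error1 = cv.glm(data[, c("Y", "X1")], glm.fit1)$delta[1]
cv.error2 = cv.glm(data[, c("Y", paste0("X", 1:15))], glm.fit2)$delta[1]
cv.error3 = cv.glm(data[, c("Y", paste0("X", 1:30))], glm.fit3)$delta[1]
```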
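To see why the column selection fails, it can help to inspect the names that `data.frame()` actually assigns. The following is a small sketch that rebuilds the data from the question; the variables `msg` and `ok` are introduced here only for illustration.

```R
set.seed(1)
X = matrix(rnorm(1500), nrow = 50, ncol = 30)
Y = rnorm(50)
data = data.frame(Y, X)

# Inspect the first few column names of the data frame
head(names(data))

# The string "X[,1]" is not among the column names
"X[,1]" %in% names(data)

# Capture the error message instead of stopping the script
msg = tryCatch(
  data[, c("Y", "X[,1]")],
  error = function(e) conditionMessage(e)
)
msg

# Selecting by the real column names works
ok = data[, c("Y", paste0("X", 1:15))]
dim(ok)
```

Checking `names(data)` before indexing by name is a quick way to catch this kind of mismatch.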