How to Plot a Learning Curve in R
Date: 2023-07-20 17:08:18
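A learning curve can be plotted in R with the `ggplot2` package. The example below also uses `tidyr` to reshape the results before plotting: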
```R
library(ggplot2)
# Simulate data: y is quadratic in x plus Gaussian noise
set.seed(123)
n <- 100
x <- runif(n)
y <- x^2 + rnorm(n)
# Split the rows into training (70), validation (15), and test (15) sets
train_idx <- sample(1:n, size = 70)
valid_idx <- sample(setdiff(1:n, train_idx), size = 15)
test_idx <- setdiff(1:n, c(train_idx, valid_idx))
train_data <- data.frame(x = x[train_idx], y = y[train_idx], group = "train")
valid_data <- data.frame(x = x[valid_idx], y = y[valid_idx], group = "validation")
test_data <- data.frame(x = x[test_idx], y = y[test_idx], group = "test")
# Combine the three sets into one data frame
all_data <- rbind(train_data, valid_data, test_data)
# Fit a model on the "train" rows and return training and validation error
train_and_validate <- function(data) {
  train <- data[data$group == "train", ]
  valid <- data[data$group == "validation", ]
  # Fit a linear model on the training rows
  model <- lm(y ~ x, data = train)
  # Training error (mean squared error)
  train_error <- mean((fitted(model) - train$y)^2)
  # Validation error (mean squared error)
  valid_error <- mean((predict(model, newdata = valid) - valid$y)^2)
  c(train_error = train_error, valid_error = valid_error)
}
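# Compute training and validation error for each training-set size
train_sizes <- seq(10, 70, by = 10)
errors <- t(sapply(train_sizes, function(size) {
  # Subsample the training rows only; the validation set stays fixed.
  # (Indexing all_data with train_idx would pick the wrong rows, because
  # rbind() reordered them.)
  sub_train <- train_data[sample(nrow(train_data), size = size), ]
  train_and_validate(rbind(sub_train, valid_data))
}))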
# Reshape to long format: one row per (train_size, error_type)
tidy_errors <- tidyr::gather(data.frame(errors, train_size = train_sizes), key = "error_type", value = "error", -train_size)
# Plot the learning curve
ggplot(tidy_errors, aes(x = train_size, y = error, color = error_type)) +
  geom_line() +
  labs(x = "Training Set Size", y = "Error", color = "") +
  scale_color_manual(values = c("train_error" = "blue", "valid_error" = "red"))
```
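In this example, we simulate a dataset and split it into training, validation, and test sets (the test set is held out and is not used for the curve). The `train_and_validate` function fits a linear model with `lm` on the training rows and returns the mean squared error on the training and validation rows. We then compute both errors for training-set sizes from 10 to 70, use `gather` to convert the results to long format, and plot the learning curve with `ggplot2`.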
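Because each point on the curve comes from a single random subsample, the plot can look quite jagged. One common refinement is to average the errors over several subsamples per training-set size. The following is a minimal sketch in base R, assuming the same simulated data and `lm` model as above; the names `mse`, `learning_curve`, and `n_repeats` are illustrative and do not appear in the original example.

```r
# Sketch: learning curve averaged over repeated subsamples (base R only).
set.seed(123)
n <- 100
x <- runif(n)
y <- x^2 + rnorm(n)
dat <- data.frame(x = x, y = y)

# Fixed split: 70 training rows, 15 validation rows (the rest is held out).
train_idx <- sample(n, size = 70)
valid_idx <- sample(setdiff(seq_len(n), train_idx), size = 15)
train_pool <- dat[train_idx, ]
valid_set <- dat[valid_idx, ]

# Mean squared error of a fitted model on a data frame with columns x and y.
mse <- function(model, data) {
  mean((predict(model, newdata = data) - data$y)^2)
}

# Hypothetical helper: for each size, draw n_repeats subsamples of the
# training pool, fit lm on each, and average the training/validation MSE.
learning_curve <- function(train_pool, valid_set, sizes, n_repeats = 20) {
  rows <- lapply(sizes, function(size) {
    reps <- replicate(n_repeats, {
      sub <- train_pool[sample(nrow(train_pool), size = size), ]
      model <- lm(y ~ x, data = sub)
      c(train_error = mse(model, sub), valid_error = mse(model, valid_set))
    })
    # reps is a 2 x n_repeats matrix with rows train_error and valid_error
    data.frame(train_size = size,
               train_error = mean(reps["train_error", ]),
               valid_error = mean(reps["valid_error", ]))
  })
  do.call(rbind, rows)
}

curve <- learning_curve(train_pool, valid_set, sizes = seq(10, 70, by = 10))
print(curve)
```

The resulting `curve` data frame has the columns `train_size`, `train_error`, and `valid_error`, so it can be passed through the same `gather` and `ggplot` steps as in the example above.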