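KNN classification of the Pima Indians diabetes data in R: plotting training error against cross-validation error and computing prediction accuracy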
Posted: 2024-05-10 17:15:22
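Below is R code that runs KNN classification on the Pima Indians diabetes data, plots the training error and cross-validation error, and computes the prediction accuracy.
First, load the dataset: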
```R
data <- read.csv("diabetes.csv")
```
Next, split the data into a training set (70%) and a test set (30%):
```R
library(caTools)
set.seed(123)
split <- sample.split(data$Outcome, SplitRatio = 0.7)
train <- subset(data, split == TRUE)
test <- subset(data, split == FALSE)
```
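For readers without caTools, here is a minimal base-R sketch of a stratified 70/30 split, which is roughly what `sample.split` does when given a label vector. The `labels` vector is made-up toy data standing in for `data$Outcome`, and `train_idx` and `in_train` are names chosen only for this illustration.

```R
# Sketch of a stratified 70/30 split in base R.
# `labels` is made-up toy data standing in for data$Outcome.
set.seed(123)
labels <- c(rep(0, 60), rep(1, 40))

# Within each class, draw 70% of the row indices for training.
train_idx <- unlist(lapply(split(seq_along(labels), labels), function(idx) {
  idx[sample.int(length(idx), size = round(0.7 * length(idx)))]
}))
in_train <- seq_along(labels) %in% train_idx

# The class proportions are the same in both parts.
prop.table(table(labels[in_train]))
prop.table(table(labels[!in_train]))
```

Sampling within each class keeps the share of positive cases the same in the training and test parts, which matters when one class is much rarer than the other.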
Then apply min-max normalization to the predictors (column 9 is the `Outcome` label and is left out):
```R
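# Min-max scale each predictor using the training set's range only, so the
# test set is transformed with the same parameters and no test information
# leaks into preprocessing.
train_min <- sapply(train[, -9], min)
train_max <- sapply(train[, -9], max)
train_norm <- as.data.frame(scale(train[, -9], center = train_min, scale = train_max - train_min))
test_norm <- as.data.frame(scale(test[, -9], center = train_min, scale = train_max - train_min))
train_norm$Outcome <- train$Outcome
test_norm$Outcome <- test$Outcome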
```
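With min-max scaling it is worth checking which data the minimum and maximum come from. Here is a minimal worked sketch, using made-up numbers and a hypothetical helper `rescale`, of scaling a test column with the range estimated on the training column:

```R
# Toy values (made up) standing in for one predictor column.
train_x <- c(2, 4, 6, 10)
test_x  <- c(3, 12)

# Estimate the range on the training data only.
lo <- min(train_x)
hi <- max(train_x)

# Hypothetical helper: min-max scaling with an explicit range.
rescale <- function(x, lo, hi) (x - lo) / (hi - lo)

train_scaled <- rescale(train_x, lo, hi)  # 0.00 0.25 0.50 1.00
test_scaled  <- rescale(test_x, lo, hi)   # 0.125 1.250
```

Test values can fall outside [0, 1] when they lie beyond the training range. That is expected and harmless for KNN, since only the distances between points matter.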
Next, use the `knn` function from the class package to classify the test set:
```R
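library(class)
k <- 5
# knn() has no separate fitting step: it classifies each test row by majority
# vote among its k nearest training rows.
pred <- knn(train_norm[, -9], test_norm[, -9], cl = train_norm$Outcome, k = k)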
```
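To make the voting step concrete, the sketch below classifies a single query point by hand and compares the result with `class::knn`. The points, labels, and the names `train_x`, `train_y`, and `query` are made up for illustration, and it assumes Euclidean distance.

```R
library(class)

# Made-up 2-D points: three of class 0, three of class 1.
train_x <- matrix(c(0.10, 0.20,
                    0.20, 0.10,
                    0.15, 0.25,
                    0.80, 0.90,
                    0.90, 0.80,
                    0.85, 0.75), ncol = 2, byrow = TRUE)
train_y <- factor(c(0, 0, 0, 1, 1, 1))
query <- c(0.3, 0.3)

# Euclidean distance from the query to every training point.
d <- sqrt(rowSums(sweep(train_x, 2, query)^2))

# Majority vote among the k nearest neighbours.
k <- 3
nearest <- order(d)[1:k]
manual_pred <- names(which.max(table(train_y[nearest])))

# class::knn agrees on this toy example.
knn_pred <- knn(train_x, matrix(query, nrow = 1), train_y, k = k)
```

Because the vote depends on raw distances, predictors on larger scales would dominate the result, which is why the predictors are normalized first.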
Then compute the prediction accuracy on the test set:
```R
# Share of test rows whose predicted label matches the true label.
accuracy <- mean(pred == test_norm$Outcome)
print(paste("Accuracy:", accuracy))
```
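Accuracy alone can hide how the errors split between the two classes. The sketch below builds a confusion matrix with `table()` and derives sensitivity and specificity from it. The `actual` and `predicted` vectors are made up, and coding 1 as the diabetic class is an assumption of this example.

```R
# Made-up labels and predictions (assumed coding: 1 = diabetic, 0 = not).
actual    <- factor(c(1, 0, 0, 1, 1, 0, 0, 1, 0, 0), levels = c(0, 1))
predicted <- factor(c(1, 0, 0, 0, 1, 0, 1, 1, 0, 0), levels = c(0, 1))

# Rows are predictions, columns are true labels.
cm <- table(Predicted = predicted, Actual = actual)

accuracy    <- sum(diag(cm)) / sum(cm)        # 0.8
sensitivity <- cm["1", "1"] / sum(cm[, "1"])  # true positives / actual positives = 0.75
specificity <- cm["0", "0"] / sum(cm[, "0"])  # true negatives / actual negatives = 5/6
```

On real data the same pattern would be `table(Predicted = pred, Actual = test_norm$Outcome)`.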
Finally, use ggplot2 to plot the training error and the cross-validation error for k from 1 to 20:
```R
library(ggplot2)
library(gridExtra)
ks <- 1:20
train_error <- data.frame(k = ks, error = NA_real_)
cv_error <- data.frame(k = ks, error = NA_real_)
for (i in ks) {
  # Training error: classify the training rows against themselves.
  pred_train <- knn(train_norm[, -9], train_norm[, -9], train_norm$Outcome, k = i)
  # Cross-validation error: leave-one-out CV on the training set via knn.cv(),
  # so the test set is not used to choose k.
  pred_cv <- knn.cv(train_norm[, -9], train_norm$Outcome, k = i)
  train_error[i, "error"] <- mean(pred_train != train_norm$Outcome)
  cv_error[i, "error"] <- mean(pred_cv != train_norm$Outcome)
}
train_plot <- ggplot(train_error, aes(x = k, y = error)) + geom_line() + ggtitle("Training Error")
cv_plot <- ggplot(cv_error, aes(x = k, y = error)) + geom_line() + ggtitle("Cross-validation Error")
grid.arrange(train_plot, cv_plot, ncol = 2)
```
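As an alternative way to estimate the cross-validation error, here is a sketch of 5-fold cross-validation over k = 1 to 20 using only `class::knn` and base R. It runs on made-up two-class Gaussian data rather than the diabetes file, and `n_folds`, `fold`, and `best_k` are names chosen for this illustration.

```R
library(class)

set.seed(123)
# Made-up two-class data: 100 points per class, two predictors.
n <- 100
x <- rbind(matrix(rnorm(2 * n, mean = 0), ncol = 2),
           matrix(rnorm(2 * n, mean = 2), ncol = 2))
y <- factor(rep(c(0, 1), each = n))

# Assign every row to one of 5 folds at random.
n_folds <- 5
fold <- sample(rep(1:n_folds, length.out = nrow(x)))

ks <- 1:20
cv_error <- sapply(ks, function(k) {
  fold_err <- sapply(1:n_folds, function(f) {
    held_out <- fold == f
    # Classify the held-out fold using the remaining folds as neighbours.
    pred <- knn(x[!held_out, ], x[held_out, ], y[!held_out], k = k)
    mean(pred != y[held_out])
  })
  mean(fold_err)  # average misclassification rate across folds
})

# The k with the lowest estimated error.
best_k <- ks[which.min(cv_error)]
```

The training error at k = 1 is always zero because each point is its own nearest neighbour, so the value of k is better chosen from the cross-validation curve than from the training curve.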
That completes the KNN classification of the Pima Indians diabetes data, along with the plots of training error and cross-validation error.