Write code that applies KNN to the iris flower classification task (75% training data, 25% test data). a) Using different values of k (the number of nearest neighbors), namely 1, 3, 5, 7, ..., 27, 29, visualize the training and test accuracy for each k in a single plot.
In Python, we can implement KNN (K-Nearest Neighbors) with the scikit-learn library. The example below uses `train_test_split` to create a 75%/25% train/test split, fits a `KNeighborsClassifier` for each odd k from 1 to 29, and visualizes the training and test accuracy for each k:
```python
import matplotlib.pyplot as plt
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Load the iris dataset
iris = load_iris()
X = iris.data
y = iris.target

# Split into 75% training and 25% test data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=42)

# Odd values of k from 1 to 29
ks = list(range(1, 30, 2))
train_accuracies = []
test_accuracies = []

for k in ks:
    # Fit a KNN classifier on the training set for this k
    knn = KNeighborsClassifier(n_neighbors=k)
    knn.fit(X_train, y_train)
    # Record accuracy on the training and test sets
    train_accuracies.append(knn.score(X_train, y_train))
    test_accuracies.append(knn.score(X_test, y_test))

# Plot training and test accuracy against k
plt.figure(figsize=(10, 6))
plt.plot(ks, train_accuracies, label='Training Accuracy', marker='o')
plt.plot(ks, test_accuracies, label='Testing Accuracy', marker='o')
plt.xlabel('Number of Neighbors (k)')
plt.ylabel('Accuracy')
plt.title('KNN Accuracy with Different k on Iris Dataset')
plt.legend()
plt.grid(True)
plt.show()
```
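If you also want to select the best k rather than only plot the curves, one common extension (a sketch, not part of the original answer) is to run 5-fold cross-validation over the same k values on the training set only, using scikit-learn's `GridSearchCV`, and then report accuracy on the held-out test set once:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.neighbors import KNeighborsClassifier

iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.25, random_state=42)

# Search odd k from 1 to 29 with 5-fold CV on the training set only
param_grid = {'n_neighbors': list(range(1, 30, 2))}
grid = GridSearchCV(KNeighborsClassifier(), param_grid, cv=5, scoring='accuracy')
grid.fit(X_train, y_train)

print('Best k:', grid.best_params_['n_neighbors'])
print('Mean CV accuracy:', grid.best_score_)
print('Test accuracy:', grid.score(X_test, y_test))
```

Keeping the test set out of the cross-validation loop avoids the leakage that occurs when `cross_val_score` is run on the test data itself.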