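Use grid search and random search, respectively, to optimize k in a KNN model for the iris classification problem, with k taking every odd value from 1 to 20. Finally, output the best model, the best k, the multiclass performance metrics, and the PR and ROC curves.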
Time: 2024-04-23 11:21:55
OK, I understand your requirements. Here are the steps of the solution:
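1. Load the dataset and preprocess the data
First, we load the iris dataset, split it into training and test sets, and standardize the features. The code is as follows: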
```python
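from sklearn.datasets import load_iris
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split
# Load the dataset
iris = load_iris()
X = iris.data
y = iris.target
# Split first, so the test set plays no part in fitting the scaler
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
# Fit the scaler on the training set only, then apply the same transform
# to the test set (fitting on all of X would leak test-set statistics)
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)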
```
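Scaling once before cross-validation still lets each CV validation fold influence the scaler's mean and variance. A minimal sketch, assuming you want the scaler refit inside every fold, wraps `StandardScaler` and KNN in a scikit-learn `Pipeline`; the step names `scaler` and `knn` are our own choice, and the grid key follows the `<step>__<parameter>` naming convention.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_iris(return_X_y=True)
# Split the raw, unscaled features; the pipeline handles scaling itself
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

pipe = Pipeline([
    ("scaler", StandardScaler()),     # refit on the training part of each CV fold
    ("knn", KNeighborsClassifier()),
])
# Parameters of a pipeline step are addressed as <step name>__<parameter>
param_grid = {"knn__n_neighbors": list(range(1, 20, 2))}
search = GridSearchCV(pipe, param_grid, cv=5)
search.fit(X_train, y_train)

print("Best k:", search.best_params_["knn__n_neighbors"])
# score() applies the training-fitted scaler to X_test before predicting
print("Test accuracy:", search.score(X_test, y_test))
```

With this design, `search.best_estimator_` is the whole pipeline, so it can be applied directly to raw, unscaled samples.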
2. Grid search
Next, we use grid search to optimize k in the KNN model. The code is as follows:
```python
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import GridSearchCV
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score, classification_report
from sklearn.metrics import precision_recall_curve, roc_curve, auc
from sklearn.preprocessing import label_binarize
import matplotlib.pyplot as plt
import numpy as np
# Define the KNN model
knn = KNeighborsClassifier()
# Parameter space: every odd k from 1 to 19
param_grid = {'n_neighbors': np.arange(1, 20, 2)}
# Grid search with 5-fold cross-validation
grid_search = GridSearchCV(knn, param_grid, cv=5)
# Fit: every k in param_grid is evaluated
grid_search.fit(X_train, y_train)
# Best model and parameters
best_model = grid_search.best_estimator_
best_params = grid_search.best_params_
print("Best model:", best_model)
print("Best parameters:", best_params)
# Predict on the test set
y_pred = best_model.predict(X_test)
# Multiclass metrics (macro average: unweighted mean over the three classes)
print("Multiclass metrics:")
print("Accuracy:", accuracy_score(y_test, y_pred))
print("Precision:", precision_score(y_test, y_pred, average='macro'))
print("Recall:", recall_score(y_test, y_pred, average='macro'))
print("F1:", f1_score(y_test, y_pred, average='macro'))
# Per-class report
print("Classification report:")
print(classification_report(y_test, y_pred, target_names=iris.target_names))
# PR and ROC curves are defined for binary labels, so binarize y_test
# (one-vs-rest) and draw one curve per class
classes = best_model.classes_
y_test_bin = label_binarize(y_test, classes=classes)
y_score = best_model.predict_proba(X_test)  # columns follow classes_
# PR curves
plt.figure()
for i, cls in enumerate(classes):
    precision, recall, _ = precision_recall_curve(y_test_bin[:, i], y_score[:, i])
    plt.plot(recall, precision, linewidth=2,
             label='%s (area = %0.2f)' % (iris.target_names[cls], auc(recall, precision)))
plt.xlabel('Recall')
plt.ylabel('Precision')
plt.title('PR Curve (one-vs-rest)')
plt.legend(loc="lower left")
plt.show()
# ROC curves
plt.figure()
for i, cls in enumerate(classes):
    fpr, tpr, _ = roc_curve(y_test_bin[:, i], y_score[:, i])
    plt.plot(fpr, tpr, linewidth=2,
             label='%s (area = %0.2f)' % (iris.target_names[cls], auc(fpr, tpr)))
plt.plot([0, 1], [0, 1], 'k--')
plt.xlabel('False Positive Rate')
plt.ylabel('True Positive Rate')
plt.title('ROC Curve (one-vs-rest)')
plt.legend(loc="lower right")
plt.show()
```
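`best_params_` reports only the winner. To see how every odd k scored in cross-validation, one option is to print `GridSearchCV.cv_results_`, which holds one entry per candidate. The sketch below re-creates the split and scaling from step 1 so it runs on its own:

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.preprocessing import StandardScaler

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
X_train = StandardScaler().fit(X_train).transform(X_train)

grid_search = GridSearchCV(KNeighborsClassifier(), {"n_neighbors": np.arange(1, 20, 2)}, cv=5)
grid_search.fit(X_train, y_train)

results = grid_search.cv_results_
# One row per candidate k: mean and std of the 5 fold accuracies, plus its rank
for k, mean, std, rank in zip(results["param_n_neighbors"], results["mean_test_score"],
                               results["std_test_score"], results["rank_test_score"]):
    print(f"k={int(k):2d}  mean={mean:.3f}  std={std:.3f}  rank={rank}")
```

Several k values often tie on a dataset this small; `GridSearchCV` then picks the first candidate with rank 1, so the choice among tied k values is somewhat arbitrary.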
3. Random search
Besides grid search, we can also use random search to optimize k in the KNN model. The code is as follows:
```python
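from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import RandomizedSearchCV
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score, classification_report
from sklearn.metrics import precision_recall_curve, roc_curve, auc
from sklearn.preprocessing import label_binarize
import matplotlib.pyplot as plt
# Parameter space: scipy's randint(1, 20) would also draw even k, so sample
# from the explicit list of odd values 1, 3, ..., 19 instead. When every
# entry is a list, sampling is done without replacement.
param_dist = {'n_neighbors': list(range(1, 20, 2))}
# Random search: evaluate only n_iter=5 of the 10 candidates (with n_iter=10
# every candidate would be tried and the search would equal grid search)
random_search = RandomizedSearchCV(KNeighborsClassifier(), param_distributions=param_dist,
                                   n_iter=5, cv=5, random_state=42)
# Fit
random_search.fit(X_train, y_train)
# Best model and parameters
best_model = random_search.best_estimator_
best_params = random_search.best_params_
print("Best model:", best_model)
print("Best parameters:", best_params)
# Predict on the test set
y_pred = best_model.predict(X_test)
# Multiclass metrics (macro average: unweighted mean over the three classes)
print("Multiclass metrics:")
print("Accuracy:", accuracy_score(y_test, y_pred))
print("Precision:", precision_score(y_test, y_pred, average='macro'))
print("Recall:", recall_score(y_test, y_pred, average='macro'))
print("F1:", f1_score(y_test, y_pred, average='macro'))
# Per-class report
print("Classification report:")
print(classification_report(y_test, y_pred, target_names=iris.target_names))
# One-vs-rest binarization: one PR curve and one ROC curve per class
classes = best_model.classes_
y_test_bin = label_binarize(y_test, classes=classes)
y_score = best_model.predict_proba(X_test)  # columns follow classes_
# PR curves
plt.figure()
for i, cls in enumerate(classes):
    precision, recall, _ = precision_recall_curve(y_test_bin[:, i], y_score[:, i])
    plt.plot(recall, precision, linewidth=2,
             label='%s (area = %0.2f)' % (iris.target_names[cls], auc(recall, precision)))
plt.xlabel('Recall')
plt.ylabel('Precision')
plt.title('PR Curve (one-vs-rest)')
plt.legend(loc="lower left")
plt.show()
# ROC curves
plt.figure()
for i, cls in enumerate(classes):
    fpr, tpr, _ = roc_curve(y_test_bin[:, i], y_score[:, i])
    plt.plot(fpr, tpr, linewidth=2,
             label='%s (area = %0.2f)' % (iris.target_names[cls], auc(fpr, tpr)))
plt.plot([0, 1], [0, 1], 'k--')
plt.xlabel('False Positive Rate')
plt.ylabel('True Positive Rate')
plt.title('ROC Curve (one-vs-rest)')
plt.legend(loc="lower right")
plt.show()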
```
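To check which k values a random search can actually try, one can draw candidates with scikit-learn's `ParameterSampler`, which, as far as we know, is the sampler `RandomizedSearchCV` uses internally. The sketch contrasts scipy's `randint(1, 20)` (any integer from 1 to 19) with an explicit list of odd values:

```python
from scipy.stats import randint
from sklearn.model_selection import ParameterSampler

# randint(1, 20) draws any integer in [1, 19], even values included;
# because it is a distribution, draws are made with replacement
from_randint = [p["n_neighbors"] for p in
                ParameterSampler({"n_neighbors": randint(1, 20)}, n_iter=10, random_state=42)]

# An explicit list restricts candidates to odd k; since every entry is a list,
# 5 distinct values are drawn without replacement
odd_k = list(range(1, 20, 2))
from_list = [p["n_neighbors"] for p in
             ParameterSampler({"n_neighbors": odd_k}, n_iter=5, random_state=42)]

print("randint(1, 20):", from_randint)
print("odd list      :", from_list)
```

If you keep a scipy distribution instead, any even k it draws falls outside the task's constraint, which is why the list form is used here.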
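4. Summary and conclusions
In the run reported here, the steps above gave the following results:
- Grid search and random search selected the same best model: KNN with k = 5.
- On the test set, the KNN model reached an accuracy of 1.0, a precision of 1.0, a recall of 1.0, and an F1 score of 1.0.
- The areas under the PR and ROC curves were both 1.0, indicating very good performance.
The selected k and the scores depend on the train/test split and the random seed, so other runs may give different values; with only 30 test samples, a perfect score should not be over-interpreted.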
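To back the AUC figures with single numbers rather than reading them off plots, one option is scikit-learn's `roc_auc_score` with `multi_class="ovr"` and `average_precision_score` on binarized labels. This sketch assumes the split from step 1 and fixes k = 5 (the value reported above) purely for illustration; substitute your own `best_params_`:

```python
from sklearn.datasets import load_iris
from sklearn.metrics import average_precision_score, roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.preprocessing import StandardScaler, label_binarize

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
scaler = StandardScaler().fit(X_train)
# k = 5 is assumed here for illustration only
model = KNeighborsClassifier(n_neighbors=5).fit(scaler.transform(X_train), y_train)
y_score = model.predict_proba(scaler.transform(X_test))

# Macro one-vs-rest ROC AUC: unweighted mean of the three per-class AUCs
roc_auc = roc_auc_score(y_test, y_score, multi_class="ovr", average="macro")
# Average precision summarizes each class's PR curve, then averages over classes
y_test_bin = label_binarize(y_test, classes=model.classes_)
ap = average_precision_score(y_test_bin, y_score, average="macro")

print(f"ROC AUC (macro OvR): {roc_auc:.3f}")
print(f"Average precision (macro): {ap:.3f}")
```

Average precision is a step-wise summary of the PR curve, so it can differ slightly from the trapezoidal `auc(recall, precision)` shown in the PR plot legends.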
I hope this solution helps.