Implementing the K-nearest neighbors algorithm in Python with scikit-learn: evaluating the results, optimizing, and visualizing them
In Python, the K-nearest neighbors (KNN) algorithm is easy to implement with the scikit-learn library. The basic steps are as follows:
1. **Import the required libraries**:
```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score, confusion_matrix
```
2. **Load a dataset** (the iris dataset is used as an example here):
```python
iris = datasets.load_iris()
X = iris.data
y = iris.target
```
3. **Preprocess the data** and split it into training and test sets:
```python
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
```
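Because KNN is distance-based, features on larger scales can dominate the distance metric. As an optional extra preprocessing step, the features can be standardized; the sketch below assumes the same iris setup as above, and fits the scaler on the training split only to avoid leaking test-set statistics:

```python
from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

iris = datasets.load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.2, random_state=42)

# Fit the scaler on the training set only, then apply it to both splits
scaler = StandardScaler().fit(X_train)
X_train_scaled = scaler.transform(X_train)
X_test_scaled = scaler.transform(X_test)
```

If scaling is used, the scaled arrays should replace `X_train`/`X_test` in all later steps.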
4. **Create and train the KNN classifier**, e.g. with k=3 neighbors:
```python
knn = KNeighborsClassifier(n_neighbors=3)
knn.fit(X_train, y_train)
```
5. **Predict on the test set**:
```python
y_pred = knn.predict(X_test)
```
6. **Evaluate model performance**:
```python
accuracy = accuracy_score(y_test, y_pred)
conf_mat = confusion_matrix(y_test, y_pred)
print("Accuracy:", accuracy)
print("Confusion Matrix:\n", conf_mat)
```
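For a per-class view beyond overall accuracy, scikit-learn's `classification_report` summarizes precision, recall, and F1-score for each class. A self-contained sketch, reusing the same iris setup as above:

```python
from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import classification_report

iris = datasets.load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.2, random_state=42)

knn = KNeighborsClassifier(n_neighbors=3)
knn.fit(X_train, y_train)

# Per-class precision/recall/F1, labeled with the iris species names
report = classification_report(y_test, knn.predict(X_test),
                               target_names=iris.target_names)
print(report)
```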
7. **Optimize and tune the parameters** (e.g. try different values of k, or use grid search):
```python
from sklearn.model_selection import GridSearchCV
param_grid = {'n_neighbors': range(1, 31)}
grid_search = GridSearchCV(KNeighborsClassifier(), param_grid, cv=5)
grid_search.fit(X_train, y_train)
best_k = grid_search.best_params_['n_neighbors']
print("Best k:", best_k)
```
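The cross-validation scores that `GridSearchCV` stores in `cv_results_` can also be plotted to see how accuracy varies with k. A standalone sketch (it rebuilds the search so it runs on its own):

```python
import matplotlib.pyplot as plt
from sklearn import datasets
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.neighbors import KNeighborsClassifier

iris = datasets.load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.2, random_state=42)

ks = range(1, 31)
grid_search = GridSearchCV(KNeighborsClassifier(),
                           {'n_neighbors': ks}, cv=5)
grid_search.fit(X_train, y_train)

# Mean 5-fold accuracy for each candidate k
mean_scores = grid_search.cv_results_['mean_test_score']
plt.plot(list(ks), mean_scores, marker='o')
plt.xlabel('n_neighbors (k)')
plt.ylabel('Mean CV accuracy')
plt.title('Cross-validated accuracy vs. k')
plt.show()
```

A curve like this makes it easy to spot underfitting (very large k) and overfitting (k=1) at a glance.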
8. **Visualize the results**, e.g. the decision boundary. A 2D plot can only show two features, so the classifier is refit on the first two features of the data (the grid-searched model was trained on all four features and cannot predict on a two-feature mesh):
```python
# Refit on the first two features so the decision boundary can be drawn in 2D
knn_vis = KNeighborsClassifier(n_neighbors=best_k)
knn_vis.fit(X_train[:, :2], y_train)

h = .02  # step size in the mesh
x_min, x_max = X[:, 0].min() - 1, X[:, 0].max() + 1
y_min, y_max = X[:, 1].min() - 1, X[:, 1].max() + 1
xx, yy = np.meshgrid(np.arange(x_min, x_max, h), np.arange(y_min, y_max, h))

# Predict the class for every point in the mesh, then color the regions
Z = knn_vis.predict(np.c_[xx.ravel(), yy.ravel()])
Z = Z.reshape(xx.shape)
plt.figure()
plt.pcolormesh(xx, yy, Z, cmap=plt.cm.Paired, shading='auto')
plt.scatter(X[:, 0], X[:, 1], c=y, edgecolors='k', cmap=plt.cm.Paired)
plt.xlabel('Feature 1')
plt.ylabel('Feature 2')
plt.title(f'Optimized KNN with k={best_k}')
plt.show()
```