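Classifying the Iris Dataset with the KNN Algorithm
Date: 2024-11-12 19:32:40
The general steps for classifying the Iris dataset with the KNN (K-Nearest Neighbors) algorithm are as follows:
### Step 1: Import the required libraries (5 points)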
```python
import numpy as np
import pandas as pd
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import classification_report, confusion_matrix, accuracy_score
```
### Step 2: Load the data (5 points)
```python
iris = load_iris()  # the Iris dataset ships with the sklearn.datasets module
df = pd.DataFrame(data=np.c_[iris['data'], iris['target']], columns=iris['feature_names'] + ['target'])
```
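Before preprocessing, it can help to confirm what was actually loaded. The following is a minimal sketch that assumes only the `load_iris` call used above; the names `names`, `species`, and `class_counts` are illustrative and do not come from the original steps.

```python
import numpy as np
import pandas as pd
from sklearn.datasets import load_iris

iris = load_iris()
df = pd.DataFrame(
    data=np.c_[iris['data'], iris['target']],
    columns=iris['feature_names'] + ['target'],
)

# np.c_ stacks features and labels into one float array,
# so the 'target' column holds 0.0, 1.0, 2.0 rather than integers
print(df.shape)             # (150, 5)
print(df.dtypes['target'])  # float64

# Map numeric labels back to species names and count samples per class
names = {i: str(name) for i, name in enumerate(iris['target_names'])}
species = df['target'].astype(int).map(names)
class_counts = species.value_counts()
print(class_counts)
```

Each of the three species should appear 50 times, so the classes are balanced.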
### Step 3: Preprocess the data (15 points)
1. **Separate the features from the target variable**
```python
X = df.drop('target', axis=1)
y = df['target']
```
2. **Standardize the features** (`StandardScaler` rescales each feature to zero mean and unit variance)
```python
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)
```
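To make the scaling step concrete, the sketch below reproduces what `StandardScaler` computes with plain NumPy: each column has its mean subtracted and is divided by its population standard deviation. The variable `X_manual` is an illustrative name, not part of the original steps.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.preprocessing import StandardScaler

X = load_iris()['data']

X_scaled = StandardScaler().fit_transform(X)

# Same transformation by hand: z = (x - mean) / std, computed per column.
# StandardScaler uses the population standard deviation (ddof=0),
# which is also NumPy's default for ndarray.std().
X_manual = (X - X.mean(axis=0)) / X.std(axis=0)

print(np.allclose(X_scaled, X_manual))
print(X_scaled.mean(axis=0).round(6))  # approximately 0 for every feature
print(X_scaled.std(axis=0).round(6))   # approximately 1 for every feature
```

Scaling matters for KNN because the algorithm compares raw distances, so a feature with a larger numeric range would otherwise dominate the neighbor search.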
### Step 4: Split into training and test sets (10 points)
```python
X_train, X_test, y_train, y_test = train_test_split(X_scaled, y, test_size=0.2, random_state=42)
```
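Note that the steps above fit the scaler on all 150 samples before splitting, so statistics from the test rows influence the scaling. One common alternative, sketched here as an assumption rather than as the original method, is to split the raw features first and let a `Pipeline` fit the scaler on the training part only. The `stratify=y` argument is also an addition; it keeps the class proportions equal in both parts.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_iris(return_X_y=True)

# Split the raw features first; stratify=y keeps the class balance in both parts
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

# The pipeline fits the scaler on the training data only, then reuses
# those statistics when it transforms the test data
model = make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=3))
model.fit(X_train, y_train)

test_accuracy = model.score(X_test, y_test)
print(X_train.shape, X_test.shape)  # (120, 4) (30, 4)
print(test_accuracy)
```

On a dataset as small and clean as Iris the difference in accuracy is likely to be minor, but the split-then-scale order gives an evaluation that uses no information from the test set.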
### Step 5: Create and train the KNN classifier (10 points)
```python
knn = KNeighborsClassifier(n_neighbors=3)  # k=3 here; choose a k that suits the data
knn.fit(X_train, y_train)
```
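The value `n_neighbors=3` is one choice among many. A minimal sketch of picking k by cross-validation follows; the candidate grid `candidate_ks` and the 5-fold setting are assumptions made for illustration, not values from the original steps.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_iris(return_X_y=True)

candidate_ks = [1, 3, 5, 7, 9, 11]  # illustrative grid of odd k values
mean_scores = {}
for k in candidate_ks:
    model = make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=k))
    # 5-fold cross-validated accuracy; for classifiers, cv=5 uses stratified folds
    scores = cross_val_score(model, X, y, cv=5)
    mean_scores[k] = scores.mean()

# Pick the k with the highest mean accuracy (the first one wins on a tie)
best_k = max(mean_scores, key=mean_scores.get)
for k, score in mean_scores.items():
    print(f"k={k}: mean accuracy {score:.3f}")
print("best k:", best_k)
```

In a full workflow the search would normally run on the training split only, with the test split held back for the final evaluation.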
### Step 6: Make predictions (5 points)
```python
y_pred = knn.predict(X_test)
```
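With its default settings, `KNeighborsClassifier` measures Minkowski distance with `p=2` (Euclidean distance) and lets the k nearest training points vote with equal weight. The sketch below reimplements that idea by brute force to show what a prediction involves. `knn_predict` is a hypothetical helper written for this example, and the unscaled features are used only to keep the block short.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

def knn_predict(X_train, y_train, X_query, k=3):
    """Hypothetical helper: brute-force KNN with Euclidean distance and majority vote."""
    predictions = []
    for x in X_query:
        # Euclidean distance from the query point to every training point
        distances = np.sqrt(((X_train - x) ** 2).sum(axis=1))
        # Indices of the k closest training points
        nearest = np.argsort(distances, kind="stable")[:k]
        # Majority vote; bincount(...).argmax() picks the smallest label on a tie
        predictions.append(np.bincount(y_train[nearest]).argmax())
    return np.array(predictions)

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

manual_pred = knn_predict(X_train, y_train, X_test, k=3)
sklearn_pred = KNeighborsClassifier(n_neighbors=3).fit(X_train, y_train).predict(X_test)

# Fraction of test points where the sketch agrees with scikit-learn
agreement = (manual_pred == sklearn_pred).mean()
print(agreement)
```

The two sets of predictions should agree on nearly every point; any disagreement would come from how ties in distance are broken.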
### Step 7: Evaluate performance (10 points)
```python
accuracy = accuracy_score(y_test, y_pred)
conf_mat = confusion_matrix(y_test, y_pred)
report = classification_report(y_test, y_pred)
results = {
'Accuracy': accuracy,
'Confusion Matrix': conf_mat.tolist(),
'Classification Report': report
}
```
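The metrics above are all derived from the same counts, which the confusion matrix makes explicit. As a rough sketch, accuracy and per-class recall can be recomputed from the matrix and checked against scikit-learn. The block refits a small model on the unscaled features so that it runs on its own; `accuracy_from_cm` and `recall_from_cm` are illustrative names.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.metrics import accuracy_score, confusion_matrix, recall_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
y_pred = KNeighborsClassifier(n_neighbors=3).fit(X_train, y_train).predict(X_test)

# Rows are true classes, columns are predicted classes
conf_mat = confusion_matrix(y_test, y_pred)

# Correct predictions lie on the diagonal
accuracy_from_cm = np.trace(conf_mat) / conf_mat.sum()

# Per-class recall: diagonal divided by row sums (true samples per class)
recall_from_cm = np.diag(conf_mat) / conf_mat.sum(axis=1)

print(conf_mat)
print(np.isclose(accuracy_from_cm, accuracy_score(y_test, y_pred)))
print(np.allclose(recall_from_cm, recall_score(y_test, y_pred, average=None)))
```

Dividing the diagonal by the column sums instead of the row sums would give per-class precision.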
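### Step 8: Output the results (5 points)
```python
# One CSV row per confusion-matrix row; the scalar entries (accuracy, report text) repeat on every row
pd.DataFrame(results).to_csv('knn_iris_results.csv', index=False)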
```
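Because the `results` dictionary mixes a number, a nested list, and a block of text, the resulting CSV is awkward to read back. One possible alternative, offered as a sketch and not as the original approach, writes the report and the confusion matrix as two separate tables. The output file names and the variable `names` are illustrative, and the model is refitted on unscaled features so that the block is self-contained.

```python
import pandas as pd
from sklearn.datasets import load_iris
from sklearn.metrics import classification_report, confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris['data'], iris['target'], test_size=0.2, random_state=42
)
y_pred = KNeighborsClassifier(n_neighbors=3).fit(X_train, y_train).predict(X_test)

names = [str(n) for n in iris['target_names']]

# output_dict=True returns nested dicts instead of a formatted string
report_df = pd.DataFrame(
    classification_report(y_test, y_pred, target_names=names, output_dict=True)
).transpose()

# Label the rows (true class) and columns (predicted class) with species names
conf_df = pd.DataFrame(confusion_matrix(y_test, y_pred), index=names, columns=names)

# Illustrative file names; the index is kept because it carries the row labels
report_df.to_csv('knn_iris_report.csv')
conf_df.to_csv('knn_iris_confusion_matrix.csv')
```

Each file can then be loaded again with `pd.read_csv(path, index_col=0)` and the row labels are preserved.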
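We have now classified the Iris dataset with KNN and saved the results to a CSV file.

**Related questions:**
1. Which factors affect KNN performance?
2. How can the confusion matrix be visualized?
3. How is the "distance" in KNN calculated?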