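How can the KNN algorithm in Python be used to classify the Iris dataset, with a detailed walkthrough of the evaluation metrics, including accuracy, precision, and F1 score?
Posted: 2024-10-23 14:07:51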
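In Python, the `KNeighborsClassifier` class from scikit-learn can be used to classify the Iris dataset with the k-nearest neighbors (KNN) algorithm. The detailed steps are as follows:
1. **Import the required libraries**: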
```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score, precision_score, f1_score, confusion_matrix
```
2. **Load the Iris dataset**:
```python
iris = load_iris()
X = iris.data
y = iris.target
```
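Before splitting, it can help to confirm what was loaded. A minimal sketch, relying only on attributes of the `Bunch` object returned by `load_iris`:

```python
import numpy as np
from sklearn.datasets import load_iris

iris = load_iris()
X, y = iris.data, iris.target

print("Feature matrix shape:", X.shape)       # (n_samples, n_features)
print("Feature names:", iris.feature_names)
print("Class names:", list(iris.target_names))
print("Samples per class:", np.bincount(y))   # count of each integer label
```

The labels in `y` are the integers 0, 1, and 2; `iris.target_names` maps them to species names.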
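3. **Preprocess and split the data**:
Split the data into a training set and a test set. A common ratio is 70% of the data for training and the remaining 30% for testing.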
```python
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)
```
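The split above does no actual preprocessing. Because KNN is distance-based, features on larger scales can dominate the neighbor search, so standardizing is a common (though optional) step. The sketch below is one possible variant, not part of the original answer: the `stratify=y` argument and the `StandardScaler` pipeline are additions, and `scaled_knn` is an illustrative name.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_iris(return_X_y=True)

# stratify=y keeps the class proportions the same in both splits (an addition to the original call)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42, stratify=y
)

# The pipeline fits the scaler on the training data only, then reuses it on the test data
scaled_knn = make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=5))
scaled_knn.fit(X_train, y_train)

test_accuracy = scaled_knn.score(X_test, y_test)
print("Test accuracy with scaling:", test_accuracy)
```

Wrapping the scaler and classifier in one pipeline avoids leaking test-set statistics into the scaling step.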
4. **Create and train the KNN model**:
```python
knn = KNeighborsClassifier(n_neighbors=5)  # n_neighbors is tunable; an odd value is a common choice
knn.fit(X_train, y_train)
```
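Since `n_neighbors` is tunable, one way to choose it is cross-validation on the training set. The following is a rough sketch under assumptions: the candidate list `candidate_ks` and the 5-fold setting are illustrative choices, not values from the original answer.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

candidate_ks = [1, 3, 5, 7, 9, 11]  # illustrative candidates, not a recommendation
cv_means = {}
for k in candidate_ks:
    # 5-fold cross-validated accuracy, computed on the training data only
    scores = cross_val_score(KNeighborsClassifier(n_neighbors=k), X_train, y_train, cv=5)
    cv_means[k] = scores.mean()

# Pick the k with the highest mean accuracy (the first such k wins on ties)
best_k = max(cv_means, key=cv_means.get)
print("Best k:", best_k, "mean CV accuracy:", cv_means[best_k])
```

Keeping the test set out of this loop means the final metrics still reflect data the model selection never saw.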
5. **Predict on the test set**:
```python
y_pred = knn.predict(X_test)
```
6. **Compute the performance metrics**:
- **Accuracy**:
```python
accuracy = accuracy_score(y_test, y_pred)
print("Accuracy:", accuracy)
```
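Accuracy is simply the fraction of predictions that match the true labels. The short check below shows this on hypothetical toy labels (`y_true_demo` and `y_pred_demo` are made up for illustration and are not Iris results):

```python
import numpy as np
from sklearn.metrics import accuracy_score

# Hypothetical toy labels: five of the six predictions are correct
y_true_demo = np.array([0, 1, 2, 2, 1, 0])
y_pred_demo = np.array([0, 1, 2, 1, 1, 0])

# Element-wise comparison gives booleans; their mean is the share of correct predictions
manual_accuracy = np.mean(y_true_demo == y_pred_demo)
library_accuracy = accuracy_score(y_true_demo, y_pred_demo)

print("Manual accuracy: ", manual_accuracy)
print("Library accuracy:", library_accuracy)
```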
- **Precision**:
For each class, precision is the fraction of samples predicted as that class that actually belong to it.
```python
class_labels = list(iris.target_names)
# average=None returns one precision value per class, ordered by integer label (0, 1, 2)
precision_values = precision_score(y_test, y_pred, average=None)
print("Precision:")
for label, value in zip(class_labels, precision_values):
    print(f"Class {label}: {value:.2f}")
```
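To make the definition concrete, per-class precision can also be derived by hand from the confusion matrix: the diagonal holds the correct predictions and each column sum is the number of samples predicted as that class. A minimal sketch, assuming every class is predicted at least once (otherwise the division would hit zero):

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.metrics import confusion_matrix, precision_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)
y_pred = KNeighborsClassifier(n_neighbors=5).fit(X_train, y_train).predict(X_test)

cm = confusion_matrix(y_test, y_pred)   # rows: true class, columns: predicted class
predicted_counts = cm.sum(axis=0)       # how many samples were predicted as each class
manual_precision = np.diag(cm) / predicted_counts

print("Manual precision: ", manual_precision)
print("Library precision:", precision_score(y_test, y_pred, average=None))
```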
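- **F1 score**: The F1 score is the harmonic mean of precision and recall. With `average='weighted'`, the per-class F1 scores are averaged using each class's number of test samples as the weight, which yields a single summary figure for the classifier.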
```python
f1_scores = f1_score(y_test, y_pred, average='weighted')  # Iris has three classes, so an averaging mode must be specified
print("F1 Score:", f1_scores)
```
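The relationship between precision, recall, and the weighted F1 score can be verified by hand. The sketch below uses hypothetical toy labels (`y_true_demo`, `y_pred_demo`) chosen so that precision and recall differ for some classes; it is an illustration, not output from the Iris model.

```python
import numpy as np
from sklearn.metrics import f1_score, precision_score, recall_score

# Hypothetical toy labels for three classes
y_true_demo = np.array([0, 0, 0, 1, 1, 1, 2, 2, 2, 2])
y_pred_demo = np.array([0, 0, 1, 1, 1, 1, 2, 2, 2, 0])

precision = precision_score(y_true_demo, y_pred_demo, average=None)
recall = recall_score(y_true_demo, y_pred_demo, average=None)

# Per-class F1 is the harmonic mean of precision and recall
f1_per_class = 2 * precision * recall / (precision + recall)

# 'weighted' averaging weights each class by its number of true samples (its support)
support = np.bincount(y_true_demo)
manual_weighted_f1 = np.sum(f1_per_class * support) / support.sum()

print("Per-class F1:      ", f1_per_class)
print("Manual weighted F1:", manual_weighted_f1)
print("Library weighted F1:", f1_score(y_true_demo, y_pred_demo, average="weighted"))
```

Other averaging modes exist as well: `average='macro'` takes the unweighted mean over classes, and `average=None` returns the per-class values.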
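7. **Confusion matrix**:
It shows how the model's predictions for each class relate to the true labels. Each row corresponds to a true class and each column to a predicted class, so off-diagonal entries are misclassifications.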
```python
conf_mat = confusion_matrix(y_test, y_pred)
print("\nConfusion Matrix:\n", conf_mat)
```
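For a compact summary of all of these metrics at once, scikit-learn's `classification_report` may be convenient. The following self-contained sketch repeats the steps above and prints precision, recall, F1, and support per class; the variable names `report_text` and `report_dict` are illustrative.

```python
from sklearn.datasets import load_iris
from sklearn.metrics import accuracy_score, classification_report
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.3, random_state=42
)
y_pred = KNeighborsClassifier(n_neighbors=5).fit(X_train, y_train).predict(X_test)

# Plain strings for the class names so they show up as row labels in the report
names = [str(name) for name in iris.target_names]

report_text = classification_report(y_test, y_pred, target_names=names)
print(report_text)

# output_dict=True returns the same numbers as a nested dict for programmatic use
report_dict = classification_report(y_test, y_pred, target_names=names, output_dict=True)
print("Weighted F1:", report_dict["weighted avg"]["f1-score"])
```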