How can I use the algorithms in scikit-learn to classify the Iris dataset, with the steps shown in detail?
Posted: 2024-11-27 12:09:19
In Python, classifying the Iris dataset with a scikit-learn algorithm usually involves the following steps:
1. **Import the required libraries**:
```python
import numpy as np
from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score, classification_report
```
2. **Load the Iris dataset**:
```python
iris = datasets.load_iris()
X = iris.data    # feature matrix
y = iris.target  # class labels
```
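Before preprocessing, it can help to check what was actually loaded. The following is a minimal sketch that prints the array shape, the feature and class names, and the number of samples per class; it uses only attributes of the object returned by `load_iris()`.

```python
import numpy as np
from sklearn import datasets

iris = datasets.load_iris()
X, y = iris.data, iris.target

print(X.shape)             # (n_samples, n_features)
print(iris.feature_names)  # names of the four measurements
print(iris.target_names)   # names of the three species
print(np.bincount(y))      # number of samples in each class
```

The class counts show whether the dataset is balanced, which affects how much plain accuracy can be trusted as a metric later on.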
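3. **Preprocess the data**:
- Make sure the class labels are integers. `iris.target` already holds the integer codes 0, 1 and 2, so this cast is a no-op kept only for explicitness. Note that the `np.int` alias was removed in NumPy 1.24; use the built-in `int` instead.
```python
y = iris.target.astype(int)
```
- Split the data into training and test sets: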
```python
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
```
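- Optionally scale the features (KNN relies on a distance metric, so it is sensitive to feature scale). Fit the scaler on the training set only, then apply the same transformation to the test set: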
```python
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)
```
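As a possible variation on the preprocessing above, the split can be stratified and the scaler can be bundled with the classifier in a pipeline. This is a sketch rather than a required change: `stratify=y` and `make_pipeline` are not part of the original steps, and the variable names `pipe` and `test_accuracy` are chosen here for illustration.

```python
from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = datasets.load_iris(return_X_y=True)

# stratify=y keeps the class proportions the same in both subsets
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

# The pipeline fits the scaler on the training data only and
# reuses the fitted scaler whenever it predicts or scores.
pipe = make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=3))
pipe.fit(X_train, y_train)

test_accuracy = pipe.score(X_test, y_test)
print(f"Test accuracy: {test_accuracy:.3f}")
```

Wrapping both steps in one object makes it harder to accidentally fit the scaler on test data, and the pipeline can be passed directly to cross-validation utilities.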
4. **Choose a model**:
Here we use k-nearest neighbors (`KNeighborsClassifier`) as an example:
```python
knn_model = KNeighborsClassifier(n_neighbors=3)
```
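The value `n_neighbors=3` is one reasonable choice, not the only one. One way to pick it from the data is a cross-validated grid search on the training set, sketched below. The candidate values in `param_grid` and the 5-fold setting are assumptions made for this illustration.

```python
from sklearn import datasets
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

X, y = datasets.load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Scaling lives inside the pipeline so each CV fold is scaled
# using statistics from its own training portion only.
pipe = Pipeline([
    ("scaler", StandardScaler()),
    ("knn", KNeighborsClassifier()),
])

# Candidate k values (an arbitrary illustrative grid)
param_grid = {"knn__n_neighbors": [1, 3, 5, 7, 9, 11]}

search = GridSearchCV(pipe, param_grid, cv=5, scoring="accuracy")
search.fit(X_train, y_train)

print(search.best_params_)
print(f"Mean CV accuracy: {search.best_score_:.3f}")
```

The test set is not touched during the search, so it can still be used afterwards for an unbiased final evaluation of `search.best_estimator_`.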
5. **Fit the model**:
```python
knn_model.fit(X_train, y_train)
```
6. **Predict**:
```python
y_pred = knn_model.predict(X_test)
```
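The same model can also classify a single new measurement. The sketch below repeats the earlier steps so it runs on its own; the values in `new_sample` are hypothetical, and the key point is that a new sample must go through the same fitted scaler as the training data before prediction.

```python
from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.preprocessing import StandardScaler

iris = datasets.load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.2, random_state=42
)

scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)

knn_model = KNeighborsClassifier(n_neighbors=3)
knn_model.fit(X_train, y_train)

# Hypothetical flower: sepal length, sepal width, petal length, petal width (cm)
new_sample = [[5.1, 3.5, 1.4, 0.2]]
new_scaled = scaler.transform(new_sample)  # reuse the fitted scaler

pred = knn_model.predict(new_scaled)
proba = knn_model.predict_proba(new_scaled)  # share of neighbors per class

print(iris.target_names[pred[0]])
print(proba)
```

For KNN with uniform weights, `predict_proba` returns the fraction of the k nearest neighbors that belong to each class.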
7. **Evaluate model performance**:
```python
accuracy = accuracy_score(y_test, y_pred)
print(f"Accuracy: {accuracy * 100:.2f}%")
print(classification_report(y_test, y_pred))
```
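Accuracy and the classification report summarize performance per class, but they do not show which classes get confused with each other. A confusion matrix fills that gap. The following self-contained sketch repeats the steps above and adds `confusion_matrix`, which is not used in the original answer.

```python
from sklearn import datasets
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.preprocessing import StandardScaler

iris = datasets.load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.2, random_state=42
)

scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)

knn_model = KNeighborsClassifier(n_neighbors=3)
knn_model.fit(X_train, y_train)
y_pred = knn_model.predict(X_test)

# Rows are true classes, columns are predicted classes.
# labels=[0, 1, 2] fixes the row/column order explicitly.
cm = confusion_matrix(y_test, y_pred, labels=[0, 1, 2])
print(iris.target_names)
print(cm)
```

Off-diagonal entries count misclassified test samples, so a matrix with nonzero values only on the diagonal means every test sample was classified correctly.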