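Use scikit-learn for KNN classification on the iris dataset, with the training set making up 70% of the data. Predict the classes of the test samples and report the accuracy, then vary K and random_state and compare how the accuracy changes.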
Time: 2024-12-15 11:22:31
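In Python, we can use the KNeighborsClassifier class from scikit-learn to perform K-nearest neighbors (KNN) classification. First, import the necessary libraries and load the iris dataset, a classic dataset for multi-class classification.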
```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score

# Load the iris dataset
iris = load_iris()
X = iris.data
y = iris.target

# Split into training and test sets; the training set is 70% of the data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# Create a KNN classifier; the default K (n_neighbors) is 5
knn = KNeighborsClassifier()

# Train the model
knn.fit(X_train, y_train)

# Predict the classes of the test samples
y_pred = knn.predict(X_test)
print(f"Predicted classes: {y_pred}")

# Compute the baseline accuracy
accuracy = accuracy_score(y_test, y_pred)
print(f"Initial K={knn.n_neighbors}, accuracy: {accuracy:.2%}")

# Vary K: build and fit a new classifier for each value
k_values = [3, 5, 7, 9]  # other values of K can be tried as well
accuracies = []
for k in k_values:
    knn_k = KNeighborsClassifier(n_neighbors=k)
    knn_k.fit(X_train, y_train)
    accuracies.append(accuracy_score(y_test, knn_k.predict(X_test)))

# Print the accuracy for each K
print("Effect of K on accuracy:")
for k, acc in zip(k_values, accuracies):
    print(f"K={k}, accuracy: {acc:.2%}")

# Vary random_state. It is a parameter of train_test_split, not of
# KNeighborsClassifier, so each value requires a new split and a new fit.
random_states = [42, 1, 43]
accuracies_random_state = {}
for state in random_states:
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=state)
    knn_rs = KNeighborsClassifier()
    knn_rs.fit(X_tr, y_tr)
    accuracies_random_state[state] = accuracy_score(y_te, knn_rs.predict(X_te))

print("Effect of random_state on accuracy:")
for state, acc in accuracies_random_state.items():
    print(f"random_state={state}, accuracy: {acc:.2%}")
```
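In this example, we first train the model and compute the baseline accuracy. We then vary K and observe how the model's performance changes, and vary random_state to assess how the random seed affects the result. Note that random_state is a parameter of train_test_split and controls how the data is shuffled before splitting; KNeighborsClassifier itself does not take a random_state. After each change, the accuracy is recomputed and printed.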
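To compare both settings side by side, the same experiment can be condensed into one grid over K and random_state. The sketch below is only one possible arrangement: the helper name `knn_accuracy_grid` is hypothetical (it is not part of scikit-learn), and it uses the classifier's `score` method, which returns mean accuracy, in place of `accuracy_score`.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier


def knn_accuracy_grid(k_values, random_states, test_size=0.3):
    """Return {(k, random_state): test accuracy} for KNN on iris.

    Hypothetical helper for illustration; not a scikit-learn function.
    """
    X, y = load_iris(return_X_y=True)
    results = {}
    for state in random_states:
        # random_state belongs to the split, so each seed yields a new split.
        X_train, X_test, y_train, y_test = train_test_split(
            X, y, test_size=test_size, random_state=state
        )
        for k in k_values:
            # Fit a fresh classifier for every K on this split.
            model = KNeighborsClassifier(n_neighbors=k).fit(X_train, y_train)
            results[(k, state)] = model.score(X_test, y_test)
    return results


grid = knn_accuracy_grid(k_values=[3, 5, 7, 9], random_states=[42, 1, 43])
for (k, state), acc in sorted(grid.items()):
    print(f"K={k}, random_state={state}, accuracy: {acc:.2%}")
```

Because iris has only 150 samples, the 30% test set holds 45 samples, so a single misclassified sample shifts the accuracy by roughly 2 percentage points. Differences between seeds of that size should be read with that in mind.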