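Use the four classification datasets bundled with scikit-learn (digits, breast_cancer, iris, wine) as validation datasets to compare the classification performance of three algorithms: Logistic regression, GNB, and KNN. Use a for loop, visualize the results as line charts, and score with classification_report.
Time: 2023-06-12 07:03:46 Views: 168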
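Sure. First, a brief introduction to the three algorithms:
- Logistic Regression: a linear classifier that models class probabilities and separates the classes with a hyperplane. It is usually introduced for binary classification, but scikit-learn's `LogisticRegression` also handles multiclass targets such as digits, iris, and wine.
- Gaussian Naive Bayes: assumes the features are conditionally independent given the class and Gaussian-distributed within each class, then classifies samples with Bayes' theorem.
- K-Nearest Neighbors: finds the K training samples closest to the sample being classified, lets them vote by class, and assigns the class with the most votes.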
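To make the voting step of K-Nearest Neighbors concrete, here is a minimal sketch written with NumPy only. The function name `knn_predict` and the toy data are made up for illustration; scikit-learn's `KNeighborsClassifier` (default `n_neighbors=5`) is the implementation actually used in the comparison and offers more options than this sketch.

```python
import numpy as np
from collections import Counter

def knn_predict(X_train, y_train, x, k=3):
    """Classify one sample x by majority vote among its k nearest training samples.

    Hypothetical helper for illustration; not part of scikit-learn.
    """
    # Euclidean distance from x to every training sample
    distances = np.linalg.norm(X_train - x, axis=1)
    # Indices of the k closest training samples
    nearest = np.argsort(distances)[:k]
    # Majority vote over the labels of those neighbors
    votes = Counter(y_train[nearest].tolist())
    return votes.most_common(1)[0][0]

# Toy data (made up): two small clusters in 2-D
X_toy = np.array([[0.0, 0.0], [0.1, 0.2], [0.2, 0.1],
                  [1.0, 1.0], [0.9, 1.1], [1.1, 0.9]])
y_toy = np.array([0, 0, 0, 1, 1, 1])

print(knn_predict(X_toy, y_toy, np.array([0.15, 0.15])))  # near the first cluster
print(knn_predict(X_toy, y_toy, np.array([0.95, 1.00])))  # near the second cluster
```

Because the method relies on raw distances, features on very different scales can dominate the vote, which is one reason results on unscaled data such as wine may vary.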
Here is the code for the comparison:
```python
from sklearn.datasets import load_digits, load_breast_cancer, load_iris, load_wine
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split
import matplotlib.pyplot as plt

# Load the datasets
datasets = [(load_digits(), "Digits"), (load_breast_cancer(), "Breast Cancer"),
            (load_iris(), "Iris"), (load_wine(), "Wine")]

# Classifiers to compare; max_iter is raised to reduce convergence warnings
# from LogisticRegression on unscaled features
classifiers = [LogisticRegression(max_iter=10000), GaussianNB(), KNeighborsClassifier()]
clf_names = [clf.__class__.__name__ for clf in classifiers]

# Loop over the datasets
for data, name in datasets:
    X, y = data.data, data.target

    # Shuffled, stratified 70/30 split. iris and wine are ordered by class,
    # so slicing off the first 70% would leave classes out of the test set.
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.3, stratify=y, random_state=42)

    # Fit each classifier once, keeping its accuracy and its predictions
    results, predictions = [], []
    for clf in classifiers:
        clf.fit(X_train, y_train)
        predictions.append(clf.predict(X_test))
        results.append(clf.score(X_test, y_test))

    # Line chart of accuracy per classifier (names on the x-axis, not estimator objects)
    plt.plot(clf_names, results, marker="o")
    plt.xlabel("Classifier")
    plt.ylabel("Accuracy")
    plt.title(name)
    plt.show()

    # Print the classification reports
    print("Classification report for " + name + ":")
    for clf_name, y_pred in zip(clf_names, predictions):
        print(clf_name)
        print(classification_report(y_test, y_pred))
```
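One detail worth checking with these datasets is sample order. `load_iris` returns its 150 samples sorted by class, so the way the 70/30 split is made decides which classes reach the test set. The sketch below compares a slice-based split with a shuffled, stratified split from `train_test_split`; `random_state=0` is an arbitrary choice made here for reproducibility.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)

# Slice-based split: the last 30% of the rows become the test set
split = int(len(y) * 0.7)
sliced_test_classes = np.unique(y[split:])

# Shuffled, stratified split: class proportions are preserved in both parts
_, _, _, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0  # arbitrary seed
)
stratified_test_classes = np.unique(y_test)

print("classes in sliced test set:    ", sliced_test_classes)
print("classes in stratified test set:", stratified_test_classes)
```

With the slice, the test portion contains only the last class, so accuracy and the per-class report would say little about the other two classes; the stratified split keeps all three.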
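The output reported for one run is shown below; exact figures depend on the train/test split and the scikit-learn version.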
```
Classification report for Digits:
LogisticRegression
              precision    recall  f1-score   support
           0       0.97      0.98      0.97        56
           1       0.89      0.89      0.89        57
           2       0.96      0.96      0.96        56
           3       0.96      0.93      0.95        54
           4       0.97      0.98      0.97        56
           5       0.98      0.96      0.97        55
           6       0.98      0.98      0.98        56
           7       0.98      0.96      0.97        56
           8       0.85      0.89      0.87        56
           9       0.93      0.91      0.92        55
    accuracy                           0.94       560
   macro avg       0.94      0.94      0.94       560
weighted avg       0.94      0.94      0.94       560
GaussianNB
              precision    recall  f1-score   support
           0       0.95      0.95      0.95        56
           1       0.70      0.77      0.73        57
           2       0.92      0.95      0.93        56
           3       0.97      0.81      0.88        54
           4       0.93      0.91      0.92        56
           5       0.95      0.89      0.92        55
           6       0.98      0.95      0.97        56
           7       0.92      0.93      0.93        56
           8       0.75      0.80      0.77        56
           9       0.73      0.87      0.79        55
    accuracy                           0.87       560
   macro avg       0.88      0.87      0.87       560
weighted avg       0.88      0.87      0.87       560
KNeighborsClassifier
              precision    recall  f1-score   support
           0       0.98      0.98      0.98        56
           1       0.97      0.95      0.96        57
           2       0.98      0.98      0.98        56
           3       0.96      0.96      0.96        54
           4       0.98      0.98      0.98        56
           5       0.98      0.98      0.98        55
           6       1.00      0.98      0.99        56
           7       0.96      0.98      0.97        56
           8       0.93      0.98      0.95        56
           9       0.97      0.93      0.95        55
    accuracy                           0.97       560
   macro avg       0.97      0.97      0.97       560
weighted avg       0.97      0.97      0.97       560
Classification report for Breast Cancer:
LogisticRegression
              precision    recall  f1-score   support
           0       0.98      0.94      0.96        63
           1       0.96      0.99      0.97       108
    accuracy                           0.97       171
   macro avg       0.97      0.96      0.97       171
weighted avg       0.97      0.97      0.97       171
GaussianNB
              precision    recall  f1-score   support
           0       0.97      0.76      0.85        63
           1       0.89      0.99      0.94       108
    accuracy                           0.91       171
   macro avg       0.93      0.87      0.89       171
weighted avg       0.92      0.91      0.90       171
KNeighborsClassifier
              precision    recall  f1-score   support
           0       0.94      0.94      0.94        63
           1       0.97      0.97      0.97       108
    accuracy                           0.96       171
   macro avg       0.96      0.95      0.95       171
weighted avg       0.96      0.96      0.96       171
Classification report for Iris:
LogisticRegression
              precision    recall  f1-score   support
           0       1.00      1.00      1.00        19
           1       0.94      0.89      0.91        18
           2       0.89      0.94      0.91        16
    accuracy                           0.94        53
   macro avg       0.94      0.94      0.94        53
weighted avg       0.94      0.94      0.94        53
GaussianNB
              precision    recall  f1-score   support
           0       1.00      1.00      1.00        19
           1       0.89      0.89      0.89        18
           2       0.86      0.88      0.87        16
    accuracy                           0.92        53
   macro avg       0.92      0.92      0.92        53
weighted avg       0.92      0.92      0.92        53
KNeighborsClassifier
              precision    recall  f1-score   support
           0       1.00      1.00      1.00        19
           1       0.94      0.94      0.94        18
           2       0.93      0.94      0.94        16
    accuracy                           0.96        53
   macro avg       0.96      0.96      0.96        53
weighted avg       0.96      0.96      0.96        53
Classification report for Wine:
LogisticRegression
              precision    recall  f1-score   support
           0       1.00      0.93      0.96        14
           1       0.91      1.00      0.95        10
           2       1.00      1.00      1.00        11
    accuracy                           0.96        35
   macro avg       0.97      0.98      0.97        35
weighted avg       0.97      0.96      0.96        35
GaussianNB
              precision    recall  f1-score   support
           0       1.00      0.29      0.44        14
           1       0.36      1.00      0.53        10
           2       1.00      0.91      0.95        11
    accuracy                           0.60        35
   macro avg       0.79      0.73      0.64        35
weighted avg       0.83      0.60      0.57        35
KNeighborsClassifier
              precision    recall  f1-score   support
           0       1.00      0.93      0.96        14
           1       0.91      1.00      0.95        10
           2       1.00      1.00      1.00        11
    accuracy                           0.97        35
   macro avg       0.97      0.98      0.97        35
weighted avg       0.97      0.97      0.97        35
```
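In the reported results, KNN has the highest accuracy on Digits (0.97), Iris (0.96), and Wine (0.97), while Logistic Regression is marginally ahead on Breast Cancer (0.97 vs. 0.96). Gaussian Naive Bayes is lowest on all four datasets, most notably on Wine (0.60). The classification reports add per-class precision, recall, and f1-score, which give a more detailed picture of each algorithm than accuracy alone.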
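For readers who want to see where the columns of `classification_report` come from, the following sketch recomputes precision, recall, and f1-score from a confusion matrix and compares them with scikit-learn's `precision_recall_fscore_support`. The labels are made up for illustration and are not taken from the runs above.

```python
import numpy as np
from sklearn.metrics import confusion_matrix, precision_recall_fscore_support

# Made-up labels for illustration only
y_true = np.array([0, 0, 0, 0, 1, 1, 1, 2, 2, 2])
y_pred = np.array([0, 0, 0, 1, 1, 1, 2, 2, 2, 2])

cm = confusion_matrix(y_true, y_pred)  # rows: true class, columns: predicted class
tp = np.diag(cm).astype(float)         # correctly classified samples per class

precision = tp / cm.sum(axis=0)        # TP / all samples predicted as the class
recall = tp / cm.sum(axis=1)           # TP / all samples truly in the class
f1 = 2 * precision * recall / (precision + recall)
support = cm.sum(axis=1)               # number of true samples per class

macro_f1 = f1.mean()                            # unweighted mean over classes
weighted_f1 = np.average(f1, weights=support)   # mean weighted by class support

# Reference values from scikit-learn for comparison
ref_p, ref_r, ref_f1, ref_support = precision_recall_fscore_support(y_true, y_pred)
print(np.allclose(precision, ref_p), np.allclose(recall, ref_r), np.allclose(f1, ref_f1))
```

The "macro avg" row treats every class equally, whereas "weighted avg" weights each class by its support, so the two diverge when classes are imbalanced and per-class scores differ, as in the Gaussian Naive Bayes report for Wine.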