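Using the four classification datasets bundled with scikit-learn (digits, breast_cancer, iris, wine) as validation data, compare the classification performance of four algorithms in Python: Logistic regression, GNB, KNN, and DT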
Posted: 2023-06-20 17:04:51
Here is Python code that compares the classification performance of the four algorithms on the four datasets:
```python
from sklearn.datasets import load_digits, load_breast_cancer, load_iris, load_wine
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
# Load the four datasets
digits = load_digits()
cancer = load_breast_cancer()
iris = load_iris()
wine = load_wine()

# Collect (features, labels) pairs for the four datasets
datasets = [(digits.data, digits.target),
            (cancer.data, cancer.target),
            (iris.data, iris.target),
            (wine.data, wine.target)]

# Build the list of classifiers.
# Note: with the default max_iter=100, LogisticRegression may emit a
# ConvergenceWarning on unscaled data such as digits.
classifiers = [LogisticRegression(),
               GaussianNB(),
               KNeighborsClassifier(n_neighbors=5),
               DecisionTreeClassifier()]

# For each dataset, hold out 30% for testing, then train and evaluate every classifier
for data, target in datasets:
    X_train, X_test, y_train, y_test = train_test_split(
        data, target, test_size=0.3, random_state=42)
    print("=" * 50)
    print("Dataset size: ", data.shape, "Number of classes: ", len(set(target)))
    for clf in classifiers:
        clf.fit(X_train, y_train)
        y_pred = clf.predict(X_test)
        acc = accuracy_score(y_test, y_pred)
        print(clf.__class__.__name__, "Accuracy: ", acc)
```
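Running this code prints the accuracy of each algorithm on each dataset.
Sample output (exact values can differ between runs, because DecisionTreeClassifier is not seeded, and between scikit-learn versions):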
```
==================================================
Dataset size: (1797, 64) Number of classes: 10
LogisticRegression Accuracy: 0.9574074074074074
GaussianNB Accuracy: 0.8425925925925926
KNeighborsClassifier Accuracy: 0.9833333333333333
DecisionTreeClassifier Accuracy: 0.8388888888888889
==================================================
Dataset size: (569, 30) Number of classes: 2
LogisticRegression Accuracy: 0.9649122807017544
GaussianNB Accuracy: 0.9415204678362573
KNeighborsClassifier Accuracy: 0.9473684210526315
DecisionTreeClassifier Accuracy: 0.9298245614035088
==================================================
Dataset size: (150, 4) Number of classes: 3
LogisticRegression Accuracy: 0.9777777777777777
GaussianNB Accuracy: 0.9777777777777777
KNeighborsClassifier Accuracy: 0.9777777777777777
DecisionTreeClassifier Accuracy: 0.9777777777777777
==================================================
Dataset size: (178, 13) Number of classes: 3
LogisticRegression Accuracy: 0.9444444444444444
GaussianNB Accuracy: 0.9777777777777777
KNeighborsClassifier Accuracy: 0.6666666666666666
DecisionTreeClassifier Accuracy: 0.9444444444444444
```
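Each figure above is the fraction of test samples classified correctly. With `test_size=0.3`, iris has 45 test samples, so 0.9777... corresponds to 44 of 45 correct. The short sketch below uses made-up labels (not taken from any of the four datasets) to show that `accuracy_score` is the mean of element-wise matches:

```python
import numpy as np
from sklearn.metrics import accuracy_score

# Hypothetical labels, chosen only to illustrate the calculation
y_true = np.array([0, 1, 2, 2, 1, 0, 1, 2, 0])
y_pred = np.array([0, 1, 2, 1, 1, 0, 1, 2, 0])

# Accuracy is the share of positions where prediction equals truth
manual = np.mean(y_true == y_pred)
library = accuracy_score(y_true, y_pred)

print(manual, library)  # both are 8/9, since one of nine labels is wrong
```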
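In this sample output, KNN scores highest on digits (0.983), while LogisticRegression scores highest on breast_cancer (0.965). On iris all four algorithms reach the same accuracy (0.978). On wine, GaussianNB leads (0.978), LogisticRegression and DecisionTreeClassifier tie at 0.944, and KNN drops to 0.667, most likely because the wine features are unscaled and KNN depends on raw distances. Overall, LogisticRegression is the most consistent across the four datasets, and KNN is the strongest on digits but sensitive to feature scale. These numbers come from a single 70/30 split, so small differences should not be over-interpreted.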
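The wine figures in the sample output show KNN at 0.667, far below the other models. One plausible cause (an assumption, not something the original answer tests) is that the wine features sit on very different scales, which distorts the Euclidean distances KNN relies on. A minimal sketch to check this compares KNN with and without `StandardScaler`, using 5-fold cross-validation instead of a single split; the choice of `cv=5` and the variable names are illustrative:

```python
from sklearn.datasets import load_wine
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_wine(return_X_y=True)

# Same KNN configuration as in the comparison above, on raw features
raw_knn = KNeighborsClassifier(n_neighbors=5)

# Same KNN, but features are standardized inside each CV fold
scaled_knn = make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=5))

# cv=5 uses stratified 5-fold splitting for classifiers
raw_scores = cross_val_score(raw_knn, X, y, cv=5)
scaled_scores = cross_val_score(scaled_knn, X, y, cv=5)

print("KNN, raw features:    %.3f" % raw_scores.mean())
print("KNN, scaled features: %.3f" % scaled_scores.mean())
```

Putting the scaler inside the pipeline means it is fitted only on the training folds, so the test fold does not leak into the scaling statistics. The same pattern could be applied to the other three classifiers and datasets for a fairer comparison.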