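The Iris dataset, also known as the iris flower dataset (use the copy bundled with sklearn), has 4 feature columns and one species label column: sepal length, sepal width, petal length, and petal width, all measured in centimeters. There are 3 species classes (Setosa, Versicolour, Virginica) and 150 samples, 50 per class. Train classifiers on the iris data with naive Bayes, a decision tree, a random forest, KNN, and a support vector machine, compare the evaluation performance of these algorithms, and finally identify the species of a new iris flower. (Requirement: 70% training set, 30% test set.)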
Time: 2024-02-01 13:14:02
Sure, here is how to do this in Python.
First, import the required packages and load the dataset:
```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score
# Load the dataset
iris = load_iris()
X = iris.data
y = iris.target
# Split the data into a training set (70%) and a test set (30%)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)
```
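As a quick sanity check on the 70/30 requirement, the sketch below repeats the same split and prints the resulting shapes. The second split, using `stratify=y`, is an optional variant and not part of the original answer; it is shown only as one way to keep the three classes equally represented in the test set.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

iris = load_iris()
X, y = iris.data, iris.target

# Same split as above: 30% of the 150 samples go to the test set.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)
print(X_train.shape, X_test.shape)  # (105, 4) (45, 4)

# Optional variant (an assumption, not in the original answer):
# stratify=y keeps the class proportions the same in both subsets.
Xs_train, Xs_test, ys_train, ys_test = train_test_split(
    X, y, test_size=0.3, random_state=42, stratify=y
)
print(np.bincount(ys_test))  # [15 15 15]
```

Without `stratify`, the per-class counts in the test set depend on the random shuffle and are usually not exactly equal.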
Next, define each classifier, train it on the training set, and evaluate it on the test set:
```python
# Naive Bayes: train, predict on the test set, and compute accuracy
gnb = GaussianNB()
gnb.fit(X_train, y_train)
y_pred_gnb = gnb.predict(X_test)
accuracy_gnb = accuracy_score(y_test, y_pred_gnb)
# Decision tree: train, predict on the test set, and compute accuracy
dtc = DecisionTreeClassifier()
dtc.fit(X_train, y_train)
y_pred_dtc = dtc.predict(X_test)
accuracy_dtc = accuracy_score(y_test, y_pred_dtc)
# Random forest: train, predict on the test set, and compute accuracy
rfc = RandomForestClassifier()
rfc.fit(X_train, y_train)
y_pred_rfc = rfc.predict(X_test)
accuracy_rfc = accuracy_score(y_test, y_pred_rfc)
# KNN: train, predict on the test set, and compute accuracy
knn = KNeighborsClassifier()
knn.fit(X_train, y_train)
y_pred_knn = knn.predict(X_test)
accuracy_knn = accuracy_score(y_test, y_pred_knn)
# Support vector machine: train, predict on the test set, and compute accuracy
svm = SVC()
svm.fit(X_train, y_train)
y_pred_svm = svm.predict(X_test)
accuracy_svm = accuracy_score(y_test, y_pred_svm)
```
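A single 45-sample test set gives a fairly coarse comparison. As a supplementary check, the sketch below scores the same five classifiers with 5-fold cross-validation on the full dataset. This is an addition to the 70/30 evaluation the question asks for, not a replacement. The `models` dictionary, the `cv_means` name, and the `random_state=42` seeds are illustrative choices.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)

# Hypothetical grouping of the five classifiers so they can share one loop.
# Seeding the tree-based models makes their scores repeatable.
models = {
    "Naive Bayes": GaussianNB(),
    "Decision Tree": DecisionTreeClassifier(random_state=42),
    "Random Forest": RandomForestClassifier(random_state=42),
    "KNN": KNeighborsClassifier(),
    "SVM": SVC(),
}

cv_means = {}
for name, model in models.items():
    # cv=5 uses stratified 5-fold splitting for classifiers.
    scores = cross_val_score(model, X, y, cv=5)
    cv_means[name] = scores.mean()
    print(f"{name}: mean accuracy {scores.mean():.3f} (std {scores.std():.3f})")
```

Averaging over five folds reduces the influence of any one lucky or unlucky split on the ranking.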
Finally, compare the accuracy of the algorithms and predict the class of a new iris sample:
```python
# Compare the test-set accuracy of the algorithms
print('Accuracy of Naive Bayes:', accuracy_gnb)
print('Accuracy of Decision Tree:', accuracy_dtc)
print('Accuracy of Random Forest:', accuracy_rfc)
print('Accuracy of KNN:', accuracy_knn)
print('Accuracy of SVM:', accuracy_svm)
# Predict the class of a new iris sample (features in cm, same column order as iris.data)
new_data = [[5.1, 3.5, 1.4, 0.2]]
print('Predicted class of new data (Naive Bayes):', gnb.predict(new_data))
print('Predicted class of new data (Decision Tree):', dtc.predict(new_data))
print('Predicted class of new data (Random Forest):', rfc.predict(new_data))
print('Predicted class of new data (KNN):', knn.predict(new_data))
print('Predicted class of new data (SVM):', svm.predict(new_data))
```
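One run produced the following output (the decision tree and random forest are not seeded, so their accuracies can vary slightly between runs):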
```
Accuracy of Naive Bayes: 1.0
Accuracy of Decision Tree: 0.9777777777777777
Accuracy of Random Forest: 0.9777777777777777
Accuracy of KNN: 0.9777777777777777
Accuracy of SVM: 1.0
Predicted class of new data (Naive Bayes): [0]
Predicted class of new data (Decision Tree): [0]
Predicted class of new data (Random Forest): [0]
Predicted class of new data (KNN): [0]
Predicted class of new data (SVM): [0]
```
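In this run, naive Bayes and the SVM both reach an accuracy of 1.0 on the test set, while each of the other three scores about 0.978, which corresponds to a single misclassified sample out of the 45 in the test set (44/45). With a test set this small, a one-sample difference is not enough to rank the algorithms reliably. For the new sample, every model predicts class 0, Setosa.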
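Accuracy alone does not show which classes get confused, and the predictions are printed as integer indices. As a minimal sketch, the block below uses KNN as an example (any of the five models could be used instead) to print a confusion matrix and to map the predicted index to a species name through `iris.target_names`. The variable names `cm`, `pred_index`, and `pred_name` are illustrative.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score, confusion_matrix

iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.3, random_state=42
)

knn = KNeighborsClassifier()
knn.fit(X_train, y_train)
y_pred = knn.predict(X_test)

# Rows are true classes, columns are predicted classes;
# off-diagonal entries are misclassified test samples.
cm = confusion_matrix(y_test, y_pred)
print(cm)

# Map the integer prediction to its species name.
new_data = [[5.1, 3.5, 1.4, 0.2]]
pred_index = knn.predict(new_data)[0]
pred_name = iris.target_names[pred_index]
print(pred_name)  # setosa
```

The diagonal of the matrix divided by its total equals the accuracy reported by `accuracy_score`.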
Hope this example helps!