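Split the iris dataset 8:2 into a training set and a test set, build a support vector machine (SVM) classifier, and compare the performance of the linear kernel with that of the radial basis function (RBF) kernel.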
Posted: 2024-12-20 12:31:22
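First, import the required libraries and load the iris dataset. The features are standardized and the data is divided into an 80% training set and a 20% test set.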
```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import accuracy_score, confusion_matrix
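# Load the iris data
iris = datasets.load_iris()
X = iris.data
y = iris.target
# Split first (80% train, 20% test) so the test set stays unseen during preprocessing
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
# Standardize: fit the scaler on the training set only, then apply it to both sets
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)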
```
Next, create two SVM models, one with the linear kernel and one with the RBF kernel:
```python
# Linear kernel
svm_linear = SVC(kernel='linear')
svm_linear.fit(X_train, y_train)
# Radial basis function (RBF) kernel, with scikit-learn's default C and gamma
svm_rbf = SVC(kernel='rbf')
svm_rbf.fit(X_train, y_train)
```
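Scaling can also be folded into the model itself, so that the scaler is fit only on whatever data is passed to `fit()`. The following is a minimal sketch of that idea using scikit-learn's `make_pipeline`; the names `pipelines`, `pipe_scores`, `X_tr`, and `X_te` are illustrative and do not appear in the code above.

```python
from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

iris = datasets.load_iris()
# Split the raw (unscaled) features 80/20 with the same seed as the main example
X_tr, X_te, y_tr, y_te = train_test_split(
    iris.data, iris.target, test_size=0.2, random_state=42
)

# One pipeline per kernel (hypothetical names); StandardScaler is fit inside fit()
pipelines = {
    kernel: make_pipeline(StandardScaler(), SVC(kernel=kernel))
    for kernel in ("linear", "rbf")
}

pipe_scores = {}
for kernel, pipe in pipelines.items():
    pipe.fit(X_tr, y_tr)                           # scaler sees training data only
    pipe_scores[kernel] = pipe.score(X_te, y_te)   # mean accuracy on the test set
    print(f"{kernel}: test accuracy = {pipe_scores[kernel]:.3f}")
```

With this layout, calling `predict` or `score` on a pipeline applies the training-set scaling automatically before the SVM sees the data.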
For each model, predict on the test set and compute the accuracy and the confusion matrix:
```python
# Linear kernel: prediction and evaluation
y_pred_linear = svm_linear.predict(X_test)
accuracy_linear = accuracy_score(y_test, y_pred_linear)
conf_mat_linear = confusion_matrix(y_test, y_pred_linear)
# RBF kernel: prediction and evaluation
y_pred_rbf = svm_rbf.predict(X_test)
accuracy_rbf = accuracy_score(y_test, y_pred_rbf)
conf_mat_rbf = confusion_matrix(y_test, y_pred_rbf)
```
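A confusion matrix carries more than the overall accuracy: per-class recall and precision can be read from its rows and columns. The sketch below shows the arithmetic on a made-up 3×3 matrix; the numbers in `example_conf_mat` are purely illustrative and are not results from the models above. It assumes scikit-learn's convention that rows are true classes and columns are predicted classes.

```python
import numpy as np

# Hypothetical confusion matrix (rows: true class, columns: predicted class)
example_conf_mat = np.array([
    [10, 0, 0],
    [0, 8, 1],
    [0, 2, 9],
])

correct = np.diag(example_conf_mat)                  # correctly classified per class
recall = correct / example_conf_mat.sum(axis=1)      # divide by true-class totals (rows)
precision = correct / example_conf_mat.sum(axis=0)   # divide by predicted totals (columns)
accuracy = correct.sum() / example_conf_mat.sum()    # overall fraction correct

print("recall per class:   ", np.round(recall, 3))
print("precision per class:", np.round(precision, 3))
print("accuracy:           ", round(accuracy, 3))
```

The same three lines of arithmetic can be applied to `conf_mat_linear` and `conf_mat_rbf` to see which classes each kernel confuses.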
Finally, compare the performance of the two models:
```python
print(f"Linear SVM Accuracy: {accuracy_linear}, Confusion Matrix: \n{conf_mat_linear}")
print(f"RBF SVM Accuracy: {accuracy_rbf}, Confusion Matrix: \n{conf_mat_rbf}")
# Visual comparison: test accuracy of the two kernels side by side
plt.figure(figsize=(6, 4))
plt.bar(['Linear', 'RBF'], [accuracy_linear, accuracy_rbf])
plt.ylim(0, 1.05)
plt.ylabel('Test accuracy')
plt.title('Linear vs. RBF Kernel Performance')
plt.show()
```
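Because the test set holds only 30 samples, a single split can make the two kernels look closer or further apart than they are. One possible complement, sketched here under the assumption that 5-fold stratified cross-validation is acceptable for this task, is to compare mean accuracy across folds. The names `cv` and `cv_results` are illustrative.

```python
from sklearn import datasets
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = datasets.load_iris(return_X_y=True)

# 5 stratified folds; shuffling with a fixed seed keeps the run reproducible
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)

cv_results = {}
for kernel in ("linear", "rbf"):
    # Scaling lives inside the pipeline, so it is refit on each training fold
    model = make_pipeline(StandardScaler(), SVC(kernel=kernel))
    scores = cross_val_score(model, X, y, cv=cv)  # accuracy per fold
    cv_results[kernel] = (scores.mean(), scores.std())
    print(f"{kernel}: mean accuracy = {scores.mean():.3f} (std = {scores.std():.3f})")
```

If the difference between the two means is small relative to the fold-to-fold standard deviation, the two kernels are effectively equivalent on this dataset.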