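Please give me code that picks any two UCI datasets, uses LIBSVM to train an SVM on each with a linear kernel and with a Gaussian kernel, and compares the resulting support vectors.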
Posted: 2024-03-05 14:52:40
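Sure. I'll use two UCI datasets, Iris and Breast Cancer Wisconsin, both loaded through scikit-learn. The code below trains SVMs with LIBSVM and compares their support vectors: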
```python
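from sklearn.datasets import load_iris, load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
try:
    # pip-installed LIBSVM exposes its Python interface as a package
    from libsvm.svmutil import svm_train, svm_predict
except ImportError:
    # Fallback: LIBSVM source tree with its python/ directory on sys.path
    from svmutil import svm_train, svm_predict
# Load the datasets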
iris = load_iris()
breast_cancer = load_breast_cancer()
# Split into training and test sets
iris_X_train, iris_X_test, iris_y_train, iris_y_test = train_test_split(iris.data, iris.target, test_size=0.2)
breast_cancer_X_train, breast_cancer_X_test, breast_cancer_y_train, breast_cancer_y_test = train_test_split(breast_cancer.data, breast_cancer.target, test_size=0.2)
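# Standardize features: one scaler per dataset, fitted on the training split only
iris_scaler = StandardScaler()
iris_X_train = iris_scaler.fit_transform(iris_X_train).tolist()
iris_X_test = iris_scaler.transform(iris_X_test).tolist()
breast_cancer_scaler = StandardScaler()
breast_cancer_X_train = breast_cancer_scaler.fit_transform(breast_cancer_X_train).tolist()
breast_cancer_X_test = breast_cancer_scaler.transform(breast_cancer_X_test).tolist()
# Plain Python lists are the most portable input for LIBSVM's Python interface
iris_y_train, iris_y_test = iris_y_train.tolist(), iris_y_test.tolist()
breast_cancer_y_train, breast_cancer_y_test = breast_cancer_y_train.tolist(), breast_cancer_y_test.tolist()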
# Linear kernel (-t 0)
print('Linear kernel:')
# Iris
model = svm_train(iris_y_train, iris_X_train, '-t 0')
p_label, p_acc, p_val = svm_predict(iris_y_test, iris_X_test, model)
iris_linear_sv_indices = model.get_sv_indices()
# Breast Cancer Wisconsin
model = svm_train(breast_cancer_y_train, breast_cancer_X_train, '-t 0')
p_label, p_acc, p_val = svm_predict(breast_cancer_y_test, breast_cancer_X_test, model)
breast_cancer_linear_sv_indices = model.get_sv_indices()
# Gaussian (RBF) kernel (-t 2)
print('Gaussian kernel:')
# Iris
model = svm_train(iris_y_train, iris_X_train, '-t 2')
p_label, p_acc, p_val = svm_predict(iris_y_test, iris_X_test, model)
iris_gaussian_sv_indices = model.get_sv_indices()
# Breast Cancer Wisconsin
model = svm_train(breast_cancer_y_train, breast_cancer_X_train, '-t 2')
p_label, p_acc, p_val = svm_predict(breast_cancer_y_test, breast_cancer_X_test, model)
breast_cancer_gaussian_sv_indices = model.get_sv_indices()
# Compare the support vectors (indices are 1-based positions in the training set)
iris_linear_sv_set = set(iris_linear_sv_indices)
iris_gaussian_sv_set = set(iris_gaussian_sv_indices)
breast_cancer_linear_sv_set = set(breast_cancer_linear_sv_indices)
breast_cancer_gaussian_sv_set = set(breast_cancer_gaussian_sv_indices)
print('Iris dataset:')
print('Number of support vectors using linear kernel:', len(iris_linear_sv_set))
print('Number of support vectors using Gaussian kernel:', len(iris_gaussian_sv_set))
print('Support vectors common to both:', len(iris_linear_sv_set.intersection(iris_gaussian_sv_set)))
print('Breast Cancer Wisconsin dataset:')
print('Number of support vectors using linear kernel:', len(breast_cancer_linear_sv_set))
print('Number of support vectors using Gaussian kernel:', len(breast_cancer_gaussian_sv_set))
print('Support vectors common to both:', len(breast_cancer_linear_sv_set.intersection(breast_cancer_gaussian_sv_set)))
```
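The code above uses two UCI datasets, Iris and Breast Cancer Wisconsin, and trains an SVM on each with a linear kernel (`-t 0`) and a Gaussian (RBF) kernel (`-t 2`), then compares the resulting support vectors. `svm_train` and `svm_predict` handle training and prediction, and `get_sv_indices` returns the indices of the support vectors within the training set (LIBSVM numbers them from 1). Because both kernels are trained on the same training split, the indices are directly comparable: the code converts them to sets and prints the size of each set and of their intersection.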
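The intersection count alone hides which training points each kernel relies on. As a minimal sketch, the comparison could be wrapped in a small helper that also reports the indices unique to each kernel and a Jaccard overlap score. The function name `compare_support_vectors` and the sample index lists are hypothetical and only stand in for the output of `get_sv_indices`; they are not part of LIBSVM.

```python
def compare_support_vectors(linear_indices, gaussian_indices):
    """Summarize how two collections of support-vector indices differ."""
    linear_set = set(linear_indices)
    gaussian_set = set(gaussian_indices)
    shared = linear_set & gaussian_set
    union = linear_set | gaussian_set
    return {
        "n_linear": len(linear_set),
        "n_gaussian": len(gaussian_set),
        "shared": sorted(shared),
        "only_linear": sorted(linear_set - gaussian_set),
        "only_gaussian": sorted(gaussian_set - linear_set),
        # Jaccard overlap: 1.0 means identical sets, 0.0 means disjoint sets
        "jaccard": len(shared) / len(union) if union else 1.0,
    }


# Hypothetical index lists standing in for model.get_sv_indices() output
summary = compare_support_vectors([1, 4, 7, 9], [4, 5, 7, 9, 12])
for key, value in summary.items():
    print(f"{key}: {value}")
```

With the script above, the same helper could be called once per dataset, for example with `iris_linear_sv_indices` and `iris_gaussian_sv_indices`. Since the train/test split is random unless a seed is fixed, the exact numbers will vary between runs.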