Problem 4: Implement the SMO algorithm to build a linear SVM classifier and perform binary classification on the iris dataset. Requirements: (1) Select two features and two classes for binary classification. Note: the binary labels are 1 and -1. (2) Split the data into a training set and a test set. (3) Normalize the data, scaling feature values into the 0-1 range. (4) Implement the SMO algorithm and train a linear SVM classifier. (5) Evaluate the classifier on the test set, reporting accuracy, precision, recall, F1-score, and related metrics.
The implementation steps are as follows:
1. Import the required libraries and the iris dataset; select the two features 'sepal length (cm)' and 'petal length (cm)', and the two classes 'setosa' and 'versicolor', for binary classification.
```
import numpy as np
import pandas as pd
from sklearn.datasets import load_iris
iris = load_iris()
df = pd.DataFrame(iris.data, columns=iris.feature_names)
df['target'] = iris.target
# Keep 'sepal length (cm)' and 'petal length (cm)', and the 'setosa' (0) and 'versicolor' (1) classes
df = df[(df['target'] == 0) | (df['target'] == 1)]
df = df[['sepal length (cm)', 'petal length (cm)', 'target']]
df['target'] = np.where(df['target']==0, -1, 1)
X = df.iloc[:, :2].values
y = df.iloc[:, 2].values
```
2. Split the data into a training set and a test set at a 7:3 ratio.
```
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)
```
3. Normalize the data, scaling feature values into the 0-1 range. The scaler is fit on the training set only and then applied to the test set.
```
from sklearn.preprocessing import MinMaxScaler
scaler = MinMaxScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)
```
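What `MinMaxScaler` does here can be written by hand; this is a minimal sketch with made-up numbers (not the library's internals) showing why the min/max are fit on the training set only and the same affine map is reused on the test set:

```python
import numpy as np

X_train = np.array([[4.9, 1.4], [7.0, 4.7], [5.5, 4.0]])
X_test = np.array([[6.0, 4.5]])

# Fit the per-feature min/max on the training data only
mn, mx = X_train.min(axis=0), X_train.max(axis=0)
# Apply the same affine map to both sets
X_train_scaled = (X_train - mn) / (mx - mn)
X_test_scaled = (X_test - mn) / (mx - mn)
print(X_train_scaled.min(axis=0))  # [0. 0.]
print(X_train_scaled.max(axis=0))  # [1. 1.]
```

Note that test-set values can fall slightly outside [0, 1], which is expected: scaling the test set with its own min/max would leak information from the test data.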
4. Implement the SMO algorithm and train a linear SVM classifier.
```
class SVM:
    def __init__(self, C=1.0, tol=1e-3):
        self.C = C
        self.tol = tol
        self.alpha = None
        self.b = 0.0

    def fit(self, X, y, max_iter=100):
        n_samples, _ = X.shape
        # Keep the training data: the dual decision function needs it
        self.X = X
        self.y = y
        self.alpha = np.zeros(n_samples)
        self.b = 0.0
        for _ in range(max_iter):
            for i in range(n_samples):
                E_i = self.decision_function(X[i]) - y[i]
                # Only optimize alpha_i if it violates the KKT conditions
                if (y[i] * E_i < -self.tol and self.alpha[i] < self.C) or \
                   (y[i] * E_i > self.tol and self.alpha[i] > 0):
                    # Pick a second index j != i at random (simplified SMO)
                    j = i
                    while j == i:
                        j = np.random.randint(n_samples)
                    E_j = self.decision_function(X[j]) - y[j]
                    alpha_i_old = self.alpha[i]
                    alpha_j_old = self.alpha[j]
                    # Bounds keeping 0 <= alpha_j <= C under the equality constraint
                    if y[i] != y[j]:
                        L = max(0, self.alpha[j] - self.alpha[i])
                        H = min(self.C, self.C + self.alpha[j] - self.alpha[i])
                    else:
                        L = max(0, self.alpha[i] + self.alpha[j] - self.C)
                        H = min(self.C, self.alpha[i] + self.alpha[j])
                    if L == H:
                        continue
                    # Second derivative of the objective along the constraint line
                    eta = 2 * np.dot(X[i], X[j]) - np.dot(X[i], X[i]) - np.dot(X[j], X[j])
                    if eta >= 0:
                        continue
                    # Update alpha_j and clip it into [L, H]
                    self.alpha[j] -= y[j] * (E_i - E_j) / eta
                    self.alpha[j] = np.clip(self.alpha[j], L, H)
                    if abs(self.alpha[j] - alpha_j_old) < 1e-5:
                        continue
                    # Move alpha_i by the opposite amount to preserve the constraint
                    self.alpha[i] += y[i] * y[j] * (alpha_j_old - self.alpha[j])
                    # Update the threshold b
                    b1 = self.b - E_i \
                        - y[i] * (self.alpha[i] - alpha_i_old) * np.dot(X[i], X[i]) \
                        - y[j] * (self.alpha[j] - alpha_j_old) * np.dot(X[i], X[j])
                    b2 = self.b - E_j \
                        - y[i] * (self.alpha[i] - alpha_i_old) * np.dot(X[i], X[j]) \
                        - y[j] * (self.alpha[j] - alpha_j_old) * np.dot(X[j], X[j])
                    if 0 < self.alpha[i] < self.C:
                        self.b = b1
                    elif 0 < self.alpha[j] < self.C:
                        self.b = b2
                    else:
                        self.b = (b1 + b2) / 2

    def decision_function(self, X):
        # f(x) = sum_k alpha_k * y_k * <x_k, x> + b (linear kernel)
        return np.dot(np.dot(X, self.X.T), self.alpha * self.y) + self.b

    def predict(self, X):
        return np.sign(self.decision_function(X))
```
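The L/H clipping step inside `fit` can be illustrated in isolation. The helper name `clip_bounds` below is not part of the class above; it is just the same two branches pulled out so the box constraints can be checked by hand:

```python
def clip_bounds(alpha_i, alpha_j, y_i, y_j, C):
    """Feasible interval [L, H] for alpha_j given the constraint
    alpha_i*y_i + alpha_j*y_j = const and 0 <= alpha <= C."""
    if y_i != y_j:
        L = max(0.0, alpha_j - alpha_i)
        H = min(C, C + alpha_j - alpha_i)
    else:
        L = max(0.0, alpha_i + alpha_j - C)
        H = min(C, alpha_i + alpha_j)
    return L, H

print(clip_bounds(0.25, 0.5, 1, -1, 1.0))  # (0.25, 1.0)
print(clip_bounds(0.25, 0.5, 1, 1, 1.0))   # (0.0, 0.75)
```

Clipping the unconstrained optimum into [L, H] is what keeps each multiplier inside the box 0 <= alpha <= C while the pairwise update preserves the equality constraint.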
5. Evaluate the classifier on the test set, computing accuracy, precision, recall, and F1-score.
```
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score
np.random.seed(0)  # SMO picks the second index at random; seed for reproducibility
svm = SVM(C=1.0)
svm.fit(X_train, y_train)
y_pred = svm.predict(X_test)
print('Accuracy:', accuracy_score(y_test, y_pred))
print('Precision:', precision_score(y_test, y_pred))
print('Recall:', recall_score(y_test, y_pred))
print('F1-score:', f1_score(y_test, y_pred))
```
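As a sanity check (not part of the original assignment), scikit-learn's built-in `LinearSVC` can be run on the same two-feature, two-class split; a correct SMO implementation should reach comparable accuracy:

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler
from sklearn.svm import LinearSVC
from sklearn.metrics import accuracy_score

iris = load_iris()
# Same setup: classes 0/1, features 'sepal length (cm)' and 'petal length (cm)'
mask = iris.target < 2
X = iris.data[mask][:, [0, 2]]
y = np.where(iris.target[mask] == 0, -1, 1)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)
scaler = MinMaxScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)

clf = LinearSVC(C=1.0)
clf.fit(X_train, y_train)
# setosa and versicolor are linearly separable on petal length, so expect accuracy near 1.0
print('Reference accuracy:', accuracy_score(y_test, clf.predict(X_test)))
```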