Build an SVM classifier on the Adult dataset, test its performance, print the performance metrics, and plot the ROC curve (Python code with comments)
Date: 2024-02-22 08:55:03
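The Python code below builds an SVM classifier on the Adult dataset and evaluates it on a held-out test set; comments explain each step. At the end, the code prints the accuracy, precision, recall, and F1 score, and plots the ROC curve.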
```python
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score, roc_curve, auc
import matplotlib.pyplot as plt
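# Load the data; skipinitialspace=True strips the blank after each comma found in some copies of the file
data = pd.read_csv('adult.csv', skipinitialspace=True)
# Treat '?' as a missing value and drop the affected rows
data = data.replace('?', np.nan)
data = data.dropna()
# Encode the label: 0 for <=50K, 1 for >50K
data['income'] = data['income'].map({'<=50K': 0, '>50K': 1}).astype(int)
# One-hot encode the remaining categorical columns
data = pd.get_dummies(data)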
# Separate the features from the label
X = data.drop('income', axis=1)
y = data['income']
# Split into training and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
# Standardize the features (the scaler is fitted on the training set only)
sc = StandardScaler()
X_train = sc.fit_transform(X_train)
X_test = sc.transform(X_test)
# Build the SVM model with a linear kernel
model = SVC(kernel='linear', C=1, random_state=0)
# Train the model
model.fit(X_train, y_train)
# Predict on the test set
y_pred = model.predict(X_test)
# Print accuracy, precision, recall, and F1 score
print('Accuracy:', accuracy_score(y_test, y_pred))
print('Precision:', precision_score(y_test, y_pred))
print('Recall:', recall_score(y_test, y_pred))
print('F1 Score:', f1_score(y_test, y_pred))
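# Plot the ROC curve from continuous decision scores; hard 0/1 predictions would give only one operating point
y_score = model.decision_function(X_test)
fpr, tpr, thresholds = roc_curve(y_test, y_score)
roc_auc = auc(fpr, tpr)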
plt.figure()
plt.plot(fpr, tpr, color='darkorange', lw=2, label='ROC curve (area = %0.2f)' % roc_auc)
plt.plot([0, 1], [0, 1], color='navy', lw=2, linestyle='--')
plt.xlim([0.0, 1.0])
plt.ylim([0.0, 1.05])
plt.xlabel('False Positive Rate')
plt.ylabel('True Positive Rate')
plt.title('Receiver operating characteristic')
plt.legend(loc="lower right")
plt.show()
```
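As a quick check on what the four printed metrics mean, the sketch below derives them by hand from a confusion matrix and compares the results with the scikit-learn functions. The labels `y_true` and `y_hat` are made-up toy values chosen for illustration, not results from the Adult dataset.

```python
from sklearn.metrics import (accuracy_score, confusion_matrix, f1_score,
                             precision_score, recall_score)

# Hypothetical toy labels, not taken from the Adult dataset
y_true = [0, 0, 0, 0, 1, 1, 1, 1]
y_hat = [0, 0, 0, 1, 1, 1, 0, 0]

# For binary labels {0, 1}, ravel() yields TN, FP, FN, TP in that order
tn, fp, fn, tp = confusion_matrix(y_true, y_hat).ravel()

accuracy = (tp + tn) / (tp + tn + fp + fn)   # share of all predictions that are correct
precision = tp / (tp + fp)                   # share of predicted positives that are correct
recall = tp / (tp + fn)                      # share of actual positives that are found
f1 = 2 * precision * recall / (precision + recall)  # harmonic mean of precision and recall

print('Accuracy:', accuracy, accuracy_score(y_true, y_hat))
print('Precision:', precision, precision_score(y_true, y_hat))
print('Recall:', recall, recall_score(y_true, y_hat))
print('F1 Score:', f1, f1_score(y_true, y_hat))
```

Because the Adult dataset has far more `<=50K` than `>50K` records, accuracy alone can look good even when recall on the positive class is low, which is why precision, recall, and F1 are worth reading together.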
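The output contains the accuracy, precision, recall, and F1 score on the test set. The code also plots the ROC curve, with the area under the curve (AUC) shown in the legend, as a further measure of classifier performance.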
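An ROC curve is traced by sweeping a decision threshold, so `roc_curve` is most informative when it receives continuous scores rather than hard 0/1 predictions. The sketch below illustrates the difference. It uses synthetic data from `make_classification` as a stand-in (an assumption made so the example runs without `adult.csv`), and the names `X_demo`, `y_demo`, and `clf` are illustrative only.

```python
from sklearn.datasets import make_classification
from sklearn.metrics import roc_auc_score, roc_curve
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Synthetic stand-in data (assumption: used here instead of adult.csv)
X_demo, y_demo = make_classification(n_samples=400, n_features=10,
                                     class_sep=0.5, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X_demo, y_demo,
                                          test_size=0.25, random_state=0)

clf = SVC(kernel='linear', C=1).fit(X_tr, y_tr)

hard_labels = clf.predict(X_te)        # 0/1 predictions
scores = clf.decision_function(X_te)   # signed distance to the separating hyperplane

# Hard labels allow at most one threshold between the two classes,
# so the resulting curve has at most three points
fpr_hard, tpr_hard, _ = roc_curve(y_te, hard_labels)
# Continuous scores give one candidate threshold per distinct score
fpr_score, tpr_score, _ = roc_curve(y_te, scores)

print('ROC points from hard labels:', len(fpr_hard))
print('ROC points from scores:', len(fpr_score))
print('AUC from hard labels:', roc_auc_score(y_te, hard_labels))
print('AUC from scores:', roc_auc_score(y_te, scores))
```

`decision_function` is available on `SVC` without extra configuration. Setting `probability=True` and calling `predict_proba` is an alternative, at the cost of an additional internal cross-validation during fitting.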