Code for classifying tabular data with four different feature types using an SVM
Date: 2024-10-01 20:10:56
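In Python, you can classify tabular data with the support vector machine (SVM) from the Scikit-learn library. Here we assume the table mixes four types of feature columns (numeric, categorical, binary, and text) and that the class label is in the last column. Each feature type needs its own preprocessing before the data reaches the SVM. A simple classification example: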
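```python
# Import the required libraries
import pandas as pd
from sklearn import svm
from sklearn.compose import ColumnTransformer
from sklearn.feature_extraction.text import CountVectorizer  # text features
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler  # categorical / numeric features

# data is a DataFrame containing the feature columns and the label column
data = pd.read_csv('your_data.csv')  # replace with the path to your data file

# Separate the features and the target variable
X = data.iloc[:, :-1]  # assumes the last column is the label
y = data.iloc[:, -1]

# Group the feature columns by type
# Assumption: text columns have 'text' in their name; adapt this rule to your data
text_features = [col for col in X.columns if 'text' in str(col)]
numeric_features = [col for col in X.select_dtypes(include='number').columns
                    if col not in text_features]  # numeric (0/1 binary columns end up here too)
categorical_features = [col for col in X.select_dtypes(include=['object', 'bool']).columns
                        if col not in text_features]  # categorical and binary

# Standardize numeric features; one-hot encode categorical and binary features
# (plain integer codes would impose an artificial ordering on the SVM)
transformers = [
    ('num', StandardScaler(), numeric_features),
    ('cat', OneHotEncoder(handle_unknown='ignore'), categorical_features),
]
# Bag-of-words for text features: CountVectorizer expects a 1-D sequence of strings,
# so each text column gets its own vectorizer (column name passed as a string, not a list)
for col in text_features:
    transformers.append((f'text_{col}', CountVectorizer(), col))

# Split into training and test sets before fitting any preprocessing,
# so the scaler and vocabulary are learned from the training data only
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Chain preprocessing and the SVM; choose exactly one kernel ('linear', 'poly' or 'rbf')
model = Pipeline([
    ('preprocess', ColumnTransformer(transformers)),
    ('svm', svm.SVC(kernel='rbf')),
])

# Train the model (fits the preprocessing and the SVM on the training set)
model.fit(X_train, y_train)

# Predict
predictions = model.predict(X_test)

# Print a performance metric such as accuracy
accuracy = model.score(X_test, y_test)
print("Accuracy:", accuracy)
```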
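The script above needs a CSV file before it can run. To check the mixed-type preprocessing end to end, here is a minimal sketch on a small synthetic table. The column names (`amount`, `colour`, `is_member`, `comment_text`, `label`) and the helper `build_svm_pipeline` are hypothetical. Text columns are passed to the helper explicitly instead of being detected by name. Numeric columns are standardized, categorical and binary columns are one-hot encoded, and each text column gets its own `CountVectorizer`. Everything sits inside a scikit-learn `Pipeline`, so all fitting happens on the training split only.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler
from sklearn.svm import SVC


def build_svm_pipeline(numeric, categorical, text, kernel='rbf'):
    """Hypothetical helper: per-type preprocessing followed by an SVC."""
    transformers = [
        ('num', StandardScaler(), numeric),                            # standardize numeric columns
        ('cat', OneHotEncoder(handle_unknown='ignore'), categorical),  # one-hot categorical/binary
    ]
    for col in text:
        # one bag-of-words vectorizer per text column (receives a 1-D Series of strings)
        transformers.append((f'text_{col}', CountVectorizer(), col))
    return Pipeline([
        ('preprocess', ColumnTransformer(transformers)),
        ('svm', SVC(kernel=kernel)),
    ])


# Hypothetical toy table: one column per feature type and a 4-class label
rng = np.random.default_rng(0)
n = 200
labels = rng.integers(0, 4, size=n)
words = np.array(['alpha', 'beta', 'gamma', 'delta'])
df = pd.DataFrame({
    'amount': labels * 10.0 + rng.normal(0, 1, size=n),             # numeric
    'colour': np.array(['red', 'green', 'blue', 'black'])[labels],  # categorical
    'is_member': np.where(labels % 2 == 0, 'yes', 'no'),            # binary
    'comment_text': [f'{w} item' for w in words[labels]],           # text
    'label': labels,
})

X = df.drop(columns='label')
y = df['label']
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y)

model = build_svm_pipeline(['amount'], ['colour', 'is_member'], ['comment_text'])
model.fit(X_train, y_train)
accuracy = model.score(X_test, y_test)
print("Accuracy:", accuracy)
```

All preprocessing steps live inside the `Pipeline`. As a result, `model.predict` can be called directly on new raw rows that have the same columns, with no separate transform step.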
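The code leaves the kernel choice open (linear, polynomial, or RBF). One option is to pick the kernel, together with the regularization strength `C`, by cross-validation on the training set rather than by guessing. Below is a minimal sketch with `GridSearchCV`. The data is a numeric stand-in from `make_classification` for an already-preprocessed table (an assumption), and the grid values are illustrative only. With mixed column types, the `StandardScaler` step would be replaced by preprocessing that handles each column type.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic 4-class numeric data as a stand-in for the preprocessed features
X, y = make_classification(n_samples=300, n_features=8, n_informative=5,
                           n_classes=4, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y)

# Scaling inside the pipeline is refitted on each CV training fold
pipe = Pipeline([('scale', StandardScaler()), ('svm', SVC())])

# Parameters of a pipeline step are addressed as <step name>__<parameter>
param_grid = {
    'svm__kernel': ['linear', 'poly', 'rbf'],
    'svm__C': [0.1, 1, 10],
}
search = GridSearchCV(pipe, param_grid, cv=5)
search.fit(X_train, y_train)

print("Best parameters:", search.best_params_)
print("Test accuracy:", search.score(X_test, y_test))
```

By default, `GridSearchCV` refits the best parameter combination on the whole training set. `search.score` therefore reports test accuracy for that refitted model.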