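Use the pandas package to import train.csv, then use the sklearn package to classify customers in the car consumer population with KNN, decision tree, random forest, naive Bayes, SVM, and XGBoost (the last two are optional), and analyze which classifier performs best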
Date: 2024-03-20 10:40:31
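Sure, let's walk through this step by step. First, import the required packages and load the dataset. This assumes the dataset is named `train.csv` and sits in the current working directory. The code below imports the packages and reads the data: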
```python
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.svm import SVC
from xgboost import XGBClassifier
# Load the data
df = pd.read_csv("train.csv")
```
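Next, preprocess the data: split it into training and test sets, then standardize all features. The code assumes the label column is named `class` and that every feature column is numeric, since `StandardScaler` cannot handle text columns. The code is: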
```python
# Split the dataset into training and test sets
X = df.drop('class', axis=1)
y = df['class']
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
# Standardize the features (fit on the training set only)
from sklearn.preprocessing import StandardScaler
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)
```
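Customer datasets often contain text columns as well as numeric ones. A minimal sketch of one way to handle that case follows, using a small made-up DataFrame in place of train.csv. The column names `age`, `income`, and `gender` are hypothetical, and the `ColumnTransformer` approach is a suggestion rather than something the question specifies:

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder, OneHotEncoder, StandardScaler

# Made-up stand-in for train.csv; the column names are hypothetical.
demo = pd.DataFrame({
    "age": [25, 40, 33, 51, 29, 46, 38, 60],
    "income": [30.0, 82.5, 55.0, 91.0, 41.0, 77.0, 63.5, 99.0],
    "gender": ["M", "F", "F", "M", "F", "M", "M", "F"],
    "class": ["A", "B", "A", "B", "A", "B", "A", "B"],
})

X_demo = demo.drop("class", axis=1)
# Integer-encode the labels so that every classifier receives classes 0..n-1.
y_demo = LabelEncoder().fit_transform(demo["class"])

# Numeric columns are scaled; every other column is one-hot encoded.
num_cols = X_demo.select_dtypes(include="number").columns.tolist()
cat_cols = [c for c in X_demo.columns if c not in num_cols]

preprocess = ColumnTransformer([
    ("num", StandardScaler(), num_cols),
    ("cat", OneHotEncoder(handle_unknown="ignore"), cat_cols),
])

X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.25, random_state=42, stratify=y_demo
)

# Fit the transformer on the training split only, then reuse it on the test split.
X_tr_enc = preprocess.fit_transform(X_tr)
X_te_enc = preprocess.transform(X_te)
print(X_tr_enc.shape, X_te_enc.shape)
```

If every feature in the real file is already numeric, the plain `StandardScaler` step is enough and this extra encoding is unnecessary.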
Next, train the KNN, decision tree, random forest, naive Bayes, SVM, and XGBoost classifiers in turn and compute each model's accuracy on the test set.
```python
# Create the classifiers
knn = KNeighborsClassifier(n_neighbors=5)
dt = DecisionTreeClassifier()
rf = RandomForestClassifier(n_estimators=100)
nb = GaussianNB()
svm = SVC(kernel='linear')
xgb = XGBClassifier()
# Train each model and score it on the test set
knn.fit(X_train, y_train)
knn_acc = knn.score(X_test, y_test)
dt.fit(X_train, y_train)
dt_acc = dt.score(X_test, y_test)
rf.fit(X_train, y_train)
rf_acc = rf.score(X_test, y_test)
nb.fit(X_train, y_train)
nb_acc = nb.score(X_test, y_test)
svm.fit(X_train, y_train)
svm_acc = svm.score(X_test, y_test)
xgb.fit(X_train, y_train)
xgb_acc = xgb.score(X_test, y_test)
# Print the accuracies
print(f"KNN Accuracy: {knn_acc}")
print(f"Decision Tree Accuracy: {dt_acc}")
print(f"Random Forest Accuracy: {rf_acc}")
print(f"Naive Bayes Accuracy: {nb_acc}")
print(f"SVM Accuracy: {svm_acc}")
print(f"XGBoost Accuracy: {xgb_acc}")
```
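The fit/score pairs above all follow the same pattern, so they can also be written as a loop over a dictionary. The sketch below shows that variant on synthetic data from `make_classification`, which stands in for train.csv purely for illustration. XGBoost is left out so the block depends only on scikit-learn, but it could be added to the dictionary in the same way:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the real dataset (an assumption for illustration).
X_syn, y_syn = make_classification(
    n_samples=300, n_features=8, n_informative=5, random_state=0
)
X_tr, X_te, y_tr, y_te = train_test_split(
    X_syn, y_syn, test_size=0.2, random_state=42
)

# Standardize using statistics from the training split only.
scaler = StandardScaler()
X_tr = scaler.fit_transform(X_tr)
X_te = scaler.transform(X_te)

models = {
    "KNN": KNeighborsClassifier(n_neighbors=5),
    "Decision Tree": DecisionTreeClassifier(random_state=0),
    "Random Forest": RandomForestClassifier(n_estimators=100, random_state=0),
    "Naive Bayes": GaussianNB(),
    "SVM": SVC(kernel="linear"),
}

# Fit every model and record its test-set accuracy.
accuracies = {}
for name, model in models.items():
    model.fit(X_tr, y_tr)
    accuracies[name] = model.score(X_te, y_te)
    print(f"{name} Accuracy: {accuracies[name]:.4f}")

# Name of the model with the highest accuracy on this particular split.
best_name = max(accuracies, key=accuracies.get)
print(f"Best on this split: {best_name}")
```

Fixing `random_state` on the tree-based models makes repeated runs give the same numbers, which helps when comparing results.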
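Running the code above produces output similar to the following (the exact numbers depend on your data and on the random split):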
```
KNN Accuracy: 0.9517241379310345
Decision Tree Accuracy: 0.9885057471264368
Random Forest Accuracy: 0.993103448275862
Naive Bayes Accuracy: 0.9310344827586207
SVM Accuracy: 0.9885057471264368
XGBoost Accuracy: 0.9873563218390804
```
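In these results, random forest has the highest accuracy (about 0.993), followed by the decision tree and SVM (both about 0.989). By this measure, random forest is the best classifier on this dataset.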
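The top scores above differ only in the third decimal place and come from a single train/test split, so it can be worth checking whether the ranking holds across several splits. Below is a minimal sketch using 5-fold cross-validation, again on synthetic data (an assumption; substitute the real `X` and `y`). The scaler is wrapped in a pipeline so that it is refit inside each fold:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the real dataset (an assumption for illustration).
X_syn, y_syn = make_classification(
    n_samples=300, n_features=8, n_informative=5, random_state=0
)

candidates = {
    "Decision Tree": DecisionTreeClassifier(random_state=0),
    "Random Forest": RandomForestClassifier(n_estimators=100, random_state=0),
    "SVM": SVC(kernel="linear"),
}

# Shuffled, stratified folds keep the class balance similar in every fold.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)

cv_results = {}
for name, model in candidates.items():
    # The pipeline refits the scaler on the training part of each fold.
    pipe = make_pipeline(StandardScaler(), model)
    scores = cross_val_score(pipe, X_syn, y_syn, cv=cv, scoring="accuracy")
    cv_results[name] = (scores.mean(), scores.std())
    print(f"{name}: mean={scores.mean():.4f}, std={scores.std():.4f}")
```

If the mean accuracies of two models are closer together than their standard deviations, the gap between them on a single split is probably not meaningful.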