Implementing a Mushroom Classification Algorithm in Python
Date: 2023-09-17 20:07:30
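Mushroom classification is a common machine learning exercise: given a mushroom's attributes, the model predicts whether it is poisonous. The basic steps for implementing it in Python are as follows: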
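1. Data preprocessing: read the dataset, encode the categorical features as numbers, split the data into a training set and a test set, and standardize it. Assuming the commonly used UCI/Kaggle version of `mushrooms.csv`, every feature is a single-letter category code, so `StandardScaler` cannot be applied until the features have been one-hot encoded.
```python
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
# Read the dataset
data = pd.read_csv('mushrooms.csv')
# Separate the features from the label
X = data.drop('class', axis=1)
y = data['class']
# One-hot encode the categorical features (assumes every feature column
# holds category codes, as in the common UCI/Kaggle mushrooms.csv)
X = pd.get_dummies(X, dtype=float)
# Split the data into a training set and a test set
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
# Standardize: fit the scaler on the training set only, then reuse it on the test set
sc = StandardScaler()
X_train = sc.fit_transform(X_train)
X_test = sc.transform(X_test)
```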
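In the commonly distributed version of the mushroom dataset the feature columns are category codes rather than numbers (an assumption about `mushrooms.csv`, which is not shown here), so they need a numeric encoding before scaling or training. A minimal sketch of one-hot encoding, using a made-up two-column frame rather than the real data:

```python
import pandas as pd

# Hypothetical toy frame; the column names and letter codes are illustrative only.
toy = pd.DataFrame({
    "cap-shape": ["x", "b", "x"],
    "odor": ["p", "a", "n"],
})

# get_dummies creates one 0/1 column per (feature, category) pair.
encoded = pd.get_dummies(toy, dtype=int)
print(encoded.columns.tolist())
print(encoded)
```

Each original column is replaced by as many indicator columns as it has distinct values, which is why the encoded feature matrix is much wider than the raw one.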
2. Model training: train the model with logistic regression.
```python
from sklearn.linear_model import LogisticRegression
# Create the logistic regression model
classifier = LogisticRegression(random_state=0)
# Train the model
classifier.fit(X_train, y_train)
```
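A fitted `LogisticRegression` keeps string labels as they are and stores them, sorted, in `classes_`; the columns of `predict_proba` follow that order. The sketch below illustrates this on made-up data with a single 0/1 feature. The names `X_toy`, `y_toy`, and `toy_clf` are hypothetical and not part of the steps above.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Hypothetical toy data: one indicator feature, labels 'e' (edible) / 'p' (poisonous).
X_toy = np.array([[0.0], [0.0], [0.0], [1.0], [1.0], [1.0]])
y_toy = np.array(["e", "e", "e", "p", "p", "p"])

toy_clf = LogisticRegression(random_state=0)
toy_clf.fit(X_toy, y_toy)

# classes_ gives the label order used by predict_proba's columns.
print(toy_clf.classes_)
proba = toy_clf.predict_proba(np.array([[1.0]]))
print(proba)
```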
3. Model evaluation: evaluate the model on the test set.
```python
from sklearn.metrics import confusion_matrix, accuracy_score
# Predict on the test set
y_pred = classifier.predict(X_test)
# Confusion matrix (rows are true classes, columns are predicted classes)
cm = confusion_matrix(y_test, y_pred)
print(cm)
# Accuracy
accuracy = accuracy_score(y_test, y_pred)
print('Accuracy: {:.2f}%'.format(accuracy * 100))
```
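For a toxicity classifier, accuracy alone hides the most costly error: a poisonous mushroom predicted as edible. The sketch below shows one way to read that count from the confusion matrix and to compute recall for the poisonous class. It uses made-up label lists and assumes the labels are `'e'` and `'p'`, as in the common version of the dataset.

```python
from sklearn.metrics import confusion_matrix, recall_score

# Hypothetical true and predicted labels, for illustration only.
y_true = ["p", "p", "e", "e", "p", "e"]
y_hat = ["p", "e", "e", "e", "p", "p"]

# Fix the label order so that row/column 0 is 'e' and row/column 1 is 'p'.
cm_toy = confusion_matrix(y_true, y_hat, labels=["e", "p"])

# Row = true class, column = predicted class:
# cm_toy[1, 0] counts poisonous mushrooms predicted as edible.
missed_poisonous = cm_toy[1, 0]

# Recall for the poisonous class: the share of poisonous samples that were caught.
recall_p = recall_score(y_true, y_hat, pos_label="p")
print(cm_toy)
print(missed_poisonous, recall_p)
```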
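The complete code is as follows. It assumes the commonly used UCI/Kaggle version of `mushrooms.csv`, in which every feature is a category code, so the features are one-hot encoded before standardization:
```python
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix, accuracy_score
# Read the dataset
data = pd.read_csv('mushrooms.csv')
# Separate the features from the label
X = data.drop('class', axis=1)
y = data['class']
# One-hot encode the categorical features (assumes every feature column
# holds category codes, as in the common UCI/Kaggle mushrooms.csv)
X = pd.get_dummies(X, dtype=float)
# Split the data into a training set and a test set
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
# Standardize: fit the scaler on the training set only, then reuse it on the test set
sc = StandardScaler()
X_train = sc.fit_transform(X_train)
X_test = sc.transform(X_test)
# Create the logistic regression model
classifier = LogisticRegression(random_state=0)
# Train the model
classifier.fit(X_train, y_train)
# Predict on the test set
y_pred = classifier.predict(X_test)
# Confusion matrix (rows are true classes, columns are predicted classes)
cm = confusion_matrix(y_test, y_pred)
print(cm)
# Accuracy
accuracy = accuracy_score(y_test, y_pred)
print('Accuracy: {:.2f}%'.format(accuracy * 100))
```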
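Note: this is only a simple example. In a real application, the preprocessing, model, and evaluation need to be adjusted and tuned for the specific situation.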
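One possible adjustment is to wrap the encoding and the model in a scikit-learn pipeline and score it with cross-validation instead of a single split. The following is a minimal sketch under stated assumptions, not a tuned solution: the data is synthetic, the `odor` column and its letter codes are made up, and scaling is omitted because one-hot features are already 0/1.

```python
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import OneHotEncoder

# Synthetic data: odor 'f' is labelled poisonous, everything else edible.
odor = ["f", "n", "f", "a"] * 5
X_syn = pd.DataFrame({"odor": odor})
y_syn = pd.Series(["p" if o == "f" else "e" for o in odor])

# The encoder is refit inside every fold, so no information leaks from the
# held-out part; handle_unknown="ignore" tolerates categories unseen in training.
model = make_pipeline(
    OneHotEncoder(handle_unknown="ignore"),
    LogisticRegression(random_state=0),
)

# 5-fold cross-validated accuracy.
scores = cross_val_score(model, X_syn, y_syn, cv=5)
print(scores.mean())
```

Keeping the preprocessing inside the pipeline means the same object can later be fitted on the full training data and applied directly to raw rows with `model.predict`.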