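Implement a decision tree with sklearn to predict tumors, based on adminssion Predict.csv, using def functions # read and preprocess data # remove duplicates # handle outliers # cast to integer type # replace label values # evaluation metrics # visualization # predicted values # true values # decision tree algorithm # accuracy # C4.5 decision tree # CART decision tree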
Posted: 2024-06-13 20:08:00
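The steps below build a decision tree classifier with sklearn. The question mentions tumor prediction, but the columns used in the code (GRE Score, TOEFL Score, CGPA, Chance of Admit, and so on) are graduate-admission data, so the model predicts admission (1) versus rejection (0):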
1. Import the required libraries and load the dataset
```python
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score, confusion_matrix
import matplotlib.pyplot as plt
data = pd.read_csv('adminssion Predict.csv')
```
2. Preprocess the data
```python
# Remove duplicate rows
data.drop_duplicates(inplace=True)
# Handle outliers: keep only rows whose values fall inside the expected ranges
data = data[(data['GRE Score'] >= 290) & (data['GRE Score'] <= 340)]
data = data[(data['TOEFL Score'] >= 90) & (data['TOEFL Score'] <= 120)]
data = data[(data['University Rating'] >= 1) & (data['University Rating'] <= 5)]
data = data[(data['SOP'] >= 1) & (data['SOP'] <= 5)]
data = data[(data['LOR '] >= 1) & (data['LOR '] <= 5)]  # column name has a trailing space
data = data[(data['CGPA'] >= 6) & (data['CGPA'] <= 10)]
data = data[(data['Research'] == 0) | (data['Research'] == 1)]
# Force conversion to integer type, as the question requests.
# Note: astype(int) truncates toward zero, so any fractional values in
# SOP, LOR or CGPA (e.g. 8.65 -> 8) lose their decimal part.
data['GRE Score'] = data['GRE Score'].astype(int)
data['TOEFL Score'] = data['TOEFL Score'].astype(int)
data['University Rating'] = data['University Rating'].astype(int)
data['SOP'] = data['SOP'].astype(int)
data['LOR '] = data['LOR '].astype(int)
data['CGPA'] = data['CGPA'].astype(int)
# Replace label values: 1 if the admission chance is at least 0.5, otherwise 0
data['Chance of Admit '] = np.where(data['Chance of Admit '] >= 0.5, 1, 0)
```
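The question asks for the steps to be written with `def`. A minimal sketch of the preprocessing above wrapped in functions might look like the following. The function names (`remove_duplicates`, `filter_outliers`, `binarize_label`, `preprocess`) and the `VALID_RANGES` table are illustrative assumptions, not part of the original answer; the ranges and the 0.5 threshold are copied from the step above.

```python
import numpy as np
import pandas as pd

# Hypothetical lookup table: valid [low, high] ranges copied from the filtering step.
VALID_RANGES = {
    'GRE Score': (290, 340),
    'TOEFL Score': (90, 120),
    'University Rating': (1, 5),
    'SOP': (1, 5),
    'LOR ': (1, 5),
    'CGPA': (6, 10),
}

def remove_duplicates(df):
    """Return a copy of df without duplicated rows."""
    return df.drop_duplicates().reset_index(drop=True)

def filter_outliers(df, ranges=VALID_RANGES):
    """Keep only rows whose values fall inside every configured range."""
    mask = pd.Series(True, index=df.index)
    for col, (low, high) in ranges.items():
        if col in df.columns:  # columns missing from the frame are skipped
            mask &= df[col].between(low, high)
    if 'Research' in df.columns:  # Research must be exactly 0 or 1
        mask &= df['Research'].isin([0, 1])
    return df[mask].copy()

def binarize_label(df, label='Chance of Admit ', threshold=0.5):
    """Replace the continuous label with 1 (>= threshold) or 0."""
    out = df.copy()
    out[label] = np.where(out[label] >= threshold, 1, 0)
    return out

def preprocess(df):
    """Run deduplication, range filtering and label replacement in order."""
    return binarize_label(filter_outliers(remove_duplicates(df)))
```

With the real file, the call would presumably be `clean = preprocess(pd.read_csv('adminssion Predict.csv'))`. Each helper returns a new frame rather than modifying its input, which keeps the steps easy to test one at a time.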
3. Split the dataset
```python
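# Drop the label, plus the row-id column 'Serial No.' if the file has one (it carries no predictive signal)
X = data.drop(columns=['Chance of Admit ', 'Serial No.'], errors='ignore')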
y = data['Chance of Admit ']
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
```
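Because the 0.5 threshold may leave far fewer rejected than admitted samples, a stratified split can help keep both classes represented in the test set. The sketch below is one possible variant; the helper name `split_data` is an assumption, while `stratify` is a standard `train_test_split` parameter.

```python
import numpy as np
from sklearn.model_selection import train_test_split

def split_data(X, y, test_size=0.2, seed=42):
    """Split features and labels, keeping the class ratio equal in both parts."""
    return train_test_split(
        X, y, test_size=test_size, random_state=seed, stratify=y
    )

# Synthetic stand-in data (10 negatives, 40 positives) so the sketch runs without the CSV.
X_demo = np.arange(100).reshape(50, 2)
y_demo = np.array([0] * 10 + [1] * 40)
X_tr, X_te, y_tr, y_te = split_data(X_demo, y_demo)
print("Negatives in test set:", int((y_te == 0).sum()))
```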
4. Train the model
```python
tree = DecisionTreeClassifier(max_depth=4, random_state=0)
tree.fit(X_train, y_train)
```
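The question also asks for C4.5 and CART trees. scikit-learn's `DecisionTreeClassifier` implements an optimized version of CART and has no C4.5 mode. Setting `criterion='entropy'` selects splits by information gain, which is the closest available option to the ID3/C4.5 family, but it is not full C4.5 (no gain ratio, no multiway splits). The sketch below compares the two criteria on synthetic stand-in data; the helper `train_tree` and the demo variables are illustrative assumptions.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

def train_tree(X_train, y_train, criterion='gini', max_depth=4, seed=0):
    """Fit a decision tree with the given split criterion."""
    model = DecisionTreeClassifier(
        criterion=criterion, max_depth=max_depth, random_state=seed
    )
    model.fit(X_train, y_train)
    return model

# Synthetic stand-in data so the sketch runs without the CSV file.
X_demo, y_demo = make_classification(n_samples=300, n_features=7, random_state=0)
Xtr, Xte, ytr, yte = train_test_split(X_demo, y_demo, test_size=0.2, random_state=42)

cart_tree = train_tree(Xtr, ytr, criterion='gini')        # CART-style, Gini impurity
entropy_tree = train_tree(Xtr, ytr, criterion='entropy')  # information gain, not full C4.5

for name, model in [('gini', cart_tree), ('entropy', entropy_tree)]:
    print(name, 'test accuracy:', model.score(Xte, yte))
```

On the admission data, the same helper would be called with `X_train` and `y_train` from step 3.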
5. Evaluate the model
```python
# Predicted values
y_pred = tree.predict(X_test)
# True values
y_true = y_test.values
# Confusion matrix (rows: true class, columns: predicted class)
cm = confusion_matrix(y_true, y_pred)
print("Confusion Matrix:\n", cm)
# Accuracy
accuracy = accuracy_score(y_true, y_pred)
print("Accuracy: {:.2f}%".format(accuracy*100))
```
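Accuracy alone can be misleading when one class dominates, so precision, recall and F1 are worth reporting as well. The following is a minimal sketch of an evaluation function; the name `evaluate` and the dictionary layout are assumptions, while the metric functions come from `sklearn.metrics`.

```python
from sklearn.metrics import (
    accuracy_score,
    confusion_matrix,
    f1_score,
    precision_score,
    recall_score,
)

def evaluate(y_true, y_pred):
    """Return common binary-classification metrics as a dict."""
    return {
        'accuracy': accuracy_score(y_true, y_pred),
        # zero_division=0 avoids a warning when a class is never predicted
        'precision': precision_score(y_true, y_pred, zero_division=0),
        'recall': recall_score(y_true, y_pred, zero_division=0),
        'f1': f1_score(y_true, y_pred, zero_division=0),
        'confusion_matrix': confusion_matrix(y_true, y_pred, labels=[0, 1]),
    }

# Small hand-made example: 2 true positives, 1 false negative, 1 false positive, 1 true negative.
scores = evaluate([1, 0, 1, 1, 0], [1, 0, 0, 1, 1])
for name, value in scores.items():
    print(name, value)
```

With the model from step 4, the call would be `evaluate(y_test, tree.predict(X_test))`.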
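6. Visualize the tree
```python
from sklearn.tree import plot_tree
plt.figure(figsize=(20,10))
# Pass feature and class names so the nodes show readable labels instead of X[i]
plot_tree(tree, filled=True, feature_names=list(X.columns), class_names=['0', '1'])
plt.show()
```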
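When a plot is hard to read or no display is available, the same tree can be printed as text rules with `export_text`. This is a small sketch on synthetic stand-in data; the helper `describe_tree` and the feature names `f0` to `f3` are placeholders, not columns of the admission file.

```python
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier, export_text

def describe_tree(model, feature_names):
    """Return the fitted tree's split rules as indented plain text."""
    return export_text(model, feature_names=list(feature_names))

# Synthetic stand-in data so the sketch runs without the CSV file.
X_demo, y_demo = make_classification(n_samples=200, n_features=4, random_state=0)
demo_tree = DecisionTreeClassifier(max_depth=2, random_state=0).fit(X_demo, y_demo)

rules = describe_tree(demo_tree, ['f0', 'f1', 'f2', 'f3'])
print(rules)
```

For the model trained above, `describe_tree(tree, X.columns)` would list the thresholds on the admission features.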