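Use Python to apply a decision tree classifier to the breast cancer dataset, make predictions, analyze the results, and plot a chart with feature importance on the x-axis and feature on the y-axis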
Posted: 2024-06-07 10:10:19
Sure, I can help with this. First, load the breast cancer dataset and preprocess it:
```python
import pandas as pd
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
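# Load the dataset
data = load_breast_cancer()
# Convert to a DataFrame
df = pd.DataFrame(data.data, columns=data.feature_names)
df['target'] = data.target
# Split into training and test sets (70% / 30%)
X_train, X_test, y_train, y_test = train_test_split(df[data.feature_names], df['target'], test_size=0.3, random_state=42)
# Standardize features (fit on the training set only, to avoid test-set leakage).
# Decision trees do not require scaling, so this step is optional here.
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)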
```
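Standardization is optional for a decision tree: every split compares one feature against a threshold, and a per-feature affine transform with a positive scale (which is what `StandardScaler` applies) preserves the ordering of values, so the same partitions of the training data are available. The following minimal sketch, assuming scikit-learn and NumPy are installed, fits the same tree on raw and on standardized features and measures how often the test predictions agree. The variable names `raw_tree`, `scaled_tree`, and `agreement` are illustrative.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# Fit the scaler on the training data only
scaler = StandardScaler().fit(X_train)

# Same hyperparameters and seed; only the input scaling differs
raw_tree = DecisionTreeClassifier(random_state=42).fit(X_train, y_train)
scaled_tree = DecisionTreeClassifier(random_state=42).fit(scaler.transform(X_train), y_train)

raw_pred = raw_tree.predict(X_test)
scaled_pred = scaled_tree.predict(scaler.transform(X_test))

# Fraction of test samples on which the two trees make the same prediction
agreement = np.mean(raw_pred == scaled_pred)
print(f"Prediction agreement: {agreement:.3f}")
```

In exact arithmetic the two trees should be equivalent. Floating-point rounding could, in principle, move a sample that lies extremely close to a threshold, so the sketch reports the agreement rate instead of assuming it is exactly 1.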
Next, train a decision tree classifier, make predictions on the test set, and compute the feature importances:
```python
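from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score
import numpy as np
import matplotlib.pyplot as plt
# Train the model
dtc = DecisionTreeClassifier(random_state=42)
dtc.fit(X_train_scaled, y_train)
# Predict on the test set
y_pred = dtc.predict(X_test_scaled)
accuracy = accuracy_score(y_test, y_pred)
print("Accuracy: {:.2f}%".format(accuracy * 100))
# Feature importances (impurity-based; they sum to 1)
importances = dtc.feature_importances_
feature_names = data.feature_names
# Sort ascending so that barh draws the most important feature at the top
order = np.argsort(importances)
# Plot: x-axis = feature importance, y-axis = feature
plt.figure(figsize=(8, 10))
plt.barh(feature_names[order], importances[order])
plt.xlabel('Feature Importance')
plt.ylabel('Feature')
plt.title('Breast Cancer Feature Importance')
plt.tight_layout()
plt.show()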
```
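Accuracy alone says little about which kind of error the tree makes, and with 30 features a bar chart is easier to read when it is backed by a ranked list. The following minimal sketch, assuming scikit-learn and pandas are installed, prints a confusion matrix, a per-class report, and the top-ranked features. To keep it self-contained, it rebuilds the same split and fits on the raw (unscaled) features. In this dataset, target 0 is `malignant` and 1 is `benign`, as given by `data.target_names`.

```python
import pandas as pd
from sklearn.datasets import load_breast_cancer
from sklearn.metrics import classification_report, confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

data = load_breast_cancer()
X_train, X_test, y_train, y_test = train_test_split(
    data.data, data.target, test_size=0.3, random_state=42)

# Fit on the raw features for brevity (trees do not need scaling)
dtc = DecisionTreeClassifier(random_state=42).fit(X_train, y_train)
y_pred = dtc.predict(X_test)

# Rows = true class, columns = predicted class (0 = malignant, 1 = benign)
print(confusion_matrix(y_test, y_pred))
# Per-class precision, recall and F1
print(classification_report(y_test, y_pred, target_names=data.target_names))

# Rank features by impurity-based importance, largest first
ranked = pd.Series(dtc.feature_importances_, index=data.feature_names).sort_values(ascending=False)
print(ranked.head(10))
# A single unpruned tree often assigns zero importance to many features
print("Features with zero importance:", int((ranked == 0).sum()))
```

The count of zero-importance features is a useful check when reading the chart. A zero importance means the tree never split on that feature, which does not necessarily mean the feature carries no information.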
The resulting figure looks like this:
![Breast Cancer Feature Importance](https://i.loli.net/2021/05/12/KyJcT2g8MfX6N5V.png)
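The figure suggests that features such as worst perimeter, worst radius, and worst area contribute most to the classifier's decisions. These impurity-based importances come from a single tree, so the exact ranking can change with `random_state` or with a different train/test split.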
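To gauge how robust that reading is, the following sketch does two things, assuming scikit-learn and pandas are installed. First, it refits the tree with a few different seeds and counts which feature comes out on top each time. Second, it computes permutation importance on the test set. Permutation importance is a different technique from the impurity-based `feature_importances_` used in the chart: it measures the drop in accuracy when one feature's values are shuffled. The choice of five seeds and `n_repeats=10` is arbitrary.

```python
import pandas as pd
from sklearn.datasets import load_breast_cancer
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

data = load_breast_cancer()
X_train, X_test, y_train, y_test = train_test_split(
    data.data, data.target, test_size=0.3, random_state=42)

# Impurity-based importances from trees that differ only in their random seed
# (rows = features, columns = seeds)
seed_importances = pd.DataFrame({
    seed: DecisionTreeClassifier(random_state=seed).fit(X_train, y_train).feature_importances_
    for seed in range(5)
}, index=data.feature_names)

# How often each feature is the single most important one across the seeds
top_counts = seed_importances.idxmax().value_counts()
print(top_counts)

# Permutation importance on held-out data for the seed-42 tree:
# mean drop in accuracy when one feature's values are shuffled
dtc = DecisionTreeClassifier(random_state=42).fit(X_train, y_train)
perm = permutation_importance(dtc, X_test, y_test, n_repeats=10, random_state=0)
perm_ranked = pd.Series(perm.importances_mean, index=data.feature_names).sort_values(ascending=False)
print(perm_ranked.head(5))
```

If the same few features lead under both measures, that supports the conclusion drawn from the chart. If they do not, the ranking from a single tree should be read cautiously. Correlated features such as radius, perimeter, and area can trade importance between runs.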