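The training and test sets are already split. How do I write Python code to draw ICE plots for the top ten features by importance in a random forest classification model?
Posted: 2024-11-15 21:30:08 Views: 9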
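In Python, you can train the random forest classifier with `RandomForestClassifier` from `sklearn`, then rank the features by permutation importance using the `eli5` library and plot the scores with `matplotlib`. Note that permutation importance is not the same thing as an ICE (Individual Conditional Expectation) plot: an ICE plot draws one curve per sample showing how the model's prediction changes as a single feature varies, while permutation importance measures how much the score drops when a feature's values are shuffled. The ranking is what selects the top ten features, and the ICE curves for those features can then be drawn with scikit-learn's `PartialDependenceDisplay.from_estimator(..., kind="individual")`. The code below covers the ranking step: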
```python
# First, make sure the required libraries are installed:
#   pip install eli5 scikit-learn matplotlib
# Note: eli5 may fail to import on recent scikit-learn releases.
import numpy as np
import matplotlib.pyplot as plt
from sklearn.ensemble import RandomForestClassifier
from sklearn.datasets import load_iris  # example dataset
from sklearn.model_selection import train_test_split
from eli5.sklearn import PermutationImportance

# Load the data
iris = load_iris()
X = iris.data
y = iris.target

# Split into training and test sets (skip this step if your data is already split)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Create and train the random forest classifier
rf_model = RandomForestClassifier(n_estimators=100, random_state=42)
rf_model.fit(X_train, y_train)

# Compute permutation importance on the test set
perm_importance = PermutationImportance(rf_model, scoring='accuracy', n_iter=5, random_state=42).fit(X_test, y_test)
importances_mean = perm_importance.feature_importances_
importances_std = perm_importance.feature_importances_std_

# Indices of the top 10 features, most important first (Iris has only 4 features)
top_idx = importances_mean.argsort()[::-1][:10]
positions = np.arange(len(top_idx))

# Plot the mean importance with a band of +/- one standard deviation
fig, ax = plt.subplots(figsize=(10, 6))
ax.plot(positions, importances_mean[top_idx], 'o-', label="Mean importance")
ax.fill_between(
    positions,
    importances_mean[top_idx] - importances_std[top_idx],
    importances_mean[top_idx] + importances_std[top_idx],
    alpha=0.2,
    color="skyblue",
    edgecolor="none",
)
ax.set_ylabel("Permutation Importance Score")
ax.set_xlabel("Feature")
ax.set_title("Top Features by Permutation Importance (Random Forest, Iris Dataset)")
ax.set_xticks(positions)
ax.set_xticklabels([iris.feature_names[i] for i in top_idx])  # label ticks with feature names
ax.legend(loc="best")
plt.show()
```