Random forest-based feature selection in Python
Date: 2023-10-16 20:06:33
Feature selection with a random forest takes the following steps:
1. Import the required libraries and load the dataset:
```python
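import pandas as pd
from sklearn.ensemble import RandomForestClassifier

# Assumes data.csv holds the feature columns first and the target label in the last column
data = pd.read_csv('data.csv')
X = data.iloc[:, :-1]  # all columns except the last: features
y = data.iloc[:, -1]   # last column: target label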
```
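To try step 1 without `data.csv`, a minimal sketch like the following builds a stand-in DataFrame with the same layout (features first, label in the last column). The synthetic data from `make_classification` and the column names `f0` to `f19` and `label` are assumptions for illustration, not part of the original.

```python
import pandas as pd
from sklearn.datasets import make_classification

# Synthetic stand-in for data.csv: 500 rows, 20 features, 5 of them informative
X_arr, y_arr = make_classification(
    n_samples=500, n_features=20, n_informative=5, n_redundant=0, random_state=0
)
data = pd.DataFrame(X_arr, columns=[f"f{i}" for i in range(X_arr.shape[1])])
data["label"] = y_arr  # hypothetical label column, placed last as in data.csv

# Same slicing as step 1: every column except the last is a feature
X = data.iloc[:, :-1]
y = data.iloc[:, -1]
print(X.shape, y.shape)  # → (500, 20) (500,)
```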
2. Create a random forest model and fit it to the data:
```python
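# random_state fixes the bootstrap and feature sampling so the ranking is reproducible
rf = RandomForestClassifier(random_state=42)
rf.fit(X, y)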
```
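For step 2: each tree is trained on a bootstrap sample with random feature subsets, so the importances, and therefore the ranking, can shift from run to run. The sketch below, assuming synthetic data from `make_classification` in place of `data.csv`, shows that fixing `random_state` makes the fit repeatable; `n_estimators=200` and the seeds are illustrative choices, not values from the original.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=300, n_features=10, n_informative=4, random_state=0)

# Two forests with the same seed grow identical trees
rf_a = RandomForestClassifier(n_estimators=200, random_state=42).fit(X, y)
rf_b = RandomForestClassifier(n_estimators=200, random_state=42).fit(X, y)
# A different seed generally yields slightly different importances
rf_c = RandomForestClassifier(n_estimators=200, random_state=7).fit(X, y)

print(np.array_equal(rf_a.feature_importances_, rf_b.feature_importances_))  # → True
print(np.abs(rf_a.feature_importances_ - rf_c.feature_importances_).max())
```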
3. Extract the feature importances:
```python
importances = rf.feature_importances_
```
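In step 3, `feature_importances_` is a NumPy array with one entry per input column, in the same order as `X.columns`; scikit-learn normalizes a forest's impurity-based importances so they sum to 1. A small check, assuming synthetic data rather than the original `data.csv`:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=300, n_features=8, n_informative=3, random_state=0)
rf = RandomForestClassifier(random_state=0).fit(X, y)
importances = rf.feature_importances_

# One non-negative score per feature column, in column order
print(importances.shape)  # → (8,)
print(bool(np.all(importances >= 0)))
# Impurity-based importances are normalized across features
print(round(float(importances.sum()), 6))  # → 1.0
```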
4. Pair each feature name with its importance as a tuple, then sort the pairs in descending order of importance:
```python
features = list(X.columns)
feature_importances = list(zip(features, importances))
feature_importances.sort(key=lambda x: x[1], reverse=True)
```
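The pairing and sorting in step 4 can also be done with a pandas `Series` indexed by feature name, which keeps names and scores together and sorts in one call. This is an alternative to the tuple list, not the original approach; the synthetic data and the made-up column names `f0` to `f5` stand in for `data.csv`.

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X_arr, y = make_classification(n_samples=300, n_features=6, n_informative=3, random_state=0)
X = pd.DataFrame(X_arr, columns=[f"f{i}" for i in range(6)])  # hypothetical names
rf = RandomForestClassifier(random_state=0).fit(X, y)

# Series: index = feature name, value = importance, sorted high to low
ranking = pd.Series(rf.feature_importances_, index=X.columns).sort_values(ascending=False)

# The zip + sort approach from step 4, for comparison
feature_importances = sorted(
    zip(X.columns, rf.feature_importances_), key=lambda x: x[1], reverse=True
)
print(ranking)
print(list(ranking.index) == [name for name, _ in feature_importances])
```

`ranking.head(k).index` then gives the top-k names directly.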
5. Keep the k most important features as the final feature set:
```python
k = 10
selected_features = [f[0] for f in feature_importances[:k]]
```
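scikit-learn also provides `SelectFromModel`, which wraps the same top-k idea as step 5: with `max_features=k` and `threshold=-np.inf` it keeps exactly the k features with the largest `feature_importances_`. This is an alternative to the manual list slicing, not what the original code uses; the synthetic data, column names, and `k = 5` are assumptions for illustration.

```python
import numpy as np
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectFromModel

X_arr, y = make_classification(n_samples=400, n_features=15, n_informative=5, random_state=0)
X = pd.DataFrame(X_arr, columns=[f"f{i}" for i in range(15)])  # hypothetical names
k = 5

# threshold=-np.inf disables the importance cut-off, so only max_features applies
selector = SelectFromModel(
    RandomForestClassifier(random_state=0), max_features=k, threshold=-np.inf
)
selector.fit(X, y)
# get_support() is a boolean mask in the original column order, not ranked
selected_features = list(X.columns[selector.get_support()])

# Manual top-k from the forest fitted inside the selector, for comparison
importances = selector.estimator_.feature_importances_
ranked = sorted(zip(X.columns, importances), key=lambda x: x[1], reverse=True)
manual = [name for name, _ in ranked[:k]]
print(selected_features)
print(sorted(selected_features) == sorted(manual))

X_selected = selector.transform(X)  # NumPy array with only the k kept columns
```

Because the mask follows column order, compare the two selections as sets rather than as ranked lists.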
The complete code:
```python
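import pandas as pd
from sklearn.ensemble import RandomForestClassifier

# Load the data: features in every column but the last, target label in the last
data = pd.read_csv('data.csv')
X = data.iloc[:, :-1]
y = data.iloc[:, -1]

# Fit a random forest (fixed random_state for a reproducible ranking)
rf = RandomForestClassifier(random_state=42)
rf.fit(X, y)

# Impurity-based importance of each feature, in column order
importances = rf.feature_importances_

# Pair names with importances and sort from most to least important
features = list(X.columns)
feature_importances = list(zip(features, importances))
feature_importances.sort(key=lambda x: x[1], reverse=True)

# Keep the names of the top k features
k = 10
selected_features = [f[0] for f in feature_importances[:k]]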
```
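The code above keeps the 10 most important features as the final feature set; change k to keep more or fewer. Note that `feature_importances_` is impurity-based (mean decrease in impurity) and computed on the training data, so it can overstate the importance of high-cardinality features.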
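Since k is the main knob, the steps can be wrapped in a small helper that takes k as a parameter and returns both the selected names and the reduced DataFrame. The function name `select_top_k_features`, its `random_state` default, the range check on k, and the synthetic demo data are hypothetical, not part of the original.

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier


def select_top_k_features(X: pd.DataFrame, y, k: int = 10, random_state: int = 0):
    """Hypothetical helper: rank features by random forest importance, keep the top k."""
    if not 1 <= k <= X.shape[1]:
        raise ValueError(f"k must be between 1 and {X.shape[1]}, got {k}")
    rf = RandomForestClassifier(random_state=random_state)
    rf.fit(X, y)
    # Pair names with importances and sort in descending order, as in step 4
    ranked = sorted(zip(X.columns, rf.feature_importances_), key=lambda x: x[1], reverse=True)
    selected = [name for name, _ in ranked[:k]]
    return selected, X[selected]


# Demo on synthetic data standing in for data.csv
X_arr, y = make_classification(n_samples=300, n_features=12, n_informative=4, random_state=0)
X = pd.DataFrame(X_arr, columns=[f"f{i}" for i in range(12)])
for k in (3, 6):
    names, X_k = select_top_k_features(X, y, k=k)
    print(k, names, X_k.shape)
```

With a fixed `random_state`, the forest and hence the ranking are identical across calls, so a larger k simply extends the list chosen for a smaller k.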