XGBoost feature selection code
Date: 2023-07-03 10:33:44
Below is an example of feature selection based on ranking features by importance:
```python
import xgboost as xgb
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
# Load the data
data = pd.read_csv('data.csv')
X = data.drop(['target'], axis=1)
y = data['target']
# Split into training and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
# Train the model (recent XGBoost versions expect class labels encoded as 0..n_classes-1)
xgb_model = xgb.XGBClassifier()
xgb_model.fit(X_train, y_train)
# Feature importance scores, one per column of X_train
importance_score = xgb_model.feature_importances_
feature_names = X_train.columns.values
# Sort by importance in descending order and keep the top n features
n = 10
sorted_idx = np.argsort(importance_score)[::-1][:n]
selected_features = feature_names[sorted_idx]
# Print the selected features
print('Selected features: ', selected_features)
```
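The ranking step can be pulled into a small helper, which lets you check it without `data.csv` or a trained model. This is a minimal sketch: `top_n_features` is a hypothetical name, and the scores below are made up for illustration rather than produced by XGBoost.

```python
import numpy as np

def top_n_features(importances, names, n):
    """Return the names of the n most important features, most important first."""
    importances = np.asarray(importances, dtype=float)
    names = np.asarray(names)
    # np.argsort sorts ascending, so reverse the indices and keep the first n
    order = np.argsort(importances)[::-1][:n]
    return names[order]

# Hypothetical importance scores for four features
print(top_n_features([0.10, 0.50, 0.30, 0.05], ["a", "b", "c", "d"], n=2).tolist())
# ['b', 'c']
```

If `n` is larger than the number of features, the slice simply returns all of them. The order among features with tied scores is not guaranteed.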
Below is an example that uses a feature selection algorithm, scikit-learn's `SelectFromModel`:
```python
import xgboost as xgb
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.feature_selection import SelectFromModel
# Load the data
data = pd.read_csv('data.csv')
X = data.drop(['target'], axis=1)
y = data['target']
# Split into training and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
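# Create the model; SelectFromModel fits its own clone of the estimator in fit(),
# so xgb_model does not need to be trained beforehand
xgb_model = xgb.XGBClassifier()
# Feature selection: keep features whose importance is at least the mean,
# and at most 10 of them
selector = SelectFromModel(estimator=xgb_model, threshold='mean', max_features=10)
selector.fit(X_train, y_train)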
selected_features = X_train.columns[selector.get_support()]
# Print the selected features
print('Selected features: ', selected_features)
```
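When both `threshold` and `max_features` are given, scikit-learn documents that a feature is kept only if it passes the threshold and is also among the `max_features` highest-scoring features. The sketch below re-implements that rule in plain NumPy for illustration only. `select_mask` is a hypothetical helper, not scikit-learn's code, and the scores are made up.

```python
import numpy as np

def select_mask(importances, max_features=None):
    """Boolean mask mimicking threshold='mean' combined with a max_features cap."""
    scores = np.asarray(importances, dtype=float)
    threshold = scores.mean()
    mask = np.zeros(scores.shape, dtype=bool)
    if max_features is None:
        mask[:] = True
    else:
        # Rank by descending importance; a stable sort keeps the original order among ties
        order = np.argsort(-scores, kind="mergesort")[:max_features]
        mask[order] = True
    # Drop any feature below the threshold, even if it made the top max_features
    mask[scores < threshold] = False
    return mask

scores = [0.40, 0.10, 0.30, 0.20]  # mean importance is 0.25
print(select_mask(scores, max_features=10).tolist())  # [True, False, True, False]
print(select_mask(scores, max_features=1).tolist())   # [True, False, False False]
```

With a cap of 10, the threshold is the binding constraint and two features survive. With a cap of 1, only the single most important feature is kept.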
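Note that when using a feature selection algorithm such as `SelectFromModel`, the threshold and the maximum number of features together control how many features are kept. With `threshold='mean'` and `max_features=10`, a feature is kept only if its importance is at least the mean importance and it also ranks among the 10 most important. Other feature selection algorithms can be used as well, for example selection driven by other tree-based models.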
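`SelectFromModel` also accepts other threshold specifications, such as a float, `'median'`, or a scaled form like `'1.25*mean'`. The simplified sketch below shows how each choice turns into a numeric cutoff and how many features it would keep. The helper `resolve_threshold` and the scores are hypothetical, and this is not scikit-learn's implementation.

```python
import numpy as np

def resolve_threshold(importances, threshold):
    """Turn a threshold spec into a number (simplified, illustrative only).

    Supports a float, "mean", "median", or a scaled form such as "1.25*mean".
    """
    scores = np.asarray(importances, dtype=float)
    if not isinstance(threshold, str):
        return float(threshold)
    spec = threshold.strip()
    factor = 1.0
    if "*" in spec:
        # Split "1.25*mean" into the scaling factor and the base statistic
        factor_str, spec = spec.split("*", 1)
        factor = float(factor_str)
        spec = spec.strip()
    if spec == "mean":
        base = scores.mean()
    elif spec == "median":
        base = np.median(scores)
    else:
        raise ValueError(f"unsupported threshold: {threshold!r}")
    return factor * base

scores = np.array([0.40, 0.10, 0.30, 0.20])
for spec in ["mean", "median", "1.5*mean", 0.15]:
    cutoff = resolve_threshold(scores, spec)
    kept = int(np.sum(scores >= cutoff))
    print(f"{spec!s:>9}: cutoff={cutoff:.3f}, kept={kept}")
```

With these scores, `'mean'` and `'median'` both give a cutoff of 0.25 and keep two features. `'1.5*mean'` keeps one feature, and `0.15` keeps three. A stricter threshold lowers the count before `max_features` is even applied.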