Code implementation: comparing different feature selection methods on the Wine dataset
Date: 2024-09-29 09:11:04
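In Python, the scikit-learn library (`sklearn`) can be used to compare how several common feature selection methods perform on the Wine dataset. Wine is a classic classification dataset: each sample holds chemical measurements of a wine together with the class it belongs to.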
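To make the dataset description concrete, here is a minimal sketch that loads Wine through `load_wine` and prints its shape, feature names, class names and per-class sample counts. The variable names (`n_samples`, `class_counts`, `variances`) are chosen here for illustration and are not part of the original example.

```python
from collections import Counter

from sklearn.datasets import load_wine

# Load the bundled Wine dataset (no download needed).
wine = load_wine()
n_samples, n_features = wine.data.shape
class_counts = Counter(wine.target.tolist())

print(f"samples: {n_samples}, features: {n_features}")
print("feature names:", list(wine.feature_names))
print("class names:", list(wine.target_names))
print("samples per class:", dict(class_counts))

# Per-feature variance: the columns sit on very different scales,
# which matters for variance-based and chi2-based selection.
variances = wine.data.var(axis=0)
for name, var in zip(wine.feature_names, variances):
    print(f"{name:>30s}  variance = {var:.3f}")
```

The per-feature variances differ by several orders of magnitude, which is worth keeping in mind when reading the results of scale-sensitive methods such as `VarianceThreshold`.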
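We assume that the required libraries, such as `pandas`, `numpy` and `scikit-learn`, are already installed. Below is a simple example that applies several feature selection methods (RFE, SelectKBest, VarianceThreshold and so on):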
```python
import pandas as pd
from sklearn.datasets import load_wine
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.feature_selection import SelectKBest, chi2, VarianceThreshold, RFE
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
# Load the data
wine = load_wine()
X = wine.data
y = wine.target
# Split into training and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
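# 1. Univariate selection (SelectKBest)
# chi2 requires non-negative feature values, which holds for the Wine measurements.
kbest = SelectKBest(chi2, k=5)  # try different values of k
X_train_kbest = kbest.fit_transform(X_train, y_train)
X_test_kbest = kbest.transform(X_test)
# 2. Variance thresholding
# X_train.var() is the variance over all values of the array (a scalar), so the
# threshold is 0.8 times that value. Variance depends on each feature's scale,
# so only the widest-ranging columns pass this threshold.
thresholder = VarianceThreshold(threshold=0.8 * X_train.var())
X_train_variance = thresholder.fit_transform(X_train)
X_test_variance = thresholder.transform(X_test)
# 3. Recursive Feature Elimination (RFE)
# The estimator's coefficients rank the features; max_iter is raised from the
# default to give the solver room to converge on unscaled data.
model = LogisticRegression(max_iter=10000)
rfe = RFE(model, n_features_to_select=5)  # number of features to keep
X_train_rfe = rfe.fit_transform(X_train, y_train)
X_test_rfe = rfe.transform(X_test)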
# Train a model on each reduced feature set and evaluate it
feature_sets = {
    'Select K Best': (X_train_kbest, X_test_kbest),
    'Variance Threshold': (X_train_variance, X_test_variance),
    'Recursive Feature Elimination': (X_train_rfe, X_test_rfe),
}
for name, (X_tr, X_te) in feature_sets.items():
    clf = LogisticRegression(max_iter=10000)  # fresh classifier for each feature set
    clf.fit(X_tr, y_train)
    predictions = clf.predict(X_te)
    print(f"{name} Accuracy: {accuracy_score(y_test, predictions)}")
```