Python code for feature selection on the iris dataset (ranking the attributes by their contribution to classification), with a conclusion
Date: 2023-12-10 20:40:34
The following Python code performs feature selection on the attributes of the iris dataset:
```python
import pandas as pd
import numpy as np
from sklearn.datasets import load_iris
from sklearn.feature_selection import SelectKBest
from sklearn.feature_selection import chi2

# Load the dataset
iris = load_iris()

# Build a DataFrame holding the four features plus the target column
df = pd.DataFrame(data=np.c_[iris['data'], iris['target']], columns=iris['feature_names'] + ['target'])

# Feature selection: score every feature with the chi-squared test
X = df.iloc[:, 0:4]
y = df.iloc[:, -1]
selector = SelectKBest(chi2, k=4)  # k=4 keeps all features; only the scores are used
selector.fit(X, y)

# Rank the attributes by score, highest first
attributes = iris.feature_names
scores = selector.scores_
indices = np.argsort(scores)[::-1]
sorted_attributes = [attributes[i] for i in indices]

# Print the result
print("Attributes sorted by importance:")
for rank, name in enumerate(sorted_attributes, start=1):
    print(f"{rank}. {name}")
```
Output:
```
Attributes sorted by importance:
1. petal length (cm)
2. petal width (cm)
3. sepal length (cm)
4. sepal width (cm)
```
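The ranking alone does not show how far apart the attributes are. As a minimal sketch (the variable names `ranked`, `order`, and `p_values` are illustrative and not part of the original answer), the chi-squared statistic and p-value of each attribute can be printed next to its name by calling `chi2` directly:

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.feature_selection import chi2

iris = load_iris()
X, y = iris.data, iris.target

# chi2 returns one test statistic and one p-value per feature
scores, p_values = chi2(X, y)

# Feature indices sorted from highest to lowest score
order = np.argsort(scores)[::-1]
ranked = [
    (iris.feature_names[i], float(scores[i]), float(p_values[i]))
    for i in order
]

for rank, (name, score, p) in enumerate(ranked, start=1):
    print(f"{rank}. {name}: chi2={score:.2f}, p={p:.3g}")
```

A larger statistic (and a smaller p-value) indicates a stronger dependence between the attribute and the class label.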
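Conclusion: According to the chi-squared test, petal length and petal width contribute the most to classification, followed by sepal length, with sepal width contributing the least. This is consistent with common knowledge and with earlier studies, which find that the petal measurements are the most important features for classifying iris flowers.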
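One caveat: scikit-learn documents `chi2` as intended for non-negative features such as booleans or frequencies, while the iris attributes are continuous measurements. As a hedged cross-check, not part of the original answer, the sketch below ranks the same attributes with the ANOVA F-test (`f_classif`), which is designed for continuous features; the names `f_scores` and `f_ranking` are illustrative.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.feature_selection import f_classif

iris = load_iris()
X, y = iris.data, iris.target

# f_classif returns one ANOVA F-statistic and one p-value per feature
f_scores, f_p_values = f_classif(X, y)

# Rank the attributes by F-statistic, highest first
f_order = np.argsort(f_scores)[::-1]
f_ranking = [iris.feature_names[i] for i in f_order]

print("Attributes sorted by ANOVA F-statistic:")
for rank, i in enumerate(f_order, start=1):
    print(f"{rank}. {iris.feature_names[i]} (F={f_scores[i]:.2f})")
```

If both tests produce the same order, the conclusion does not depend on the choice of the chi-squared test.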