Python code to find the best `max_features` parameter on the heart disease dataset, and the model's validation-set accuracy with that parameter
Time: 2024-10-25 15:15:04 Views: 51
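To find the best `max_features` value for the heart disease dataset and measure the model's accuracy on the validation set with that value, a common approach is to run a hyperparameter search with `GridSearchCV` from Python's scikit-learn library. The following is a simple example that assumes a decision tree classifier (`DecisionTreeClassifier`) and a dataset file named `heart_data.csv`. The code loads `load_iris` as a runnable placeholder, so swap in your own data where the comments indicate: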
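```python
import pandas as pd  # only needed if you load your own CSV file
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score
from sklearn.datasets import load_iris  # placeholder data; replace with your own dataset

# Load the data. Iris is used here only so the example runs as-is.
# For a CSV file, read it with pandas instead, for example:
#   df = pd.read_csv("heart_data.csv")
#   X = df.drop(columns="target")  # "target" is an assumed label column name
#   y = df["target"]
data = load_iris()
X = data.data
y = data.target

# Split into training and validation sets
X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.2, random_state=42)

# Define the parameter grid for the decision tree classifier.
# 'auto' is omitted: it was deprecated in scikit-learn 1.1 and removed in 1.3.
param_grid = {
    'max_features': [None, 'sqrt', 'log2'],
}

# Create the GridSearchCV object (5-fold cross-validation on the training set)
grid_search = GridSearchCV(
    DecisionTreeClassifier(random_state=42), param_grid, cv=5, scoring='accuracy'
)

# Fit the model and search for the best parameter
grid_search.fit(X_train, y_train)

# Best parameter and its mean cross-validated accuracy
best_params = grid_search.best_params_
best_accuracy = grid_search.best_score_

# Predict on the validation set with the refitted best model and compute accuracy
val_predictions = grid_search.predict(X_val)
val_accuracy = accuracy_score(y_val, val_predictions)

print(f"Best max_features value: {best_params['max_features']}")
print(f"Cross-validated accuracy with best params: {best_accuracy}")
print(f"Validation set accuracy with best params: {val_accuracy}")
```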
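If you would rather pick `max_features` directly by validation-set accuracy instead of by cross-validation score, one possible sketch is a plain loop over integer values. The data here is a synthetic stand-in generated with `make_classification` (13 numeric features and a binary label are assumptions, not taken from the actual heart disease file); replace `X` and `y` with your own data.

```python
from sklearn.datasets import make_classification
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the heart disease data (assumption: 13 numeric
# features and a binary label); replace with your own X and y.
X, y = make_classification(
    n_samples=300, n_features=13, n_informative=6, random_state=42
)

# Hold out 20% of the rows as the validation set, keeping class proportions.
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

# Try every integer value of max_features from 1 to the number of columns
# and record the validation accuracy for each one.
val_scores = {}
for k in range(1, X.shape[1] + 1):
    tree = DecisionTreeClassifier(max_features=k, random_state=42)
    tree.fit(X_train, y_train)
    val_scores[k] = accuracy_score(y_val, tree.predict(X_val))

# Pick the value with the highest validation accuracy
# (on ties, the smallest value wins because of dict insertion order).
best_k = max(val_scores, key=val_scores.get)
print(f"Best max_features: {best_k}")
print(f"Validation accuracy: {val_scores[best_k]:.3f}")
```

Because the same validation set is used both to choose `max_features` and to report accuracy, the reported number tends to be optimistic. Keeping a separate test set for the final estimate is the usual remedy.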