Three code examples for classifying the MNIST dataset with HistGradientBoostingClassifier and GradientBoostingClassifier
Here are three examples of classifying the MNIST dataset with HistGradientBoostingClassifier and GradientBoostingClassifier:
## Example 1: training with default parameters
```python
from sklearn.datasets import fetch_openml
from sklearn.model_selection import train_test_split
from sklearn.ensemble import HistGradientBoostingClassifier, GradientBoostingClassifier
from sklearn.metrics import accuracy_score
# Load the MNIST dataset (as_frame=False returns NumPy arrays)
mnist = fetch_openml('mnist_784', version=1, as_frame=False)
X, y = mnist.data, mnist.target
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
# Train HistGradientBoostingClassifier and compute its test accuracy
hist_clf = HistGradientBoostingClassifier()
hist_clf.fit(X_train, y_train)
hist_pred = hist_clf.predict(X_test)
hist_acc = accuracy_score(y_test, hist_pred)
# Train GradientBoostingClassifier and compute its test accuracy
# (warning: exact gradient boosting is very slow on the full 56,000-sample training set)
gb_clf = GradientBoostingClassifier()
gb_clf.fit(X_train, y_train)
gb_pred = gb_clf.predict(X_test)
gb_acc = accuracy_score(y_test, gb_pred)
# Report the results
print("HistGradientBoostingClassifier accuracy: {:.2f}%".format(hist_acc * 100))
print("GradientBoostingClassifier accuracy: {:.2f}%".format(gb_acc * 100))
```
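Because exact gradient boosting scales poorly with sample count, GradientBoostingClassifier can take hours on the full MNIST training set. A minimal workaround, not part of the original answer, is to fit it on a random subsample; the 10,000-sample size below is an arbitrary choice, and the snippet reuses X_train, y_train, X_test, and y_test from Example 1:
```python
import numpy as np

# Draw a random 10,000-sample subset of the training data (arbitrary size)
rng = np.random.RandomState(42)
subset = rng.choice(len(X_train), size=10_000, replace=False)
X_small, y_small = X_train[subset], y_train[subset]

# Fit on the subsample: accuracy drops a little, runtime drops a lot
gb_small = GradientBoostingClassifier()
gb_small.fit(X_small, y_small)
print("Subsampled GradientBoostingClassifier accuracy: {:.2f}%".format(
    accuracy_score(y_test, gb_small.predict(X_test)) * 100))
```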
## Example 2: hyperparameter tuning with GridSearchCV
```python
from sklearn.datasets import fetch_openml
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.ensemble import HistGradientBoostingClassifier, GradientBoostingClassifier
from sklearn.metrics import accuracy_score
# Load the MNIST dataset (as_frame=False returns NumPy arrays)
mnist = fetch_openml('mnist_784', version=1, as_frame=False)
X, y = mnist.data, mnist.target
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
# Tune HistGradientBoostingClassifier with GridSearchCV and compute its test accuracy
hist_clf = HistGradientBoostingClassifier()
hist_param_grid = {
'max_iter': [100, 200, 300],
'learning_rate': [0.05, 0.1, 0.2]
}
hist_grid_search = GridSearchCV(hist_clf, hist_param_grid, cv=3, n_jobs=-1, verbose=2)
hist_grid_search.fit(X_train, y_train)
hist_best_clf = hist_grid_search.best_estimator_
hist_pred = hist_best_clf.predict(X_test)
hist_acc = accuracy_score(y_test, hist_pred)
# Tune GradientBoostingClassifier the same way
# (warning: 9 parameter combinations x 3 CV folds is extremely slow on full MNIST)
gb_clf = GradientBoostingClassifier()
gb_param_grid = {
'n_estimators': [100, 200, 300],
'learning_rate': [0.05, 0.1, 0.2]
}
gb_grid_search = GridSearchCV(gb_clf, gb_param_grid, cv=3, n_jobs=-1, verbose=2)
gb_grid_search.fit(X_train, y_train)
gb_best_clf = gb_grid_search.best_estimator_
gb_pred = gb_best_clf.predict(X_test)
gb_acc = accuracy_score(y_test, gb_pred)
# Report both the best cross-validation score and the held-out test accuracy
print("HistGradientBoostingClassifier best CV accuracy: {:.2f}%".format(hist_grid_search.best_score_ * 100))
print("HistGradientBoostingClassifier test accuracy: {:.2f}%".format(hist_acc * 100))
print("GradientBoostingClassifier best CV accuracy: {:.2f}%".format(gb_grid_search.best_score_ * 100))
print("GradientBoostingClassifier test accuracy: {:.2f}%".format(gb_acc * 100))
```
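After the search finishes, it is often useful to see which settings won and how the other candidates fared. The snippet below uses the standard GridSearchCV result attributes best_params_ and cv_results_, assumes the fitted hist_grid_search and gb_grid_search objects from Example 2, and additionally pulls in pandas (not used elsewhere in this answer) to tabulate the results:
```python
import pandas as pd

# The winning hyperparameter combination for each estimator
print("Best HistGradientBoostingClassifier params:", hist_grid_search.best_params_)
print("Best GradientBoostingClassifier params:", gb_grid_search.best_params_)

# Full cross-validation table: one row per parameter combination
cv_table = pd.DataFrame(hist_grid_search.cv_results_)
print(cv_table[['params', 'mean_test_score', 'std_test_score']])
```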
## Example 3: training with a custom evaluation metric

Note that neither estimator accepts a callable loss: HistGradientBoostingClassifier only supports loss='log_loss' for classification, and GradientBoostingClassifier only supports the 'log_loss' and 'exponential' strings (with 'exponential' restricted to binary problems). What HistGradientBoostingClassifier does support is a custom metric passed to its scoring parameter, which drives early stopping:
```python
from sklearn.datasets import fetch_openml
from sklearn.model_selection import train_test_split
from sklearn.ensemble import HistGradientBoostingClassifier, GradientBoostingClassifier
from sklearn.metrics import accuracy_score, make_scorer
import numpy as np
# Load the MNIST dataset (as_frame=False returns NumPy arrays)
mnist = fetch_openml('mnist_784', version=1, as_frame=False)
X, y = mnist.data, mnist.target
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
# Define a custom metric: the fraction of correctly classified samples (0-1 accuracy)
def fraction_correct(y_true, y_pred):
    return np.mean(np.asarray(y_true) == np.asarray(y_pred))
custom_scorer = make_scorer(fraction_correct)
# Train HistGradientBoostingClassifier, using the custom metric for early stopping
hist_clf = HistGradientBoostingClassifier(
    scoring=custom_scorer,      # evaluated on an internal validation split
    early_stopping=True,
    validation_fraction=0.1,
    n_iter_no_change=10,
)
hist_clf.fit(X_train, y_train)
hist_pred = hist_clf.predict(X_test)
hist_acc = accuracy_score(y_test, hist_pred)
# GradientBoostingClassifier has no scoring parameter; its loss must be one of the
# supported strings ('log_loss' is the multiclass default in scikit-learn >= 1.1)
gb_clf = GradientBoostingClassifier(loss='log_loss')
gb_clf.fit(X_train, y_train)
gb_pred = gb_clf.predict(X_test)
gb_acc = accuracy_score(y_test, gb_pred)
# Report the results
print("HistGradientBoostingClassifier accuracy: {:.2f}%".format(hist_acc * 100))
print("GradientBoostingClassifier accuracy: {:.2f}%".format(gb_acc * 100))
```
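With early stopping enabled, the fitted model records how many boosting iterations it actually ran and the score on the internal validation split at each one; n_iter_ and validation_score_ are standard HistGradientBoostingClassifier attributes, and this snippet assumes the fitted hist_clf from Example 3:
```python
# How many boosting iterations ran before early stopping kicked in
print("Boosting iterations used:", hist_clf.n_iter_)
# Validation score (here: the custom metric) recorded at the final iteration
print("Final validation score: {:.4f}".format(hist_clf.validation_score_[-1]))
```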
These three examples show how to classify the MNIST dataset with HistGradientBoostingClassifier and GradientBoostingClassifier: training with default parameters, tuning hyperparameters with GridSearchCV, and training with a custom evaluation metric for early stopping.