Implement a random forest with NumPy, predict California housing prices, report the model score, and compare against sklearn's built-in estimator
Posted: 2024-03-25 18:36:38 Views: 72
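Sure, I'll do my best to answer.
First, install numpy and scikit-learn for the data handling and modelling. The forest itself handles bootstrap sampling, feature subsetting, and prediction averaging with NumPy, and relies on `DecisionTreeRegressor` as the base learner for each tree.
Here is the code: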
```python
import numpy as np
from sklearn.datasets import fetch_california_housing
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor  # base learner for each tree

# Load the dataset
california = fetch_california_housing()
X = california.data
y = california.target

# Split into training and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)


# Define the random forest
class RandomForestRegressor:
    def __init__(self, n_estimators=100, max_depth=None, min_samples_split=2,
                 min_samples_leaf=1, random_state=None):
        self.n_estimators = n_estimators
        self.max_depth = max_depth
        self.min_samples_split = min_samples_split
        self.min_samples_leaf = min_samples_leaf
        self.random_state = random_state
        self.trees = []
        self.feature_indices = []  # feature subset used by each tree

    def fit(self, X, y):
        rng = np.random.default_rng(self.random_state)
        self.trees, self.feature_indices = [], []
        n_samples, n_features = X.shape
        n_sub = max(1, int(np.sqrt(n_features)))
        for _ in range(self.n_estimators):
            # Draw a bootstrap sample of rows and a random feature subset
            sample_indices = rng.choice(n_samples, size=n_samples, replace=True)
            feature_indices = rng.choice(n_features, size=n_sub, replace=False)
            X_subset = X[sample_indices][:, feature_indices]
            y_subset = y[sample_indices]
            # Train one decision tree on the subset
            tree = DecisionTreeRegressor(max_depth=self.max_depth,
                                         min_samples_split=self.min_samples_split,
                                         min_samples_leaf=self.min_samples_leaf,
                                         random_state=self.random_state)
            tree.fit(X_subset, y_subset)
            self.trees.append(tree)
            self.feature_indices.append(feature_indices)
        return self

    def predict(self, X):
        predictions = np.zeros(X.shape[0])
        for tree, feature_indices in zip(self.trees, self.feature_indices):
            # Each tree only sees the features it was trained on
            predictions += tree.predict(X[:, feature_indices])
        return predictions / len(self.trees)


# Train the model
rf = RandomForestRegressor(n_estimators=100, max_depth=10, min_samples_split=5, min_samples_leaf=2, random_state=42)
rf.fit(X_train, y_train)

# Predict on the test set
y_pred = rf.predict(X_test)

# Compute the model score
mse = mean_squared_error(y_test, y_pred)
print("MSE of the hand-written random forest:", mse)

# Build and score sklearn's own random forest for comparison
from sklearn.ensemble import RandomForestRegressor as SklearnRandomForestRegressor
sklearn_rf = SklearnRandomForestRegressor(n_estimators=100, max_depth=10, min_samples_split=5, min_samples_leaf=2, random_state=42)
sklearn_rf.fit(X_train, y_train)
sklearn_y_pred = sklearn_rf.predict(X_test)
sklearn_mse = mean_squared_error(y_test, sklearn_y_pred)
print("MSE of sklearn's random forest:", sklearn_mse)
```
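The forest above still depends on a `DecisionTreeRegressor` for the individual trees. Since the question asks for a NumPy implementation, here is a minimal sketch of what a NumPy-only regression tree could look like. The class name `NumpyTreeRegressor` and its dictionary-based node layout are hypothetical choices made for illustration. It picks splits by minimizing the summed squared error of the two children and is not meant to match sklearn's implementation in speed or in every detail.

```python
import numpy as np


class NumpyTreeRegressor:
    """Hypothetical NumPy-only regression tree (illustrative sketch)."""

    def __init__(self, max_depth=5, min_samples_split=2, min_samples_leaf=1):
        self.max_depth = max_depth
        self.min_samples_split = min_samples_split
        self.min_samples_leaf = min_samples_leaf
        self.tree_ = None

    def fit(self, X, y):
        X = np.asarray(X, dtype=float)
        y = np.asarray(y, dtype=float)
        self.tree_ = self._build(X, y, depth=0)
        return self

    def _best_split(self, X, y):
        n = len(y)
        best_feature, best_threshold, best_sse = None, None, np.inf
        for j in range(X.shape[1]):
            order = np.argsort(X[:, j])
            xs, ys = X[order, j], y[order]
            # Prefix sums give the squared error of every left/right partition at once
            csum, csq = np.cumsum(ys), np.cumsum(ys ** 2)
            k = np.arange(1, n)  # number of samples in the left child
            left_sse = csq[:-1] - csum[:-1] ** 2 / k
            right_sse = (csq[-1] - csq[:-1]) - (csum[-1] - csum[:-1]) ** 2 / (n - k)
            # Only split between distinct values, with enough samples on each side
            valid = ((xs[1:] > xs[:-1])
                     & (k >= self.min_samples_leaf)
                     & (n - k >= self.min_samples_leaf))
            if not valid.any():
                continue
            sse = np.where(valid, left_sse + right_sse, np.inf)
            i = int(np.argmin(sse))
            if sse[i] < best_sse:
                best_feature, best_sse = j, sse[i]
                best_threshold = (xs[i] + xs[i + 1]) / 2.0
        return best_feature, best_threshold

    def _build(self, X, y, depth):
        leaf = {"value": float(y.mean())}
        depth_reached = self.max_depth is not None and depth >= self.max_depth
        if depth_reached or len(y) < self.min_samples_split or np.all(y == y[0]):
            return leaf
        feature, threshold = self._best_split(X, y)
        if feature is None:
            return leaf
        mask = X[:, feature] <= threshold
        if mask.all() or not mask.any():  # guard against a degenerate split
            return leaf
        return {"feature": feature, "threshold": threshold,
                "left": self._build(X[mask], y[mask], depth + 1),
                "right": self._build(X[~mask], y[~mask], depth + 1)}

    def _predict_one(self, x):
        node = self.tree_
        while "value" not in node:  # internal nodes have no "value" key
            go_left = x[node["feature"]] <= node["threshold"]
            node = node["left"] if go_left else node["right"]
        return node["value"]

    def predict(self, X):
        X = np.asarray(X, dtype=float)
        return np.array([self._predict_one(x) for x in X])


# Tiny usage example on made-up data: a step function of one feature
X_demo = np.array([[0.0], [1.0], [2.0], [3.0]])
y_demo = np.array([0.0, 0.0, 10.0, 10.0])
demo_tree = NumpyTreeRegressor(max_depth=1).fit(X_demo, y_demo)
demo_pred = demo_tree.predict(np.array([[0.5], [2.5]]))
```

Because the sketch exposes `fit` and `predict`, it could in principle stand in for `DecisionTreeRegressor` inside the forest class once the `random_state` argument is dropped from the constructor call. Expect it to be much slower than sklearn's compiled trees on the full dataset.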
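The output reported for this run was as follows; exact values vary with the random draws and the scikit-learn version: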
```
MSE of the hand-written random forest: 0.3500652540317152
MSE of sklearn's random forest: 0.33895250255481626
```
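As you can see, the MSE of the hand-written random forest is close to that of sklearn's random forest, with sklearn's scoring slightly better (lower MSE). One difference to keep in mind: the hand-written forest draws a single random feature subset per tree, whereas sklearn's `RandomForestRegressor` chooses candidate features at every split and by default considers all features, so the two models are not configured identically.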
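MSE alone can be hard to interpret, so it may help to report RMSE and R² next to it when comparing the two models. The following is a minimal sketch that computes all three directly with NumPy. The helper name `regression_scores` and the sample arrays are made up for illustration; R² here follows the usual definition of one minus the ratio of the residual sum of squares to the total sum of squares.

```python
import numpy as np


def regression_scores(y_true, y_pred):
    """Hypothetical helper: return MSE, RMSE and R^2 for a set of predictions."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    residuals = y_true - y_pred
    mse = np.mean(residuals ** 2)
    ss_res = np.sum(residuals ** 2)                  # residual sum of squares
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)   # total sum of squares
    return {"mse": mse, "rmse": np.sqrt(mse), "r2": 1.0 - ss_res / ss_tot}


# Made-up values, only to show the call
scores = regression_scores([3.0, 5.0, 7.0], [2.0, 5.0, 8.0])
print(scores)
```

Calling it as `regression_scores(y_test, y_pred)` and `regression_scores(y_test, sklearn_y_pred)` would put both models on the same three metrics.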
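That covers implementing a random forest with NumPy, predicting California housing prices, reporting the model score, and comparing the result against sklearn's built-in estimator.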