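Linear regression modeling and training code for Boston house price prediction
Date: 2023-11-11 10:03:40
The following Python code builds and trains a linear regression model to predict Boston house prices: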
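1. Import the required libraries and the dataset
```python
import pandas as pd
import numpy as np
# Note: load_boston was deprecated in scikit-learn 1.0 and removed in 1.2,
# so this import only works with scikit-learn < 1.2.
from sklearn.datasets import load_boston
boston = load_boston()
```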
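On scikit-learn 1.2 or later, `load_boston` is no longer available. The sketch below follows the workaround given in scikit-learn's deprecation notice, which reads the raw file from the original CMU source. The helper names `split_boston_raw` and `load_boston_from_url` are hypothetical and chosen only for illustration. In the raw file each record spans two rows, so the helper stitches every pair of rows back into 13 features plus the target.

```python
import numpy as np


def split_boston_raw(raw):
    """Turn the two-rows-per-record raw table into (data, target).

    Even rows hold the first 11 features; odd rows hold the last
    2 features followed by the target in the third column.
    """
    raw = np.asarray(raw, dtype=float)
    data = np.hstack([raw[::2, :], raw[1::2, :2]])
    target = raw[1::2, 2]
    return data, target


def load_boston_from_url(url="http://lib.stat.cmu.edu/datasets/boston"):
    """Download and parse the raw file (needs network access and pandas)."""
    import pandas as pd

    # The first 22 lines of the file are a free-text description.
    raw_df = pd.read_csv(url, sep=r"\s+", skiprows=22, header=None)
    return split_boston_raw(raw_df.values)
```

Calling `data, target = load_boston_from_url()` should give arrays that can stand in for `boston.data` and `boston.target`. The column names are not part of the raw table and have to be supplied separately.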
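2. Data preprocessing
```python
# Convert the dataset to a DataFrame
boston_df = pd.DataFrame(boston.data, columns=boston.feature_names)
# Add the house price column (the target)
boston_df['PRICE'] = boston.target
# Check for missing values (print so the result is visible when run as a script)
print(boston_df.isnull().sum())
# Split the columns into the feature matrix and the target vector
X = boston_df.iloc[:, :-1].values
y = boston_df.iloc[:, -1].values
# Split the data into a training set and a test set
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
# Feature scaling: fit the scaler on the training set only, then apply it to both sets
from sklearn.preprocessing import StandardScaler
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)
```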
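The scaling step can also be bundled with the model in a scikit-learn `Pipeline`, which keeps the scaler fitted on training data only. The sketch below uses synthetic stand-in data (an assumption for illustration, not the Boston data) and also checks a property of ordinary least squares: standardizing the features changes the coefficients but not the predictions. Scaling matters more for regularized or gradient-based models.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data with features on very different scales.
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(200, 3)) * np.array([1.0, 50.0, 0.01])
y_demo = X_demo @ np.array([2.0, -0.1, 300.0]) + 5.0 + rng.normal(scale=0.5, size=200)

X_tr, X_te, y_tr, y_te = train_test_split(X_demo, y_demo, test_size=0.2, random_state=0)

# The pipeline fits the scaler on X_tr only and reuses it at predict time.
scaled_model = make_pipeline(StandardScaler(), LinearRegression()).fit(X_tr, y_tr)
plain_model = LinearRegression().fit(X_tr, y_tr)

pred_scaled = scaled_model.predict(X_te)
pred_plain = plain_model.predict(X_te)
max_gap = float(np.max(np.abs(pred_scaled - pred_plain)))
print("largest prediction difference:", max_gap)
```

The printed difference should be at the level of floating-point noise.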
3. Linear regression modeling and training
```python
from sklearn.linear_model import LinearRegression
regressor = LinearRegression()
regressor.fit(X_train, y_train)
# Compute predictions on the training set and the test set
y_train_pred = regressor.predict(X_train)
y_test_pred = regressor.predict(X_test)
# Compute R^2 on the training set and the test set
from sklearn.metrics import r2_score
print('Training R^2:', r2_score(y_train, y_train_pred))
print('Test R^2:', r2_score(y_test, y_test_pred))
```
Output:
```
Training R^2: 0.7730135569264234
Test R^2: 0.5892223849182514
```
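For reference, R^2 is one minus the ratio of the residual sum of squares to the total sum of squares around the mean of the true values. A minimal sketch of that definition follows; `r_squared` is a hypothetical helper name, and the four-point example is made up for illustration.

```python
import numpy as np


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)         # unexplained variation
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)  # total variation around the mean
    return 1.0 - ss_res / ss_tot


r2_demo = r_squared([3.0, -0.5, 2.0, 7.0], [2.5, 0.0, 2.0, 8.0])
print(round(r2_demo, 4))  # 0.9486
```

A value of 1 means perfect predictions, while 0 means the model does no better than always predicting the mean.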
4. Model evaluation
```python
# Compute the mean squared error (MSE) and root mean squared error (RMSE)
# on the training set and the test set
from sklearn.metrics import mean_squared_error
from math import sqrt
train_mse = mean_squared_error(y_train, y_train_pred)
test_mse = mean_squared_error(y_test, y_test_pred)
print(f'Training MSE: {train_mse:.4f}')
print(f'Training RMSE: {sqrt(train_mse):.4f}')
print(f'Test MSE: {test_mse:.4f}')
print(f'Test RMSE: {sqrt(test_mse):.4f}')
```
Output:
```
Training MSE: 19.3265
Training RMSE: 4.3962
Test MSE: 33.4490
Test RMSE: 5.7835
```
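The error metrics are simple enough to compute by hand, which makes the relationship between them explicit: RMSE is the square root of MSE and is therefore in the same units as the target (for this dataset, median home value in thousands of dollars). The sketch below uses a hypothetical helper `regression_errors` and a made-up four-point example. It also reports the mean absolute error (MAE), which the code above does not use, as a less outlier-sensitive companion.

```python
import numpy as np


def regression_errors(y_true, y_pred):
    """Return MSE, RMSE and MAE for two equal-length sequences."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    residuals = y_true - y_pred
    mse = float(np.mean(residuals ** 2))
    return {
        "mse": mse,
        "rmse": float(np.sqrt(mse)),                # same units as the target
        "mae": float(np.mean(np.abs(residuals))),
    }


errors_demo = regression_errors([3.0, -0.5, 2.0, 7.0], [2.5, 0.0, 2.0, 8.0])
print(errors_demo)
```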
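5. Visualize the prediction results
```python
import matplotlib.pyplot as plt
# Residual plot for the training set: predicted value on the x-axis,
# residual (predicted minus actual) on the y-axis
plt.scatter(y_train_pred, y_train_pred - y_train, c='blue', marker='o', label='Training set')
# Residual plot for the test set, using the same convention
plt.scatter(y_test_pred, y_test_pred - y_test, c='lightgreen', marker='s', label='Test set')
plt.xlabel('Predicted value')
plt.ylabel('Residual')
plt.legend(loc='upper left')
# Draw a horizontal reference line at zero residual
plt.hlines(y=0, xmin=-10, xmax=50, lw=2, color='red')
plt.xlim([-10, 50])
plt.show()
```
Output: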
![image](https://user-images.githubusercontent.com/74906899/136789225-5f6e0cf6-1e00-4f6e-9e4f-3c3f6f1a6a4a.png)
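When reading a residual plot like this, it helps to know what ordinary least squares guarantees on the training set: with an intercept, the training residuals average to zero and are uncorrelated with the fitted values. The informative part of the plot is therefore its shape (curvature, a funnel, clusters of outliers) and how the test points compare. The sketch below checks both properties on synthetic stand-in data (an assumption for illustration, not the Boston data), fitting the model with NumPy's least-squares solver.

```python
import numpy as np

# Synthetic stand-in data: two features plus noise.
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(100, 2))
y_demo = 1.5 * X_demo[:, 0] - 2.0 * X_demo[:, 1] + 4.0 + rng.normal(scale=0.3, size=100)

# Ordinary least squares with an intercept column.
A = np.column_stack([np.ones(len(X_demo)), X_demo])
coef, *_ = np.linalg.lstsq(A, y_demo, rcond=None)
fitted = A @ coef

# Same sign convention as the plot above: predicted minus actual.
residuals = fitted - y_demo

mean_residual = float(residuals.mean())
corr_with_fitted = float(np.corrcoef(fitted, residuals)[0, 1])
print("mean residual:", mean_residual)
print("correlation with fitted values:", corr_with_fitted)
```

Both printed values should be zero up to floating-point noise. Neither property is guaranteed on the test set.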