Using the Boston housing price dataset, build a prediction model with linear regression and test it. Write the code.
Posted: 2024-03-15 18:44:24
Sure, here is an implementation in Python:
```python
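# Import the required libraries
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
from sklearn.utils import Bunch
# Load the dataset
try:
    # load_boston is only available in scikit-learn < 1.2
    from sklearn.datasets import load_boston
    boston = load_boston()
except ImportError:
    # Fallback for newer versions (needs network access): rebuild the same arrays
    # from the original source, following scikit-learn's deprecation notice
    raw_df = pd.read_csv("http://lib.stat.cmu.edu/datasets/boston", sep=r"\s+", skiprows=22, header=None)
    boston = Bunch(
        data=np.hstack([raw_df.values[::2, :], raw_df.values[1::2, :2]]),
        target=raw_df.values[1::2, 2],
        feature_names=np.array(['CRIM', 'ZN', 'INDUS', 'CHAS', 'NOX', 'RM', 'AGE',
                                'DIS', 'RAD', 'TAX', 'PTRATIO', 'B', 'LSTAT']),
    )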
# Convert the dataset to a DataFrame
boston_df = pd.DataFrame(boston.data, columns=boston.feature_names)
# Add the house price as the target variable
boston_df['PRICE'] = boston.target
# Inspect the first rows
print(boston_df.head())
# Data preprocessing
# Check for missing values
print(boston_df.isnull().sum())
# Show summary statistics
print(boston_df.describe())
# Data visualization
# Plot each feature against the house price (13 features on a 4x4 grid)
sns.set(rc={'figure.figsize': (16, 12)})
for index, feature_name in enumerate(boston.feature_names):
    plt.subplot(4, 4, index + 1)
    sns.scatterplot(x=feature_name, y='PRICE', data=boston_df)
plt.tight_layout()
plt.show()
# Split the data into training and test sets (80% / 20%)
from sklearn.model_selection import train_test_split
X = boston_df.drop('PRICE', axis=1)
y = boston_df['PRICE']
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
# Build and fit the model
from sklearn.linear_model import LinearRegression
lin_reg = LinearRegression()
lin_reg.fit(X_train, y_train)
# Evaluate the model on both the training set and the test set
from sklearn.metrics import mean_squared_error, r2_score
y_train_pred = lin_reg.predict(X_train)
train_mse = mean_squared_error(y_train, y_train_pred)
train_r2 = r2_score(y_train, y_train_pred)
print('Training MSE:', train_mse)
print('Training R2 score:', train_r2)
y_test_pred = lin_reg.predict(X_test)
test_mse = mean_squared_error(y_test, y_test_pred)
test_r2 = r2_score(y_test, y_test_pred)
print('Test MSE:', test_mse)
print('Test R2 score:', test_r2)
# Predict the price of a single house (feature values in the same order as the training columns)
new_house = pd.DataFrame(
    np.array([[0.00632, 18.0, 2.31, 0, 0.538, 6.575, 65.2, 4.0900, 1, 296.0, 15.3, 396.90, 4.98]]),
    columns=boston.feature_names)
price_pred = lin_reg.predict(new_house)
print('Predicted price:', price_pred)
```
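To make the fit and the two reported metrics less of a black box, here is a minimal sketch of ordinary least squares, MSE, and R2 written with plain NumPy. It does not use the Boston data: the synthetic data, the true coefficients (3, -2, intercept 5), and the names `X_demo`, `y_demo`, and `X_design` are all assumptions made for illustration, and this is not necessarily how `LinearRegression` is implemented internally.

```python
import numpy as np

# Synthetic data (made up for illustration): y = 3*x1 - 2*x2 + 5 + small noise
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(200, 2))
y_demo = 3.0 * X_demo[:, 0] - 2.0 * X_demo[:, 1] + 5.0 + rng.normal(scale=0.1, size=200)

# Append a column of ones so the intercept is estimated together with the coefficients
X_design = np.column_stack([X_demo, np.ones(len(X_demo))])

# Least-squares solution of X_design @ w ~= y_demo
w, *_ = np.linalg.lstsq(X_design, y_demo, rcond=None)
y_pred = X_design @ w

# Mean squared error: the average squared residual
mse = np.mean((y_demo - y_pred) ** 2)

# R2: 1 - (residual sum of squares / total sum of squares)
ss_res = np.sum((y_demo - y_pred) ** 2)
ss_tot = np.sum((y_demo - y_demo.mean()) ** 2)
r2 = 1.0 - ss_res / ss_tot

print("coefficients and intercept:", w)
print("MSE:", mse)
print("R2:", r2)
```

The last entry of `w` is the intercept because the column of ones was appended last. An R2 close to 1 means the model explains most of the variance in the target, which is the same reading that applies to the training and test scores printed above.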
I hope this code is helpful.