Machine Learning: Boston Housing Price Prediction with Linear Regression
Posted: 2023-08-26 21:07:07
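The Boston housing dataset is a classic dataset for training and evaluating linear regression models. The steps below use Python and scikit-learn to predict Boston house prices: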
1. Import the required libraries and load the dataset
```python
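import numpy as np
import pandas as pd
# load_boston was removed in scikit-learn 1.2, so read the original StatLib file instead
# (the approach suggested in scikit-learn's deprecation notice); each record spans two lines
data_url = "http://lib.stat.cmu.edu/datasets/boston"
raw_df = pd.read_csv(data_url, sep=r"\s+", skiprows=22, header=None)
data = np.hstack([raw_df.values[::2, :], raw_df.values[1::2, :2]])
target = raw_df.values[1::2, 2]
feature_names = ['CRIM', 'ZN', 'INDUS', 'CHAS', 'NOX', 'RM', 'AGE', 'DIS', 'RAD', 'TAX', 'PTRATIO', 'B', 'LSTAT']
X = pd.DataFrame(data, columns=feature_names)
y = pd.Series(target, name='MEDV')  # median home value in $1000s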
```
2. Data preprocessing
```python
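# Inspect column types and non-null counts (info() prints directly and returns None)
X.info()
# Summary statistics for each feature
print(X.describe())
# Count missing values per column
print(X.isnull().sum())
# Split into training and test sets (80% / 20%)
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
# Standardize the features: fit the scaler on the training set only, then apply it to both sets
from sklearn.preprocessing import StandardScaler
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)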
```
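To make the standardization step concrete, here is a minimal sketch on synthetic data (the array shapes, scales, and random seed are assumptions, not the Boston data). It checks that `StandardScaler` computes z = (x − mean) / std with the mean and standard deviation taken from the training set only, and that the test set is transformed with those same training statistics.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Synthetic stand-ins for X_train / X_test: 3 features on very different scales (assumption)
rng = np.random.default_rng(0)
X_train = rng.normal(loc=[0.0, 50.0, 300.0], scale=[1.0, 10.0, 80.0], size=(100, 3))
X_test = rng.normal(loc=[0.0, 50.0, 300.0], scale=[1.0, 10.0, 80.0], size=(20, 3))

scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

# Reproduce the transform by hand using training-set statistics only
mu = X_train.mean(axis=0)
sigma = X_train.std(axis=0)  # population std (ddof=0), as StandardScaler uses
manual_train = (X_train - mu) / sigma
manual_test = (X_test - mu) / sigma  # the test set reuses the training mean/std

print(np.allclose(X_train_scaled, manual_train))
print(np.allclose(X_test_scaled, manual_test))
```

Fitting the scaler on the test set as well would leak information about the test data into training, which is why step 2 calls `fit_transform` only on `X_train`.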
3. Build the linear regression model
```python
from sklearn.linear_model import LinearRegression
lr = LinearRegression()
lr.fit(X_train_scaled, y_train)
```
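What `fit` computes is an ordinary least-squares solution: the intercept and coefficients that minimize the sum of squared residuals. The sketch below uses synthetic data with made-up true weights (an assumption for illustration) and compares `LinearRegression` against NumPy's least-squares solver, with a column of ones added for the intercept; the two should agree to numerical precision.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Synthetic data: 4 features, hypothetical true weights and intercept, small noise
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 4))
true_w = np.array([1.5, -2.0, 0.0, 3.0])
y = X @ true_w + 4.0 + rng.normal(scale=0.1, size=200)

lr = LinearRegression().fit(X, y)

# Ordinary least squares by hand: prepend a column of ones so beta[0] is the intercept
X1 = np.column_stack([np.ones(len(X)), X])
beta, *_ = np.linalg.lstsq(X1, y, rcond=None)

print('sklearn intercept:', lr.intercept_, ' lstsq intercept:', beta[0])
print('sklearn coef:', lr.coef_)
print('lstsq coef:  ', beta[1:])
```

Because the Boston features are standardized before fitting, each entry of `lr.coef_` is the change in predicted MEDV for a one-standard-deviation increase in that feature, so `pd.Series(lr.coef_.ravel(), index=X.columns)` gives a rough view of which features the model leans on.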
4. Evaluate the model
```python
# R² score on the training and test sets
print('Training R2 score:', lr.score(X_train_scaled, y_train))
print('Testing R2 score:', lr.score(X_test_scaled, y_test))
# Mean squared error on the training and test sets
from sklearn.metrics import mean_squared_error
y_train_pred = lr.predict(X_train_scaled)
y_test_pred = lr.predict(X_test_scaled)
print('Training Mean Squared Error:', mean_squared_error(y_train, y_train_pred))
print('Testing Mean Squared Error:', mean_squared_error(y_test, y_test_pred))
```
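Both metrics can be written out directly: MSE is the mean of the squared residuals, and R² = 1 − SS_res / SS_tot, where SS_res is the sum of squared residuals and SS_tot is the sum of squared deviations of the true values from their mean. This sketch uses a handful of made-up values (not Boston results) to confirm the hand-computed numbers match `mean_squared_error` and `r2_score`.

```python
import numpy as np
from sklearn.metrics import mean_squared_error, r2_score

# Toy true values and predictions, purely illustrative
y_true = np.array([3.0, 5.0, 7.0, 9.0])
y_pred = np.array([2.5, 5.0, 7.5, 10.0])

residuals = y_true - y_pred
mse_manual = np.mean(residuals ** 2)
ss_res = np.sum(residuals ** 2)
ss_tot = np.sum((y_true - y_true.mean()) ** 2)
r2_manual = 1 - ss_res / ss_tot

print('MSE:', mse_manual, mean_squared_error(y_true, y_pred))
print('R2: ', r2_manual, r2_score(y_true, y_pred))
```

Taking `np.sqrt` of the MSE gives the RMSE, which is in the same units as MEDV (thousands of dollars) and is often easier to interpret than the MSE itself.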
5. Predict with the model
```python
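# Feature values for one house (the dataset's first record, whose actual MEDV is 24.0),
# wrapped in a DataFrame so the column names match those the scaler was fitted with
new_house = pd.DataFrame([[6.320e-03, 1.800e+01, 2.310e+00, 0.000e+00, 5.380e-01, 6.575e+00, 6.520e+01, 4.090e+00, 1.000e+00, 2.960e+02, 1.530e+01, 3.969e+02, 4.980e+00]], columns=X.columns)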
new_house_scaled = scaler.transform(new_house)
# Predict the price (MEDV, in thousands of dollars)
price_pred = lr.predict(new_house_scaled)
print('Predicted price:', price_pred.ravel()[0])
```
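Step 5 relies on remembering to pass new data through the same fitted `scaler` before calling `lr.predict`. One common alternative, shown here as a sketch rather than as the original author's approach, is to bundle the scaler and the model in a scikit-learn `Pipeline`, so that `predict` applies the training-set scaling automatically. The data and the column names `f1`, `f2`, `f3` are hypothetical.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic, noise-free linear data with hypothetical column names
rng = np.random.default_rng(2)
X = pd.DataFrame(rng.normal(size=(150, 3)), columns=['f1', 'f2', 'f3'])
y = 2.0 * X['f1'] - 1.0 * X['f2'] + 0.5 * X['f3'] + 10.0

# fit() fits the scaler and then the model; predict() scales inputs with the stored statistics
model = make_pipeline(StandardScaler(), LinearRegression())
model.fit(X, y)

new_row = pd.DataFrame([[0.1, -0.3, 1.2]], columns=X.columns)
pred = model.predict(new_row)[0]
print('Predicted value:', pred)
```

Since the synthetic target is exactly linear, the prediction equals 2(0.1) − (−0.3) + 0.5(1.2) + 10 = 11.1 up to floating-point error.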
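That completes the code for predicting Boston house prices with linear regression. Keep in mind that this is a minimal example; real applications usually call for more careful feature engineering and model tuning.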
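As one example of the model tuning mentioned above, this sketch swaps in ridge regression (linear regression with an L2 penalty on the coefficients) and picks the penalty strength `alpha` by 5-fold cross-validation. It runs on synthetic data from `make_regression` with 13 features to mirror the Boston feature count; the `alpha` grid and noise level are arbitrary assumptions.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic regression problem with 13 features (stand-in for the Boston data)
X, y = make_regression(n_samples=300, n_features=13, noise=10.0, random_state=0)

# Scaling inside the pipeline is refitted on each CV training fold, avoiding leakage
pipe = make_pipeline(StandardScaler(), Ridge())
param_grid = {'ridge__alpha': [0.01, 0.1, 1.0, 10.0, 100.0]}
search = GridSearchCV(pipe, param_grid, cv=5, scoring='neg_mean_squared_error')
search.fit(X, y)

print('Best alpha:', search.best_params_['ridge__alpha'])
print('Cross-validated MSE:', -search.best_score_)
```

On the Boston data, one would fit `search` on `X_train` and `y_train` only, then evaluate `search.best_estimator_` on the held-out test set as in step 4.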