How to use the Boston house-price dataset in sklearn
Date: 2024-05-14 09:12:45
1. Load the Boston house-price dataset
```python
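# load_boston was deprecated in scikit-learn 1.0 and removed in 1.2,
# so this import only works with scikit-learn < 1.2
from sklearn.datasets import load_boston
boston = load_boston()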
```
2. Inspect basic information about the dataset
```python
print(boston.DESCR)          # full description of the dataset
print(boston.feature_names)  # names of the 13 features
print(boston.data.shape)     # shape of the feature matrix: (506, 13)
```
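A pandas DataFrame gives a tabular view of the same information, including per-column statistics. The sketch below assumes pandas is installed. `summarize_features` is a hypothetical helper, and random stand-in data replaces `boston.data` so the snippet runs on any scikit-learn version.

```python
import numpy as np
import pandas as pd


def summarize_features(data, feature_names):
    """Wrap a 2-D feature array in a DataFrame and return it with per-column summary statistics."""
    df = pd.DataFrame(data, columns=feature_names)
    # describe() reports count, mean, std, min, quartiles and max for each column
    return df, df.describe()


# Stand-in data so the sketch runs without load_boston; on scikit-learn < 1.2
# you would call summarize_features(boston.data, boston.feature_names) instead
rng = np.random.default_rng(0)
demo_data = rng.normal(size=(10, 3))
df, summary = summarize_features(demo_data, ["f0", "f1", "f2"])
print(df.head())
print(summary)
```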
3. Preprocess the data
```python
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
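# Split into a training set (80%) and a test set (20%)
X_train, X_test, y_train, y_test = train_test_split(boston.data, boston.target, test_size=0.2, random_state=42)
# Standardize the features: fit the scaler on the training set only, then apply it to both sets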
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)
```
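Fitting the scaler on the training split alone keeps test-set statistics out of preprocessing. The check below runs on synthetic stand-in data, not the Boston set. It shows that `StandardScaler` applies z = (x - μ_train) / σ_train to both splits, where σ is the population standard deviation (ddof=0).

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data: 200 samples, 3 features on very different scales
rng = np.random.default_rng(42)
X = rng.normal(loc=[0.0, 50.0, 1000.0], scale=[1.0, 10.0, 300.0], size=(200, 3))
y = X @ np.array([1.0, 0.5, 0.01]) + rng.normal(size=200)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

scaler = StandardScaler().fit(X_train)  # mean and scale come from the training split only
mu = X_train.mean(axis=0)
sigma = X_train.std(axis=0)  # population std (ddof=0), the same convention StandardScaler uses

# Manual z-scores using the *training* statistics reproduce scaler.transform on both splits
manual_train = (X_train - mu) / sigma
manual_test = (X_test - mu) / sigma
print(np.allclose(manual_train, scaler.transform(X_train)))  # True
print(np.allclose(manual_test, scaler.transform(X_test)))    # True
```

The test split is therefore not exactly zero-mean after scaling. This is expected: it is transformed with the training statistics, as new data would be at prediction time.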
4. Build the model
```python
from sklearn.linear_model import LinearRegression
# Create a linear regression model
model = LinearRegression()
# Train it on the standardized training data
model.fit(X_train_scaled, y_train)
# Predict house prices for the test set
y_pred = model.predict(X_test_scaled)
```
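`LinearRegression` fits ordinary least squares with an intercept. The sketch below uses synthetic data from `make_regression` as a stand-in, so no dataset download is needed. It recovers the same coefficients with `np.linalg.lstsq` on a design matrix that has a leading column of ones.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression

# Synthetic stand-in regression problem: 100 samples, 4 features
X, y = make_regression(n_samples=100, n_features=4, noise=5.0, random_state=0)

model = LinearRegression().fit(X, y)

# Ordinary least squares by hand: prepend a column of ones for the intercept,
# then minimize ||A w - y||^2 with np.linalg.lstsq
A = np.column_stack([np.ones(len(X)), X])
w, *_ = np.linalg.lstsq(A, y, rcond=None)

print(np.allclose(w[0], model.intercept_))  # True
print(np.allclose(w[1:], model.coef_))      # True
```

Because the Boston features are standardized before fitting, the entries of `model.coef_` there share a common scale. Each one is the change in predicted price per one-standard-deviation change in that feature.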
5. Evaluate the model
```python
from sklearn.metrics import mean_squared_error, r2_score
# Mean squared error (lower is better)
mse = mean_squared_error(y_test, y_pred)
# R² score, the coefficient of determination (1.0 is a perfect fit)
r2 = r2_score(y_test, y_pred)
print('MSE:', mse)
print('R^2:', r2)
```
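The two metrics are MSE = (1/n) Σ (y_i - ŷ_i)² and R² = 1 - Σ (y_i - ŷ_i)² / Σ (y_i - ȳ)². The sketch below computes both by hand and checks them against `mean_squared_error` and `r2_score`. `mse_r2` is a hypothetical helper and the four values are made-up examples. The Boston target (MEDV) is measured in thousands of dollars, so √MSE (the RMSE) is often easier to interpret in those units.

```python
import numpy as np
from sklearn.metrics import mean_squared_error, r2_score


def mse_r2(y_true, y_pred):
    """Compute MSE and R² directly from their definitions."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    residual_ss = np.sum((y_true - y_pred) ** 2)        # sum of squared residuals
    total_ss = np.sum((y_true - y_true.mean()) ** 2)    # total sum of squares around the mean
    mse = residual_ss / len(y_true)
    r2 = 1.0 - residual_ss / total_ss
    return mse, r2


# Made-up example values
y_true = [3.0, 5.0, 2.0, 7.0]
y_pred = [2.5, 5.0, 3.0, 8.0]
mse, r2 = mse_r2(y_true, y_pred)
print(mse)  # 0.5625
print(np.isclose(mse, mean_squared_error(y_true, y_pred)))  # True
print(np.isclose(r2, r2_score(y_true, y_pred)))             # True
```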
6. Visualize the results
```python
import matplotlib.pyplot as plt
# Scatter plot of predicted values against true values
plt.scatter(y_test, y_pred)
plt.xlabel('True Values')
plt.ylabel('Predictions')
plt.show()
```
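Points on the diagonal y = x are perfect predictions, so adding that line makes the scatter plot easier to read. The sketch below assumes matplotlib's object-oriented API. `plot_predictions` is a hypothetical helper, and the four values stand in for `y_test` and `y_pred`.

```python
import matplotlib.pyplot as plt
import numpy as np


def plot_predictions(y_true, y_pred):
    """Scatter predictions against true values and add the y = x line of perfect prediction."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    fig, ax = plt.subplots()
    ax.scatter(y_true, y_pred, alpha=0.6)
    # Draw the diagonal across the full range covered by both arrays
    lo = min(y_true.min(), y_pred.min())
    hi = max(y_true.max(), y_pred.max())
    ax.plot([lo, hi], [lo, hi], "r--", label="y = x")
    ax.set_xlabel("True Values")
    ax.set_ylabel("Predictions")
    ax.legend()
    return fig, ax


# Stand-in values; in practice pass y_test and y_pred from the steps above
fig, ax = plot_predictions([1.0, 2.0, 3.0, 4.0], [1.2, 1.9, 3.3, 3.8])
# plt.show()  # uncomment to display the figure in an interactive session
```

Points above the dashed line are over-predictions and points below it are under-predictions.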