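1. Read the cal_housing data, split it into training and test sets (30% test), apply z-score standardization, build a linear regression model, and predict on the test set; finally, evaluate the model (mean absolute error, mean squared error, R², and similar metrics).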
Time: 2024-12-22 13:33:31
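First, install the required Python libraries: pandas for data handling and scikit-learn (`sklearn`) for building and evaluating the model. The process, step by step:
1. Import the required libraries: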
```python
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score
```
2. Load the `cal_housing` dataset. It is commonly obtained from `sklearn.datasets`, for example via `fetch_california_housing`:
```python
from sklearn.datasets import fetch_california_housing
data = fetch_california_housing()
X = data.data
y = data.target
```
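If the task expects the raw `cal_housing` file to be read from disk rather than fetched through scikit-learn, a pandas-based loader might look like the sketch below. The helper name `load_cal_housing`, the column names, the assumption that the file is comma-separated with no header row, and the two sample rows are all illustrative assumptions rather than part of the original answer; adjust them to the file you actually have.

```python
import io

import pandas as pd

# Assumed column layout: eight features followed by the target.
COLUMNS = [
    "longitude", "latitude", "housing_median_age", "total_rooms",
    "total_bedrooms", "population", "households", "median_income",
    "median_house_value",
]


def load_cal_housing(source):
    """Read a headerless, comma-separated file (path or file-like object)."""
    df = pd.read_csv(source, header=None, names=COLUMNS)
    X = df.drop(columns="median_house_value").to_numpy()
    y = df["median_house_value"].to_numpy()
    return X, y


# Two made-up rows stand in for a real file here.
sample = io.StringIO(
    "-122.2,37.9,41,880,129,322,126,8.3,452600\n"
    "-122.3,37.8,21,7099,1106,2401,1138,8.1,358500\n"
)
X_demo, y_demo = load_cal_housing(sample)
print(X_demo.shape, y_demo.shape)
```

With a real file, a path string would be passed in place of the `io.StringIO` object.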
3. Split the data into training and test sets, with the test set taking 30%:
```python
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)
```
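To see what `test_size=0.3` and `random_state=42` do in practice, the following sketch runs the same split twice on synthetic stand-in data (random numbers, not the housing dataset) and checks the resulting sizes and reproducibility. The `_demo` variable names are hypothetical.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic stand-in data: 100 samples, 8 features (not the real dataset).
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(100, 8))
y_demo = rng.normal(size=100)

split_a = train_test_split(X_demo, y_demo, test_size=0.3, random_state=42)
split_b = train_test_split(X_demo, y_demo, test_size=0.3, random_state=42)

X_tr, X_te, y_tr, y_te = split_a
print(X_tr.shape, X_te.shape)  # 70 training rows, 30 test rows

# The same random_state gives the same partition on every call.
print(np.array_equal(X_te, split_b[1]))
```

Fixing `random_state` makes the reported metrics repeatable between runs; a different seed produces a different partition and slightly different scores.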
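4. Standardize the features (z-score standardization). The scaler is fitted on the training set only and then applied to the test set, so that no test-set statistics leak into training: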
```python
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)
```
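As a rough illustration of what the standardization step computes, the sketch below applies z = (x - mean) / std by hand with NumPy, using statistics taken from the training rows only. The tiny arrays are made up for the example, and the use of the population standard deviation (`ddof=0`) reflects my understanding of `StandardScaler`'s default behaviour.

```python
import numpy as np

# Small made-up arrays standing in for X_train and X_test.
X_train_demo = np.array([[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]])
X_test_demo = np.array([[4.0, 40.0]])

# Column-wise statistics come from the training rows only.
mean = X_train_demo.mean(axis=0)
std = X_train_demo.std(axis=0)  # population std (ddof=0)

X_train_z = (X_train_demo - mean) / std
X_test_z = (X_test_demo - mean) / std  # reuse the training statistics

print(X_train_z.mean(axis=0))  # approximately 0 per column
print(X_train_z.std(axis=0))   # approximately 1 per column
print(X_test_z)
```

The standardized test rows generally do not have mean 0 and standard deviation 1, because they are scaled with the training statistics rather than their own.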
5. Build the linear regression model and predict on the test set:
```python
model = LinearRegression()
model.fit(X_train_scaled, y_train)
y_pred = model.predict(X_test_scaled)
```
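The scaling and fitting steps can also be chained with scikit-learn's `make_pipeline`, which keeps the "fit on training data, reuse on test data" rule in one object. This is an alternative arrangement, not the approach used in the answer above; the data below is synthetic and noise-free, chosen so that the result is easy to check.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic, noise-free data with a known linear relationship.
rng = np.random.default_rng(0)
X_demo = rng.normal(loc=5.0, scale=3.0, size=(200, 2))
y_demo = 2.0 * X_demo[:, 0] - 1.0 * X_demo[:, 1] + 0.5

# fit() standardizes using statistics of the rows it is given, then fits
# the regression; predict() reuses those stored statistics.
pipe = make_pipeline(StandardScaler(), LinearRegression())
pipe.fit(X_demo[:150], y_demo[:150])
y_hat = pipe.predict(X_demo[150:])

# On noise-free linear data the predictions should match the targets.
print(np.allclose(y_hat, y_demo[150:]))
```

For ordinary least squares, standardizing the features does not change the predictions; it mainly makes the fitted coefficients comparable across features.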
6. Compute and print the evaluation metrics:
```python
mae = mean_absolute_error(y_test, y_pred)
mse = mean_squared_error(y_test, y_pred)
r2 = r2_score(y_test, y_pred)
print("Mean Absolute Error:", mae)
print("Mean Squared Error:", mse)
print("R^2 Score:", r2)
```
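For reference, the three metrics can be written out by hand. The sketch below computes MAE (mean of absolute errors), MSE (mean of squared errors), and R² (one minus the ratio of residual to total sum of squares) with NumPy on four made-up values, and also derives RMSE, which the answer above does not print. All variable names are hypothetical.

```python
import numpy as np

# Made-up values standing in for y_test and y_pred.
y_true_demo = np.array([3.0, 5.0, 7.0, 9.0])
y_pred_demo = np.array([2.0, 5.0, 8.0, 11.0])

errors = y_true_demo - y_pred_demo      # [1, 0, -1, -2]

mae_demo = np.mean(np.abs(errors))      # (1 + 0 + 1 + 2) / 4 = 1.0
mse_demo = np.mean(errors ** 2)         # (1 + 0 + 1 + 4) / 4 = 1.5
rmse_demo = np.sqrt(mse_demo)           # same units as the target

ss_res = np.sum(errors ** 2)                                  # 6.0
ss_tot = np.sum((y_true_demo - y_true_demo.mean()) ** 2)      # 20.0
r2_demo = 1.0 - ss_res / ss_tot                               # 0.7

print(mae_demo, mse_demo, rmse_demo, r2_demo)
```

MAE and RMSE are expressed in the units of the target, while R² is unitless; a value closer to 1 means the model explains more of the variance in the test targets.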