Boston housing prices: 1. Implement a linear regression model using only the first feature, output the loss curve, evaluate the model with at least three metrics, and output a performance table. 2. Repeat the linear regression model with 2, 4, 6, 10, and 14 features respectively, and list the performance tables. 3. Select 3 of the more important features, repeat the linear regression model, and list the performance table. Show the full code.
The Boston housing dataset is a classic machine-learning example, often used to illustrate linear regression. Here we will use Python's `sklearn` library to complete the task. Note that `load_boston` was removed in scikit-learn 1.2, so the code below loads the data from its original source instead. First, make sure the required libraries are installed:
```bash
pip install numpy pandas scikit-learn matplotlib seaborn
```
Then we can start writing the code:
1. **Linear regression with a single feature**
```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score, mean_absolute_error

# Load the data. sklearn.datasets.load_boston was removed in scikit-learn 1.2,
# so we load the dataset from its original source (13 features, target = median house price).
data_url = "http://lib.stat.cmu.edu/datasets/boston"
raw_df = pd.read_csv(data_url, sep=r"\s+", skiprows=22, header=None)
X = np.hstack([raw_df.values[::2, :], raw_df.values[1::2, :2]])
y = raw_df.values[1::2, 2]

# Split into training and test sets, using only the first feature (CRIM)
X_train, X_test, y_train, y_test = train_test_split(
    X[:, [0]], y, test_size=0.2, random_state=42)

# Create and fit the model
model = LinearRegression()
model.fit(X_train, y_train)

# Predict on the test set and compute the error metrics
y_pred = model.predict(X_test)
mse = mean_squared_error(y_test, y_pred)
r2 = r2_score(y_test, y_pred)
mae = mean_absolute_error(y_test, y_pred)

# Output the performance table
print(f"Test MSE: {mse:.3f}")
print("\nPerformance Metrics:")
print(pd.DataFrame({"Metric": ["RMSE", "R^2", "MAE"],
                    "Value": [np.sqrt(mse), r2, mae]}))
```
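`LinearRegression` uses a closed-form solver, so it does not expose a per-iteration loss. To actually plot the loss curve the task asks for, one option, sketched below under the assumption that an iteratively trained `SGDRegressor` is an acceptable substitute, is to record the training MSE after each epoch and plot it with `matplotlib`:
```python
import matplotlib.pyplot as plt
from sklearn.linear_model import SGDRegressor
from sklearn.preprocessing import StandardScaler

# Standardise the single feature so that SGD converges smoothly
scaler = StandardScaler()
X_train_s = scaler.fit_transform(X_train)

sgd = SGDRegressor(learning_rate="constant", eta0=0.01, random_state=42)
losses = []
for epoch in range(100):
    sgd.partial_fit(X_train_s, y_train)          # one pass of SGD over the training set
    train_pred = sgd.predict(X_train_s)
    losses.append(mean_squared_error(y_train, train_pred))  # record training MSE per epoch

plt.plot(losses)
plt.xlabel("Epoch")
plt.ylabel("Training MSE")
plt.title("Loss curve (single-feature linear regression)")
plt.show()
```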
2. **Linear regression with multiple features**
```python
feature_counts = [2, 4, 6, 10, 14]  # number of features to use in each run
performance_tables = []

for k in feature_counts:
    # The Boston dataset only has 13 features, so the 14-feature run uses all 13
    n = min(k, X.shape[1])
    X_train, X_test, y_train, y_test = train_test_split(
        X[:, :n], y, test_size=0.2, random_state=42)

    model = LinearRegression()
    model.fit(X_train, y_train)
    y_pred = model.predict(X_test)

    performance_tables.append({"Num Features": k,
                               "MSE": mean_squared_error(y_test, y_pred),
                               "R^2": r2_score(y_test, y_pred),
                               "MAE": mean_absolute_error(y_test, y_pred)})

# Merge all results into one DataFrame
results_df = pd.DataFrame(performance_tables).set_index("Num Features")
print(results_df)
```
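Taking the first k columns is only one way to form the feature subsets. As an alternative sketch, assuming it is acceptable to rank features by their univariate F-score, `SelectKBest` can pick the k columns most related to the target (the list name `subset_scores` is just illustrative):
```python
from sklearn.feature_selection import SelectKBest, f_regression

subset_scores = []
for k in [2, 4, 6, 10, 13]:               # 13 caps the count at the available features
    selector = SelectKBest(score_func=f_regression, k=k)
    X_k = selector.fit_transform(X, y)    # keep the k features most correlated with the target
    X_train, X_test, y_train, y_test = train_test_split(
        X_k, y, test_size=0.2, random_state=42)
    model = LinearRegression().fit(X_train, y_train)
    y_pred = model.predict(X_test)
    subset_scores.append({"k": k, "R^2": r2_score(y_test, y_pred)})

print(pd.DataFrame(subset_scores).set_index("k"))
```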
3. **Linear regression with selected important features**
```python
from sklearn.linear_model import Lasso

# Estimate feature importance; the Boston dataset does not provide it directly,
# so here we use Lasso regression for feature selection.
lasso = Lasso(alpha=0.1)  # alpha can be tuned, e.g. via cross-validation
lasso.fit(X, y)

# Keep the features with non-zero coefficients (use != 0, not > 0, so that
# features with negative coefficients are not discarded)
important_features_indices = np.where(lasso.coef_ != 0)[0]

# Re-split the data using only the selected columns so that the training and
# test sets contain the same features
X_train_imp, X_test_imp, y_train, y_test = train_test_split(
    X[:, important_features_indices], y, test_size=0.2, random_state=42)

model = LinearRegression()
model.fit(X_train_imp, y_train)
y_pred = model.predict(X_test_imp)

performance_table = {"Important Features": list(important_features_indices),
                     "MSE": mean_squared_error(y_test, y_pred),
                     "R^2": r2_score(y_test, y_pred),
                     "MAE": mean_absolute_error(y_test, y_pred)}
print(performance_table)
```
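The task asks for exactly three important features, while the Lasso step above keeps every feature with a non-zero coefficient. A minimal sketch of one way to narrow the selection, assuming ranking by absolute coefficient magnitude is acceptable (the name `top3_idx` is illustrative):
```python
# Indices of the three features with the largest absolute Lasso coefficients
top3_idx = np.argsort(np.abs(lasso.coef_))[-3:]

X_train3, X_test3, y_train3, y_test3 = train_test_split(
    X[:, top3_idx], y, test_size=0.2, random_state=42)

model3 = LinearRegression().fit(X_train3, y_train3)
y_pred3 = model3.predict(X_test3)

print(pd.DataFrame({"Metric": ["MSE", "R^2", "MAE"],
                    "Value": [mean_squared_error(y_test3, y_pred3),
                              r2_score(y_test3, y_pred3),
                              mean_absolute_error(y_test3, y_pred3)]}))
```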