Predicting Boston House Prices with Ridge Regression
Date: 2024-01-09 10:21:30
The following steps use ridge regression to predict Boston house prices:
1. Prepare the dataset and visualize it
```python
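# Load the dataset
# Note: load_boston was deprecated in scikit-learn 1.0 and removed in 1.2,
# so this import requires scikit-learn < 1.2.
from sklearn.datasets import load_boston
import pandas as pd
boston = load_boston()
data = pd.DataFrame(boston.data, columns=boston.feature_names)
data['PRICE'] = boston.target
# Visual analysis: scatter plot with a fitted regression line for each feature against PRICE
import seaborn as sns
import matplotlib.pyplot as plt
sns.pairplot(data, x_vars=list(boston.feature_names), y_vars=['PRICE'], height=5, aspect=0.7, kind='reg')
plt.show()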
```
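2. Select the main features that affect house prices
```python
# Inspect the correlation between each feature and the price to choose the main features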
correlation_matrix = data.corr().round(2)
sns.heatmap(data=correlation_matrix, annot=True)
plt.show()
```
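The heatmap displays the correlations but leaves the actual selection to the reader. A minimal sketch of one way to do it is to rank features by their absolute Pearson correlation with the target and keep those above a cutoff. The helper name `select_by_correlation`, the cutoff value, and the synthetic DataFrame are assumptions made for illustration; they are not part of the original walkthrough.

```python
import numpy as np
import pandas as pd


def select_by_correlation(df, target, threshold=0.5):
    """Return feature names whose |Pearson r| with `target` is >= threshold.

    Hypothetical helper for illustration; the cutoff is an arbitrary choice.
    """
    corr = df.corr()[target].drop(target)
    strong = corr[corr.abs() >= threshold]
    # Order the surviving features from strongest to weakest correlation
    return strong.abs().sort_values(ascending=False).index.tolist()


# Synthetic stand-in for the housing DataFrame (not real Boston data)
rng = np.random.default_rng(0)
n = 200
demo = pd.DataFrame({
    "A": rng.normal(size=n),
    "B": rng.normal(size=n),
    "C": rng.normal(size=n),
})
# PRICE depends strongly on A, negatively on B, and not at all on C
demo["PRICE"] = 3.0 * demo["A"] - 2.0 * demo["B"] + rng.normal(scale=0.5, size=n)

selected = select_by_correlation(demo, "PRICE", threshold=0.3)
print(selected)
```

With the `data` DataFrame from step 1, the equivalent call would be `select_by_correlation(data, 'PRICE')`. Pairwise correlation only captures linear, one-feature-at-a-time relationships, so it is a rough first filter rather than a complete feature selection method.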
3. Split the dataset and train the ridge regression model
```python
from sklearn.model_selection import train_test_split
from sklearn.linear_model import Ridge
from sklearn.metrics import mean_squared_error, r2_score
# Split into training (80%) and test (20%) sets
X = data[boston.feature_names]
y = data['PRICE']
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
# Train the ridge regression model
ridge = Ridge(alpha=1.0)
ridge.fit(X_train, y_train)
```
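For intuition about what `Ridge` fits: ridge regression minimizes `||y - Xw||^2 + alpha * ||w||^2`, and without an intercept term the minimizer has the closed form `w = (X^T X + alpha * I)^(-1) X^T y`. The sketch below checks this formula against scikit-learn on synthetic data. `ridge_closed_form` is a hypothetical helper, and `fit_intercept=False` is set here only so that the two results are directly comparable.

```python
import numpy as np
from sklearn.linear_model import Ridge


def ridge_closed_form(X, y, alpha):
    """Solve (X^T X + alpha * I) w = X^T y for w (no intercept term)."""
    n_features = X.shape[1]
    A = X.T @ X + alpha * np.eye(n_features)
    return np.linalg.solve(A, X.T @ y)


# Synthetic data for illustration only
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(100, 3))
y_demo = X_demo @ np.array([1.5, -2.0, 0.5]) + rng.normal(scale=0.1, size=100)

w_manual = ridge_closed_form(X_demo, y_demo, alpha=1.0)
w_sklearn = Ridge(alpha=1.0, fit_intercept=False).fit(X_demo, y_demo).coef_

print(w_manual)
print(w_sklearn)
```

A larger `alpha` adds more weight to the diagonal of `X^T X`, which shrinks the coefficients toward zero and stabilizes the solution when features are strongly correlated.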
4. Tune the regularization parameter of the ridge model
```python
# Grid search over alpha with 5-fold cross-validation
from sklearn.model_selection import GridSearchCV
parameters = {'alpha': [1e-15, 1e-10, 1e-8, 1e-4, 1e-3, 1e-2, 1, 5, 10, 20]}
ridge_regressor = GridSearchCV(ridge, parameters, scoring='neg_mean_squared_error', cv=5)
ridge_regressor.fit(X_train, y_train)
print(ridge_regressor.best_params_)
```
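Because the ridge penalty acts on coefficient magnitudes, features measured on very different scales are penalized unevenly. One common variation, sketched below under assumptions, is to standardize the features inside a `Pipeline` so that the scaler is refit on each cross-validation training fold. The synthetic data from `make_regression` and the alpha grid are illustrative choices, not values from the original walkthrough.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic regression data standing in for the housing features
X_demo, y_demo = make_regression(n_samples=300, n_features=8, noise=10.0, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X_demo, y_demo, test_size=0.2, random_state=0)

pipe = Pipeline([
    ("scale", StandardScaler()),
    ("ridge", Ridge()),
])

# Parameters of a pipeline step are addressed as "<step name>__<parameter>"
param_grid = {"ridge__alpha": np.logspace(-3, 2, 6)}
search = GridSearchCV(pipe, param_grid, scoring="neg_mean_squared_error", cv=5)
search.fit(X_tr, y_tr)

print(search.best_params_)
# Pipeline.score delegates to Ridge.score, which returns R^2
test_r2 = search.best_estimator_.score(X_te, y_te)
print("Test R2:", test_r2)
```

Note that `search.score(...)` would return the negative MSE here, because `GridSearchCV` scores with whatever was passed as `scoring`; calling `score` on `best_estimator_` gives R² instead.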
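5. Evaluate the model on the test set
```python
# Evaluate performance (predict uses the best estimator, refit on the full training set)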
y_pred = ridge_regressor.predict(X_test)
rmse = (mean_squared_error(y_test, y_pred))**0.5
r2 = r2_score(y_test, y_pred)
print("RMSE:", rmse)
print("R2 Score:", r2)
```
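To make the two metrics concrete: RMSE is the square root of the mean squared residual, expressed in the units of the target, and R² is one minus the ratio of the residual sum of squares to the total sum of squares. The sketch below computes both by hand on made-up toy values (not model output) and compares them with the scikit-learn functions.

```python
import numpy as np
from sklearn.metrics import mean_squared_error, r2_score

# Toy values for illustration only (not model output)
y_true = np.array([3.0, 5.0, 7.0, 9.0])
y_hat = np.array([2.5, 5.5, 7.0, 10.0])

residuals = y_true - y_hat
rmse_manual = np.sqrt(np.mean(residuals ** 2))

ss_res = np.sum(residuals ** 2)                  # residual sum of squares
ss_tot = np.sum((y_true - y_true.mean()) ** 2)   # total sum of squares
r2_manual = 1.0 - ss_res / ss_tot

print("manual :", rmse_manual, r2_manual)
print("sklearn:", mean_squared_error(y_true, y_hat) ** 0.5, r2_score(y_true, y_hat))
```

For these toy values the RMSE is about 0.612 and R² is 0.925. An R² of 1 means perfect predictions, while 0 means the model does no better than always predicting the mean of the targets it is scored on.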