Stepwise linear regression in Python
Date: 2024-05-19 13:10:00
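Stepwise linear regression is a commonly used linear regression technique that builds a model by choosing its features step by step. The coefficients are estimated with OLS (ordinary least squares), while the feature set is chosen by a stepwise procedure that adds or removes one feature at a time according to a criterion such as adjusted R², AIC, or a p-value threshold. Dropping uninformative features can reduce overfitting and may improve predictive accuracy, but neither is guaranteed, because the selection step is itself fitted to the data. In Python, the libraries most often used for this are statsmodels and sklearn.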
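To make the two ingredients concrete, here is a minimal sketch that fits OLS with NumPy's least-squares solver and computes adjusted R², one common criterion for comparing feature subsets. The helper names `ols_fit` and `adjusted_r2` and the synthetic data are assumptions made for illustration; they are not part of statsmodels or sklearn.

```python
import numpy as np

def ols_fit(X, y):
    """Return OLS coefficients (intercept first) for feature matrix X."""
    X1 = np.column_stack([np.ones(len(X)), X])  # prepend an intercept column
    beta, *_ = np.linalg.lstsq(X1, y, rcond=None)
    return beta

def adjusted_r2(X, y, beta):
    """Adjusted R^2 = 1 - (1 - R^2) * (n - 1) / (n - p - 1)."""
    n, p = X.shape
    X1 = np.column_stack([np.ones(n), X])
    resid = y - X1 @ beta
    ss_res = float(resid @ resid)                 # residual sum of squares
    ss_tot = float(((y - y.mean()) ** 2).sum())   # total sum of squares
    r2 = 1.0 - ss_res / ss_tot
    return 1.0 - (1.0 - r2) * (n - 1) / (n - p - 1)

# Synthetic data (hypothetical): the third feature does not affect y
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
y = 2.0 + 1.5 * X[:, 0] - 0.5 * X[:, 1] + rng.normal(scale=0.1, size=100)

beta = ols_fit(X, y)
score = adjusted_r2(X, y, beta)
print(beta.round(2), round(score, 3))
```

Unlike plain R², adjusted R² penalizes every extra feature, so it can go down when a feature adds nothing; that is why it is usable as a stopping rule in stepwise selection.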
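statsmodels does not ship a built-in stepwise_selection function, so stepwise regression is usually written by hand on top of sm.OLS, for example as forward selection scored by adjusted R². The code is as follows: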
```
import pandas as pd
import statsmodels.api as sm
from sklearn.datasets import load_diabetes

# load_boston was removed in scikit-learn 1.2; load_diabetes is used here
# as a stand-in regression dataset.
data = load_diabetes(as_frame=True)
df = data.data        # DataFrame of features
target = data.target  # Series holding the response

# Forward stepwise selection scored by adjusted R-squared
def forward_selected(data, response):
    remaining = set(data.columns)
    selected = []
    # Adjusted R-squared is better when higher, so start from -inf
    current_score = float('-inf')
    while remaining:
        scores_with_candidates = []
        for candidate in remaining:
            X = sm.add_constant(data[selected + [candidate]])
            model = sm.OLS(response, X).fit()
            scores_with_candidates.append((model.rsquared_adj, candidate))
        scores_with_candidates.sort()
        best_new_score, best_candidate = scores_with_candidates.pop()
        # Stop as soon as the best candidate no longer improves the score
        if best_new_score <= current_score:
            break
        remaining.remove(best_candidate)
        selected.append(best_candidate)
        current_score = best_new_score
    return selected

print(forward_selected(df, target))
```
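In sklearn, sklearn.linear_model.LinearRegression can be combined with sklearn.feature_selection.RFE (recursive feature elimination). RFE is not classical stepwise regression: it repeatedly refits the model and drops the feature with the smallest coefficient magnitude until the requested number of features remains, but it serves a similar purpose. The code is as follows: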
```
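from sklearn.datasets import load_diabetes
from sklearn.feature_selection import RFE
from sklearn.linear_model import LinearRegression

# load_boston was removed in scikit-learn 1.2; load_diabetes is a stand-in dataset
X, y = load_diabetes(return_X_y=True, as_frame=True)
model = LinearRegression()

# Recursive Feature Elimination: keep the 5 top-ranked features
# (n_features_to_select must be passed by keyword in recent scikit-learn versions)
rfe = RFE(model, n_features_to_select=5)
fit = rfe.fit(X, y)
print("Selected Features: ", list(X.columns[fit.support_]))
print("Feature Ranking: ", fit.ranking_)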
```
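For something closer to classical forward stepwise selection inside sklearn, one option is sklearn.feature_selection.SequentialFeatureSelector (available since scikit-learn 0.24), which adds one feature at a time based on a cross-validated score. The sketch below is only an illustration: the synthetic dataset from make_regression and the choice of 3 features are assumptions, not values taken from the example above.

```python
from sklearn.datasets import make_regression
from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.linear_model import LinearRegression

# Synthetic data (an assumption for illustration): 8 features, 3 informative
X, y = make_regression(
    n_samples=200, n_features=8, n_informative=3, noise=5.0, random_state=0
)

# Forward selection: start from no features and greedily add the feature
# that most improves the 5-fold cross-validated score of the estimator
sfs = SequentialFeatureSelector(
    LinearRegression(),
    n_features_to_select=3,
    direction="forward",
    cv=5,
)
sfs.fit(X, y)

mask = sfs.get_support()  # boolean mask over the 8 columns
print("Selected feature mask:", mask)
```

Setting direction="backward" instead starts from all features and removes one at a time. Because the score is cross-validated rather than computed in-sample, this variant is less prone to rewarding features that only fit noise.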