Consider a linear model Y = α + βᵀX + ε. (1) Set X ∼ MVN(0, Σ) with Σ = (ρ^|i−j|)_{p×p} (the AR(1) structure), where ρ = 0.5, α = 1, β = (2, 1.5, 0, 0, 1, 0, . . . , 0)ᵀ, and ε ∼ N(0, 1). Simulate Y = α + βᵀX + ε with predictor dimension p = 20 and sample size n = 200. Under these model settings, X1, X2, and X5 are the important variables. (2) Estimate the regression coefficients using LASSO via the coordinate descent algorithm and soft-thresholding; use 5-fold CV to choose the optimal λ by minimizing the CV prediction error (PE), and plot the PE against λ. Python code requested.
Posted: 2023-11-25 22:08:49 · Views: 108
Below is Python code for LASSO regression with cross-validation, built on a hand-written LASSO solver based on coordinate descent:
```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.model_selection import KFold
from sklearn.metrics import mean_squared_error
# 1. Generate the data
np.random.seed(123)
p = 20
n = 200
rho = 0.5
alpha = 1
beta = np.array([2, 1.5, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0])
Sigma = np.zeros((p, p))
for i in range(p):
    for j in range(p):
        Sigma[i, j] = rho ** np.abs(i - j)
X = np.random.multivariate_normal(np.zeros(p), Sigma, n)
epsilon = np.random.normal(0, 1, n)
Y = alpha + np.dot(X, beta) + epsilon
# 2. Define the LASSO solver (coordinate descent with soft-thresholding)
def soft_threshold(rho, lam):
    # soft-thresholding operator: sign(rho) * max(|rho| - lam, 0)
    if rho > lam:
        return rho - lam
    elif rho < -lam:
        return rho + lam
    else:
        return 0.0
def coordinate_descent_lasso(X, Y, lam, max_iter=1000, tol=1e-4):
    n_samples, n_features = X.shape
    beta = np.zeros(n_features)
    r = Y - X @ beta  # residual vector (length n)
    for _ in range(max_iter):
        beta_old = beta.copy()
        for j in range(n_features):
            X_j = X[:, j]
            r += X_j * beta[j]               # remove feature j's contribution
            rho_j = X_j @ r / n_samples      # correlation of X_j with the partial residual
            z_j = X_j @ X_j / n_samples      # column scale factor
            beta[j] = soft_threshold(rho_j, lam) / z_j
            r -= X_j * beta[j]               # add back the updated contribution
        if np.sum(np.abs(beta - beta_old)) < tol:
            break
    return beta
def lasso_cv(X, Y, lambdas, n_folds=5):
    kf = KFold(n_splits=n_folds, shuffle=True, random_state=123)
    cv_errors = []
    for lam in lambdas:
        errors = []
        for train_idxs, test_idxs in kf.split(X):
            X_train, Y_train = X[train_idxs], Y[train_idxs]
            X_test, Y_test = X[test_idxs], Y[test_idxs]
            # center the training data so the intercept alpha is absorbed
            X_mean, Y_mean = X_train.mean(axis=0), Y_train.mean()
            beta = coordinate_descent_lasso(X_train - X_mean, Y_train - Y_mean, lam)
            Y_pred = (X_test - X_mean) @ beta + Y_mean
            errors.append(mean_squared_error(Y_test, Y_pred))
        cv_errors.append(np.mean(errors))
    return cv_errors
# 3. Run LASSO with 5-fold cross-validation over a grid of lambda values
lambdas = np.logspace(-5, 2, 100)
cv_errors = lasso_cv(X, Y, lambdas)
min_mse = np.min(cv_errors)
optimal_lambda = lambdas[np.argmin(cv_errors)]
print('Optimal Lambda:', optimal_lambda)
# 4. Plot the CV prediction error against lambda
plt.plot(np.log10(lambdas), cv_errors)
plt.axvline(np.log10(optimal_lambda), linestyle='--', color='r')
plt.xlabel('log10(lambda)')
plt.ylabel('Mean Squared Error')
plt.title('LASSO Cross Validation')
plt.show()
# 5. Refit on the full (centered) data at the optimal lambda and report the coefficients
X_mean, Y_mean = X.mean(axis=0), Y.mean()
beta_hat = coordinate_descent_lasso(X - X_mean, Y - Y_mean, optimal_lambda)
alpha_hat = Y_mean - X_mean @ beta_hat
print('Intercept:', alpha_hat)
print('Regression Coefficients:', beta_hat)
```
The code above uses a hand-written coordinate-descent LASSO solver and chooses the optimal regularization parameter lambda by cross-validation: it plots the CV prediction error as a function of lambda, marks the minimizing value, and prints the corresponding regression coefficients. With these settings, the fit should recover X1, X2, and X5 as the nonzero predictors.
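As a sanity check (not part of the original answer), the same experiment can be run with scikit-learn's `LassoCV`, which also uses coordinate descent internally; note that what the text calls lambda is the `alpha` parameter in scikit-learn. On this simulated data its selected coefficients should be close to the true values (2, 1.5, 1) at positions 1, 2, and 5:

```python
import numpy as np
from sklearn.linear_model import LassoCV

# Regenerate the simulated data with the same settings as above
np.random.seed(123)
p, n, rho, alpha = 20, 200, 0.5, 1
beta = np.zeros(p)
beta[[0, 1, 4]] = [2, 1.5, 1]  # X1, X2, X5 are the active variables
Sigma = rho ** np.abs(np.subtract.outer(np.arange(p), np.arange(p)))
X = np.random.multivariate_normal(np.zeros(p), Sigma, n)
Y = alpha + X @ beta + np.random.normal(0, 1, n)

# 5-fold CV over the same log-spaced grid (sklearn's "alpha" = lambda here)
model = LassoCV(alphas=np.logspace(-5, 2, 100), cv=5).fit(X, Y)
print('Optimal lambda:', model.alpha_)
print('Nonzero coefficients at indices:', np.nonzero(model.coef_)[0])
```

Small discrepancies versus the hand-written solver are expected, since `LassoCV` uses a tighter convergence tolerance and its own fold assignment.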