A Python Example of Data Processing with LASSO
Date: 2023-10-24 15:17:02
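Below is a Python example of data processing with LASSO.
First, we import the required libraries and load the dataset. In this example we use the Boston Housing dataset from the sklearn library. Note that `load_boston` was removed in scikit-learn 1.2, so the code as written needs an earlier release.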
```python
import pandas as pd
from sklearn.datasets import load_boston  # removed in scikit-learn 1.2; needs an earlier release
from sklearn.linear_model import Lasso
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Load the Boston Housing data as a feature DataFrame and a target Series
boston_data = load_boston()
X = pd.DataFrame(boston_data.data, columns=boston_data.feature_names)
y = pd.Series(boston_data.target)
```
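For readers on a scikit-learn release that no longer ships `load_boston`, the loading step can be sketched with a different bundled dataset. The snippet below is only an illustration: it assumes `load_diabetes` as a stand-in regression dataset, which is not the Boston data, so scores and selected features will differ from anything obtained with the original.

```python
import pandas as pd
from sklearn.datasets import load_diabetes

# Assumption: load_diabetes is a stand-in dataset bundled with scikit-learn,
# used here only so the rest of the walkthrough has an X and y to work with.
diabetes_data = load_diabetes()
X = pd.DataFrame(diabetes_data.data, columns=diabetes_data.feature_names)
y = pd.Series(diabetes_data.target, name="target")

print(X.shape)             # rows and feature columns
print(X.columns.tolist())  # feature names used later for the coefficient table
```

The remaining steps only rely on `X` being a DataFrame with named columns and `y` being a numeric Series, so they should carry over unchanged.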
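Next, we split the data into a training set and a test set and standardize the features. The scaler is fitted on the training set only, so that test-set statistics do not leak into preprocessing.
```python
# Split first so the scaler never sees the test rows
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Fit the scaler on the training set only, then apply it to both sets
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)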
```
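Another way to keep scaling tied to the training rows is to wrap the scaler and the model in a single pipeline. The following is a minimal sketch rather than part of the original example: the data comes from `make_regression` as a synthetic stand-in, and the names `X_demo`, `y_demo`, and `pipe` are hypothetical.

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data (an assumption for illustration, not the Boston set)
X_demo, y_demo = make_regression(
    n_samples=200, n_features=10, n_informative=4, noise=10.0, random_state=42
)
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=42
)

# fit() learns the scaling statistics from X_tr only;
# score() reuses those statistics to transform X_te before predicting
pipe = make_pipeline(StandardScaler(), Lasso(alpha=0.1))
pipe.fit(X_tr, y_tr)

print(f"Test R^2: {pipe.score(X_te, y_te):.3f}")
```

A pipeline also makes it harder to forget the transform step when the model is later used inside cross-validation.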
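Then we fit a LASSO model and compute its R² score on both the training set and the test set.
```python
# alpha sets the strength of the L1 penalty; larger values zero out more coefficients
lasso = Lasso(alpha=0.1)
lasso.fit(X_train, y_train)

# For a regressor, score() returns the coefficient of determination (R²)
train_score = lasso.score(X_train, y_train)
test_score = lasso.score(X_test, y_test)
print(f"Train score: {train_score:.3f}")
print(f"Test score: {test_score:.3f}")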
```
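The value `alpha=0.1` above is fixed by hand. One common way to choose it from the data is cross-validation with `LassoCV`. The sketch below is an illustration under assumptions: it uses synthetic data from `make_regression`, and the candidate grid `alphas` is a hypothetical choice rather than a recommendation.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import LassoCV
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data (an assumption for illustration)
X_demo, y_demo = make_regression(
    n_samples=200, n_features=10, n_informative=4, noise=10.0, random_state=0
)
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=0
)

# Scale using training-set statistics only
demo_scaler = StandardScaler().fit(X_tr)
X_tr_s = demo_scaler.transform(X_tr)
X_te_s = demo_scaler.transform(X_te)

# Hypothetical grid of candidate penalties, searched with 5-fold cross-validation
alphas = np.logspace(-2, 1, 20)
lasso_cv = LassoCV(alphas=alphas, cv=5).fit(X_tr_s, y_tr)

print(f"Selected alpha: {lasso_cv.alpha_:.4f}")
print(f"Test R^2: {lasso_cv.score(X_te_s, y_te):.3f}")
```

After fitting, `lasso_cv` behaves like a regular `Lasso` model refitted on the full training set with the selected `alpha_`.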
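Finally, we can inspect the coefficients of the LASSO model to see which features influence the target variable. Features whose coefficient is exactly zero have been dropped by the model.
```python
# Pair each feature name with its fitted coefficient
features = pd.DataFrame({'Feature': X.columns, 'Coefficient': lasso.coef_})

# Keep only the features that LASSO did not shrink to zero
important_features = features[features['Coefficient'] != 0]
print(important_features)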
```
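How many features survive depends on the penalty strength. The short sketch below illustrates this by counting nonzero coefficients for a few values of `alpha`. It assumes synthetic data from `make_regression`, and the list of `alpha` values is hypothetical, chosen only to span weak to very strong penalties.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso

# Synthetic stand-in data: 10 features, of which 3 actually drive the target
X_demo, y_demo = make_regression(
    n_samples=200, n_features=10, n_informative=3, noise=5.0, random_state=0
)

alphas = [0.01, 1.0, 10.0, 1e4]  # hypothetical values, weak to very strong
nonzero_counts = []
for alpha in alphas:
    model = Lasso(alpha=alpha, max_iter=10_000).fit(X_demo, y_demo)
    n_nonzero = int(np.count_nonzero(model.coef_))
    nonzero_counts.append(n_nonzero)
    print(f"alpha={alpha:g}: {n_nonzero} nonzero coefficients")
```

With a large enough penalty every coefficient is shrunk to zero, which is why the set of "important" features should be read together with the `alpha` that produced it.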
The complete code is as follows:
```python
import pandas as pd
from sklearn.datasets import load_boston  # removed in scikit-learn 1.2; needs an earlier release
from sklearn.linear_model import Lasso
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Load the Boston Housing data as a feature DataFrame and a target Series
boston_data = load_boston()
X = pd.DataFrame(boston_data.data, columns=boston_data.feature_names)
y = pd.Series(boston_data.target)

# Split first, then fit the scaler on the training set only
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)

# Fit LASSO and report R² on both sets
lasso = Lasso(alpha=0.1)
lasso.fit(X_train, y_train)
train_score = lasso.score(X_train, y_train)
test_score = lasso.score(X_test, y_test)
print(f"Train score: {train_score:.3f}")
print(f"Test score: {test_score:.3f}")

# List the features whose coefficients were not shrunk to zero
features = pd.DataFrame({'Feature': X.columns, 'Coefficient': lasso.coef_})
important_features = features[features['Coefficient'] != 0]
print(important_features)
```