Training a model and computing the mean squared error and the coefficient of determination (R²) with Scikit-learn or StatsModels in Python
Posted: 2024-03-01 20:55:17
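Sure. Assuming you already have a dataset, you can follow these steps to train a model with Scikit-learn and compute the mean squared error and the coefficient of determination R²: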
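1. Import the required libraries and load the dataset
```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split

# Load the dataset; replace 'your_dataset.csv' with the path to your own file
data = pd.read_csv('your_dataset.csv')
# Features: every column except the last; target: the last column
X = data.iloc[:, :-1].values
y = data.iloc[:, -1].values
```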
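2. Split the dataset into training and test sets
```python
# Hold out 20% of the rows for testing; random_state makes the shuffle reproducible
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
```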
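To make the split concrete, here is a minimal sketch on a small synthetic array (the data is made up for illustration and stands in for `your_dataset.csv`). It shows the resulting shapes and that each feature row stays paired with its target value after shuffling.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic stand-in data: 10 samples, 2 features.
# Row i is [2*i, 2*i + 1] and its target is i, so pairing is easy to verify.
X = np.arange(20).reshape(10, 2)
y = np.arange(10)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

print(X_train.shape, X_test.shape)  # (8, 2) (2, 2)

# Each feature row is still aligned with its own target after shuffling
rows_aligned = bool(np.all(X_train[:, 0] == 2 * y_train))
print(rows_aligned)  # True
```

With `test_size=0.2` and 10 samples, 2 rows go to the test set and 8 to the training set.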
3. Train a linear regression model
```python
# Ordinary least squares; an intercept is fitted by default
regressor = LinearRegression()
regressor.fit(X_train, y_train)
```
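4. Predict on the test set and compute the mean squared error and the coefficient of determination R²
```python
# Predict on the held-out rows, then compare against the true targets
y_pred = regressor.predict(X_test)
mse = mean_squared_error(y_test, y_pred)
r2 = r2_score(y_test, y_pred)
print('Mean squared error:', mse)
print('Coefficient of determination R²:', r2)
```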
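For reference, both metrics follow from simple formulas: MSE is the mean of the squared residuals, and R² is 1 minus the ratio of the residual sum of squares to the total sum of squares around the mean of the true values. The sketch below computes both with NumPy only. The helper name `mse_and_r2` and the sample values are hypothetical, chosen so the arithmetic is easy to check by hand.

```python
import numpy as np

def mse_and_r2(y_true, y_pred):
    """Return (MSE, R²) computed directly from their definitions."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    residuals = y_true - y_pred
    ss_res = np.sum(residuals ** 2)                  # residual sum of squares
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)   # total sum of squares
    mse = ss_res / y_true.size
    r2 = 1.0 - ss_res / ss_tot
    return mse, r2

# Illustrative values only
y_true = [3.0, 5.0, 7.0, 9.0]
y_pred = [2.5, 5.5, 7.0, 9.0]

mse, r2 = mse_and_r2(y_true, y_pred)
print(f"MSE: {mse:.3f}")  # MSE: 0.125
print(f"R2:  {r2:.3f}")   # R2:  0.975
```

Here the squared residuals sum to 0.5 over 4 samples (MSE 0.125), and the total sum of squares is 20, so R² is 1 - 0.5/20 = 0.975.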
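If you would rather use StatsModels to train the model and compute the mean squared error and the coefficient of determination R², follow these steps: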
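1. Import the required libraries and load the dataset
```python
import numpy as np
import pandas as pd
import statsmodels.api as sm
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split

# Load the dataset; replace 'your_dataset.csv' with the path to your own file
data = pd.read_csv('your_dataset.csv')
# Features: every column except the last; target: the last column
X = data.iloc[:, :-1].values
y = data.iloc[:, -1].values
```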
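2. Split the dataset into training and test sets
```python
# Hold out 20% of the rows for testing; random_state makes the shuffle reproducible
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
```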
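3. Train a linear regression model
```python
# sm.OLS does not fit an intercept on its own, so add a column of ones first.
# has_constant='add' forces the column even if a feature happens to be constant.
X_train = sm.add_constant(X_train, has_constant='add')
model = sm.OLS(y_train, X_train)
regressor = model.fit()
```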
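4. Predict on the test set and compute the mean squared error and the coefficient of determination R²
```python
# The test matrix needs the same intercept column as the training matrix
X_test = sm.add_constant(X_test, has_constant='add')
y_pred = regressor.predict(X_test)
mse = mean_squared_error(y_test, y_pred)
r2 = r2_score(y_test, y_pred)
# Note: regressor.rsquared is R² on the training data, not on the test set
print('Mean squared error:', mse)
print('Coefficient of determination R²:', r2)
```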
Once you have the mean squared error and the coefficient of determination R², you can use Matplotlib to draw a residual plot.
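A residual plot can be sketched as follows. This is a minimal example, not the only way to draw one: `y_test` and `y_pred` are synthetic stand-ins generated here so the block runs on its own, and the output file name `residual_plot.png` is an arbitrary choice. In practice you would reuse the `y_test` and `y_pred` arrays from the steps above.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line to use plt.show()
import matplotlib.pyplot as plt
import numpy as np

# Synthetic stand-ins for the real predictions and targets (assumption)
rng = np.random.default_rng(0)
y_pred = np.linspace(0, 10, 50)
y_test = y_pred + rng.normal(scale=0.5, size=50)

# Residual = actual value minus predicted value
residuals = y_test - y_pred

fig, ax = plt.subplots()
ax.scatter(y_pred, residuals)
ax.axhline(0, color="red", linestyle="--")  # reference line at zero error
ax.set_xlabel("Predicted value")
ax.set_ylabel("Residual (actual - predicted)")
ax.set_title("Residual plot")
fig.savefig("residual_plot.png")
```

For a well-specified linear model the points should scatter around the zero line without an obvious pattern; a curve or a funnel shape suggests nonlinearity or non-constant variance.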