Rework the data from this chapter's examples using the Python-Pandas OLS tool and the Scikit-Learn tool
Posted: 2024-02-01 15:13:43
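Let's start with the Python-Pandas OLS route. Note that pandas itself no longer ships an OLS function (the old `pd.ols` has been removed), so the usual approach is to hold the data in a pandas DataFrame and fit it with the `OLS` class from statsmodels.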
Suppose we have the following dataset (a stand-in for the chapter's example data):
```python
X = [1, 2, 3, 4, 5]
Y = [2, 4, 5, 4, 5]
```
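With a single explanatory variable, the OLS slope and intercept have a closed form, so the coefficients both tools should return can be worked out by hand for this small dataset (here $\bar x = 3$, $\bar y = 4$):

$$
\hat\beta_1 = \frac{\sum_{i=1}^{5}(x_i-\bar x)(y_i-\bar y)}{\sum_{i=1}^{5}(x_i-\bar x)^2}
= \frac{(-2)(-2)+(-1)(0)+(0)(1)+(1)(0)+(2)(1)}{4+1+0+1+4}
= \frac{6}{10} = 0.6
$$

$$
\hat\beta_0 = \bar y - \hat\beta_1 \bar x = 4 - 0.6 \times 3 = 2.2
$$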
We can fit the linear regression with the following code:
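```python
import pandas as pd
import statsmodels.api as sm

# Hold the data in a pandas DataFrame
df = pd.DataFrame({'x': [1, 2, 3, 4, 5], 'y': [2, 4, 5, 4, 5]})
X = df[['x']]   # explanatory variable (kept 2-D)
Y = df['y']     # response variable

# statsmodels' OLS does not add an intercept on its own, so add a 'const' column
X = sm.add_constant(X)

model = sm.OLS(Y, X).fit()
predictions = model.predict(X)  # fitted values for the training data
print(model.summary())
```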
The output is as follows (the Date and Time lines depend on when the code is run):
```
                            OLS Regression Results
==============================================================================
Dep. Variable:                      y   R-squared:                       0.600
Model:                            OLS   Adj. R-squared:                  0.467
Method:                 Least Squares   F-statistic:                     4.500
Date:                Mon, 09 Aug 2021   Prob (F-statistic):              0.124
Time:                        15:35:22   Log-Likelihood:                -5.2598
No. Observations:                   5   AIC:                             14.52
Df Residuals:                       3   BIC:                             13.74
Df Model:                           1
Covariance Type:            nonrobust
==============================================================================
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
const          2.2000      0.938      2.345      0.101      -0.785       5.185
x              0.6000      0.283      2.121      0.124      -0.300       1.500
==============================================================================
Omnibus:                          nan   Durbin-Watson:                   2.017
Prob(Omnibus):                    nan   Jarque-Bera (JB):                0.570
Skew:                           0.289   Prob(JB):                        0.752
Kurtosis:                       1.450   Cond. No.                         8.37
==============================================================================

Notes:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
```
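The summary reports the estimated coefficients (intercept `const` = 2.2 and slope on `x` = 0.6, i.e. ŷ = 2.2 + 0.6x) together with their standard errors, t-statistics, p-values, confidence intervals, and goodness-of-fit measures such as R².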
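To see where the `coef`, `std err`, `t`, and R² figures in such a summary come from, the sketch below recomputes them with plain NumPy using the textbook formulas for simple linear regression with classical (nonrobust) standard errors and n − 2 residual degrees of freedom. It is an illustrative cross-check on the same five data points, not how statsmodels computes them internally.

```python
import numpy as np

x = np.array([1, 2, 3, 4, 5], dtype=float)
y = np.array([2, 4, 5, 4, 5], dtype=float)
n = len(x)

# Closed-form OLS estimates for one regressor
x_bar, y_bar = x.mean(), y.mean()
sxx = np.sum((x - x_bar) ** 2)
sxy = np.sum((x - x_bar) * (y - y_bar))
b1 = sxy / sxx            # slope
b0 = y_bar - b1 * x_bar   # intercept

# Residuals and goodness of fit
resid = y - (b0 + b1 * x)
sse = np.sum(resid ** 2)          # residual sum of squares
sst = np.sum((y - y_bar) ** 2)    # total sum of squares
r2 = 1 - sse / sst

# Classical standard errors, residual variance estimated with n - 2 df
s2 = sse / (n - 2)
se_b1 = np.sqrt(s2 / sxx)
se_b0 = np.sqrt(s2 * (1 / n + x_bar ** 2 / sxx))

print(f"b0={b0:.4f}  se={se_b0:.3f}  t={b0 / se_b0:.3f}")
print(f"b1={b1:.4f}  se={se_b1:.3f}  t={b1 / se_b1:.3f}")
print(f"R^2={r2:.3f}")
```

With only five observations the residual degrees of freedom are 3, which is why the standard errors are large relative to the coefficients.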
Next, let's see how to do the same fit with Scikit-Learn.
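```python
from sklearn.linear_model import LinearRegression

# scikit-learn expects a 2-D feature matrix: one row per sample, one column per feature
X = [[1], [2], [3], [4], [5]]
Y = [2, 4, 5, 4, 5]

model = LinearRegression()  # fits an intercept by default (fit_intercept=True)
model.fit(X, Y)
print('Coefficients: ', model.coef_)
print('Intercept: ', model.intercept_)
```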
The output is as follows (the printed values may carry floating-point noise in the trailing digits):
```
Coefficients: [0.6]
Intercept: 2.2
```
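This gives the same coefficient and intercept as the statsmodels fit. Note, however, that `LinearRegression` reports only the point estimates; it does not compute standard errors, t-statistics, or p-values, so statsmodels is the better choice when statistical inference is needed.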
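Although scikit-learn prints no summary table, a fitted `LinearRegression` still exposes fitted values and R² through its standard `predict()` and `score()` methods. The sketch below uses the same toy data; the new input `x = 6` is an arbitrary value chosen only for illustration.

```python
from sklearn.linear_model import LinearRegression

X = [[1], [2], [3], [4], [5]]
Y = [2, 4, 5, 4, 5]
model = LinearRegression().fit(X, Y)   # fit() returns the fitted estimator

# Fitted values for the training inputs, and a prediction for a new x
fitted = model.predict(X)
new_pred = model.predict([[6]])        # input must be 2-D, like X

# score() returns the coefficient of determination R^2 on the given data
r2 = model.score(X, Y)
print(fitted, new_pred, r2)
```

Because `score()` evaluates R² on whatever data is passed in, calling it on held-out data instead of the training set gives an out-of-sample measure of fit.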