Linear regression on the Advertising data (ad spend and sales) in Python, without sklearn
Posted: 2024-03-25 16:39:10 Views: 126
Sure. For your question, here is a code implementation.
First, we import the necessary libraries:
```python
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
```
Next, we read in the dataset:
```python
df = pd.read_csv("Advertising.csv")
```
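Then we split the data into features and labels, that is, ad spend and sales:
```python
# Assumes column 0 is a row index, columns 1-3 are the ad budgets, and column 4 is sales
X = df.iloc[:, 1:4].values
y = df.iloc[:, 4].values
```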
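Next, we split the dataset into a training set and a test set. Since the goal is to avoid sklearn, the split is done with NumPy:
```python
# Shuffle row indices with a fixed seed; hold out 20% as the test set
idx = np.random.default_rng(0).permutation(len(X))
test_idx, train_idx = np.split(idx, [int(0.2 * len(X))])
X_train, X_test, y_train, y_test = X[train_idx], X[test_idx], y[train_idx], y[test_idx]
```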
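Then we standardize the features. To stay free of sklearn, we compute the mean and standard deviation on the training set only and apply them to both sets:
```python
# Per-column statistics come from the training data, so no test information leaks in
mu, sigma = X_train.mean(axis=0), X_train.std(axis=0)
X_train = (X_train - mu) / sigma
X_test = (X_test - mu) / sigma
```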
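If a feature column happens to be constant, its standard deviation is zero and the division above fails. The sketch below wraps standardization in two small helpers with a guard for that case. The names `fit_scaler` and `apply_scaler` and the demo arrays are hypothetical and only for illustration; they are not part of the original answer or the Advertising data.

```python
import numpy as np

def fit_scaler(X_train):
    """Return per-column mean and std computed from the training data only."""
    mu = X_train.mean(axis=0)
    sigma = X_train.std(axis=0)
    # Guard against constant columns, which would otherwise divide by zero
    sigma = np.where(sigma == 0, 1.0, sigma)
    return mu, sigma

def apply_scaler(X, mu, sigma):
    """Standardize X with previously fitted statistics."""
    return (X - mu) / sigma

# Tiny made-up arrays (not the Advertising data); the third column is constant
X_train_demo = np.array([[1.0, 10.0, 5.0], [2.0, 20.0, 5.0], [3.0, 30.0, 5.0]])
X_test_demo = np.array([[4.0, 40.0, 5.0]])

mu, sigma = fit_scaler(X_train_demo)
Z_train = apply_scaler(X_train_demo, mu, sigma)
Z_test = apply_scaler(X_test_demo, mu, sigma)
print(Z_train)
print(Z_test)
```

Fitting once and applying twice keeps the test set on the same scale as the training set, which is the behavior the standardization step relies on.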
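Next, we fit the linear regression model. Without sklearn, we add an intercept column and solve the least-squares problem directly with NumPy:
```python
# Prepend a column of ones for the intercept, then minimize ||Xb @ theta - y||^2
Xb_train = np.c_[np.ones(len(X_train)), X_train]
theta, *_ = np.linalg.lstsq(Xb_train, y_train, rcond=None)
```
Finally, we predict on the test set and compute the error:
```python
# Apply the fitted coefficients to the test set and report the mean squared error
Xb_test = np.c_[np.ones(len(X_test)), X_test]
y_pred = Xb_test @ theta
print("Mean squared error: ", np.mean((y_test - y_pred) ** 2))
```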
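The fitting step can also be written by hand as batch gradient descent on the mean squared error, which is a common way to do linear regression without a library. The following is a minimal sketch, not a tuned implementation: the function name `fit_gradient_descent`, the learning rate, the iteration count, and the synthetic data are all assumptions chosen for illustration. It assumes the features are already standardized, since gradient descent converges poorly on unscaled inputs.

```python
import numpy as np

def fit_gradient_descent(X, y, lr=0.1, n_iters=2000):
    """Fit y ~ X @ w + b by batch gradient descent on the mean squared error."""
    n_samples, n_features = X.shape
    w = np.zeros(n_features)
    b = 0.0
    for _ in range(n_iters):
        residual = X @ w + b - y
        # Gradients of (1/n) * sum(residual^2) with respect to w and b
        grad_w = (2.0 / n_samples) * (X.T @ residual)
        grad_b = 2.0 * residual.mean()
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

# Synthetic stand-in data with three roughly standardized features
rng = np.random.default_rng(0)
X_demo = rng.standard_normal((200, 3))
true_w, true_b = np.array([3.0, 2.0, 0.5]), 14.0
y_demo = X_demo @ true_w + true_b + 0.1 * rng.standard_normal(200)

w_gd, b_gd = fit_gradient_descent(X_demo, y_demo)

# Cross-check against the closed-form least-squares solution
Xb_demo = np.c_[np.ones(len(X_demo)), X_demo]
theta_ls, *_ = np.linalg.lstsq(Xb_demo, y_demo, rcond=None)
print("GD:", np.r_[b_gd, w_gd])
print("Least squares:", theta_ls)
```

On a small dataset like Advertising, the direct least-squares solve is simpler and exact; gradient descent is mainly useful for seeing how the fit is reached step by step.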
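The complete code, using only NumPy and pandas for the computation, is as follows:
```python
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

# Load the data; column 0 is assumed to be a row index, columns 1-3 the ad budgets, column 4 the sales
df = pd.read_csv("Advertising.csv")
X = df.iloc[:, 1:4].values
y = df.iloc[:, 4].values

# Shuffle row indices with a fixed seed; hold out 20% as the test set
idx = np.random.default_rng(0).permutation(len(X))
test_idx, train_idx = np.split(idx, [int(0.2 * len(X))])
X_train, X_test, y_train, y_test = X[train_idx], X[test_idx], y[train_idx], y[test_idx]

# Standardize with the training set's mean and standard deviation only
mu, sigma = X_train.mean(axis=0), X_train.std(axis=0)
X_train = (X_train - mu) / sigma
X_test = (X_test - mu) / sigma

# Prepend a column of ones for the intercept, then solve the least-squares problem
Xb_train = np.c_[np.ones(len(X_train)), X_train]
theta, *_ = np.linalg.lstsq(Xb_train, y_train, rcond=None)

# Predict on the test set and report the mean squared error
Xb_test = np.c_[np.ones(len(X_test)), X_test]
y_pred = Xb_test @ theta
print("Mean squared error: ", np.mean((y_test - y_pred) ** 2))
```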
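The mean squared error depends on the units of the sales column, so it can help to report the root mean squared error and the coefficient of determination alongside it. The helper below is a hedged sketch: `regression_report` is a hypothetical name and the four-element arrays are made-up numbers, not results from the Advertising data.

```python
import numpy as np

def regression_report(y_true, y_pred):
    """Return MSE, RMSE, and R^2 for a set of predictions, using NumPy only."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)         # residual sum of squares
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)  # total sum of squares
    mse = ss_res / len(y_true)
    return {"mse": mse, "rmse": np.sqrt(mse), "r2": 1.0 - ss_res / ss_tot}

# Made-up values for illustration only
report = regression_report([10.0, 12.0, 14.0, 16.0], [11.0, 12.0, 13.0, 16.0])
print(report)
```

With the variables from the complete code, the call would be `regression_report(y_test, y_pred)`.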
Hope this helps!