A general template for linear regression analysis in Python
Date: 2023-07-05 17:23:34
A general workflow for linear regression analysis in Python looks like this:
1. Import the required libraries and load the data
```python
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from sklearn.linear_model import LinearRegression
# Load the data
data = pd.read_csv('data.csv')
```
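2. Preprocess the data
```python
# Handle missing values by dropping incomplete rows
data = data.dropna()
# Separate the features (every column except the last) from the target (the last column)
X = data.iloc[:, :-1].values
y = data.iloc[:, -1].values
# Split into a training set (80%) and a test set (20%)
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
```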
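To see what the preprocessing step produces, here is a minimal sketch on a small made-up table. The column names `x1`, `x2`, and `target` and all the values are assumptions for illustration only; they stand in for whatever `data.csv` actually contains.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical stand-in for data.csv: two feature columns, one target column,
# and a single missing value in x1
demo = pd.DataFrame({
    "x1": [1.0, 2.0, np.nan, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0, 11.0],
    "x2": [0.5, 1.5, 2.5, 3.5, 4.5, 5.5, 6.5, 7.5, 8.5, 9.5, 10.5],
    "target": [2.0, 4.1, 6.2, 7.9, 10.1, 12.0, 13.8, 16.1, 18.0, 20.2, 21.9],
})

# dropna() removes the one incomplete row, leaving 10 of the 11 rows
demo = demo.dropna()

# Same slicing as in the template: last column is the target
X_demo = demo.iloc[:, :-1].values
y_demo = demo.iloc[:, -1].values

# test_size=0.2 on 10 rows gives 8 training rows and 2 test rows
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=0
)
split_shapes = (X_tr.shape, X_te.shape, y_tr.shape, y_te.shape)
print(split_shapes)
```

Checking the shapes right after the split is a cheap way to confirm that the target really is the last column and that no rows were lost unexpectedly.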
3. Build the model
```python
# Create the linear regression model
regressor = LinearRegression()
# Fit the model on the training set
regressor.fit(X_train, y_train)
```
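Once a model is fitted, its learned parameters can be read from the `coef_` and `intercept_` attributes. The sketch below uses noise-free synthetic data generated from y = 3x + 2 (an assumption chosen so the expected result is obvious), rather than the `data.csv` from the template.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Synthetic, noise-free data following y = 3x + 2 (illustrative only)
X_demo = np.arange(10, dtype=float).reshape(-1, 1)  # shape (10, 1): one feature
y_demo = 3.0 * X_demo.ravel() + 2.0

model = LinearRegression().fit(X_demo, y_demo)

# coef_ holds one weight per feature; intercept_ is the constant term
slope = model.coef_[0]
intercept = model.intercept_
print(f"slope={slope:.3f}, intercept={intercept:.3f}")
```

With real data the fit will not be exact, but the same two attributes apply to the `regressor` object from the template.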
4. Predict
```python
# Predict the target values for the test set
y_pred = regressor.predict(X_test)
```
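Predictions are usually followed by a numeric check of how well they match the held-out targets. The original template does not include this step; the following is one possible sketch using scikit-learn's `r2_score`, `mean_squared_error`, and `mean_absolute_error` on synthetic data (the data-generating line y = 3x + 2 plus noise is an assumption for illustration).

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score
from sklearn.model_selection import train_test_split

# Synthetic data: y = 3x + 2 plus Gaussian noise (illustrative only)
rng = np.random.default_rng(0)
X_demo = rng.uniform(0, 10, size=(100, 1))
y_demo = 3.0 * X_demo.ravel() + 2.0 + rng.normal(0, 1, size=100)

X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=0
)
model = LinearRegression().fit(X_tr, y_tr)
y_hat = model.predict(X_te)

# R^2: share of target variance explained (1.0 is a perfect fit)
r2 = r2_score(y_te, y_hat)
# RMSE and MAE are in the same units as the target
rmse = np.sqrt(mean_squared_error(y_te, y_hat))
mae = mean_absolute_error(y_te, y_hat)
print(f"R^2={r2:.3f}  RMSE={rmse:.3f}  MAE={mae:.3f}")
```

In the template itself, the equivalent calls would take `y_test` and `y_pred` as arguments.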
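5. Visualize
```python
# Note: these plots assume X has exactly one feature column
# (here, years of experience vs. salary as an example)
# Plot the training set with the fitted line
plt.scatter(X_train, y_train, color='red')
plt.plot(X_train, regressor.predict(X_train), color='blue')
plt.title('Salary vs Experience (Training set)')
plt.xlabel('Years of Experience')
plt.ylabel('Salary')
plt.show()
# Plot the test set against the same fitted line
# (the line depends only on the model, so it is still drawn from X_train)
plt.scatter(X_test, y_test, color='red')
plt.plot(X_train, regressor.predict(X_train), color='blue')
plt.title('Salary vs Experience (Test set)')
plt.xlabel('Years of Experience')
plt.ylabel('Salary')
plt.show()
```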
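The scatter plots in step 5 only work when there is a single feature column. When `data.csv` has several features, one common alternative is to plot predicted values against actual values. The helper below is a hypothetical sketch, not part of the original template; the name `plot_predicted_vs_actual` is made up for illustration.

```python
import numpy as np
import matplotlib.pyplot as plt


def plot_predicted_vs_actual(y_true, y_pred):
    """Scatter predictions against true targets, with a y = x reference line."""
    fig, ax = plt.subplots()
    ax.scatter(y_true, y_pred, color='red')
    # Points on the diagonal are perfect predictions
    lo = min(np.min(y_true), np.min(y_pred))
    hi = max(np.max(y_true), np.max(y_pred))
    ax.plot([lo, hi], [lo, hi], color='blue')
    ax.set_xlabel("Actual")
    ax.set_ylabel("Predicted")
    ax.set_title("Predicted vs actual")
    return ax


if __name__ == "__main__":
    # Made-up values, only to show the call
    actual = np.array([1.0, 2.0, 3.0, 4.0])
    predicted = np.array([1.1, 1.9, 3.2, 3.8])
    plot_predicted_vs_actual(actual, predicted)
    plt.show()
```

With the template's variables, the call would be `plot_predicted_vs_actual(y_test, y_pred)` followed by `plt.show()`.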
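The steps above form a general template for linear regression analysis. `data.csv` is the name of the file that holds the data; adjust it, along with the plot titles and axis labels, to match your own data.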