Kaggle house price prediction with linear regression in PyTorch
Date: 2023-07-07 19:15:53
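Sure. Below is example code that uses PyTorch to implement a linear regression model for the Kaggle house price prediction competition. It expects the competition's `train.csv` and `test.csv` in the working directory: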
```
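import numpy as np
import pandas as pd
import torch
import torch.nn as nn
from sklearn.model_selection import train_test_split

# Load the datasets
train_df = pd.read_csv('train.csv')
test_df = pd.read_csv('test.csv')

# Preprocessing: stack train and test features so that one-hot encoding
# produces the same columns for both
all_features = pd.concat([train_df.drop(['Id', 'SalePrice'], axis=1),
                          test_df.drop(['Id'], axis=1)])
# Standardize numeric columns; raw magnitudes (e.g. LotArea) make SGD diverge
numeric_cols = all_features.select_dtypes(include='number').columns
all_features[numeric_cols] = (all_features[numeric_cols] - all_features[numeric_cols].mean()) \
    / all_features[numeric_cols].std()
# After standardization the mean is 0, so missing numeric values are filled with 0
all_features[numeric_cols] = all_features[numeric_cols].fillna(0)
# One-hot encode the categorical (string) columns; missing values become their own category.
# Without this step .values yields an object array that torch.tensor cannot convert.
all_features = pd.get_dummies(all_features, dummy_na=True).astype(np.float32)

n_train = len(train_df)
X = all_features.iloc[:n_train].values
X_test = all_features.iloc[n_train:].values
# Fit log prices: the competition is scored on RMSE between log predicted and log actual prices
y = np.log(train_df['SalePrice'].values)

# Split into training and validation sets
X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.2, random_state=42)

# Convert to PyTorch tensors (targets reshaped to column vectors)
X_train = torch.tensor(X_train, dtype=torch.float32)
X_val = torch.tensor(X_val, dtype=torch.float32)
y_train = torch.tensor(y_train.reshape(-1, 1), dtype=torch.float32)
y_val = torch.tensor(y_val.reshape(-1, 1), dtype=torch.float32)

# Define the model: a single fully connected layer, y = x W^T + b
class LinearRegression(nn.Module):
    def __init__(self, input_dim, output_dim):
        super().__init__()
        self.linear = nn.Linear(input_dim, output_dim)

    def forward(self, x):
        return self.linear(x)

model = LinearRegression(X_train.shape[1], 1)

# Define the loss function and optimizer
criterion = nn.MSELoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

# Train the model (each step uses the whole training set, i.e. full-batch gradient descent)
num_epochs = 1000
for epoch in range(num_epochs):
    # Forward pass and loss computation
    outputs = model(X_train)
    loss = criterion(outputs, y_train)
    # Backward pass and parameter update
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    # Print the loss every 100 epochs
    if (epoch + 1) % 100 == 0:
        print('Epoch [{}/{}], Loss: {:.4f}'.format(epoch + 1, num_epochs, loss.item()))

# Evaluate on the validation set (MSE of log prices; its square root matches the Kaggle metric)
with torch.no_grad():
    predicted = model(X_val)
    mse = criterion(predicted, y_val)
print('Validation MSE (log scale): {:.4f}, RMSE: {:.4f}'.format(mse.item(), mse.sqrt().item()))

# Generate predictions on the test set and map log prices back with exp
X_test = torch.tensor(X_test, dtype=torch.float32)
with torch.no_grad():
    predicted = model(X_test).numpy().reshape(-1)
predicted = np.exp(predicted)

# Write the submission file
submission_df = pd.DataFrame({'Id': test_df['Id'], 'SalePrice': predicted})
submission_df.to_csv('submission.csv', index=False)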
```
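This code uses PyTorch's `nn` module to define a simple linear regression model (a single `nn.Linear` layer), trains it with the SGD optimizer and MSE loss, and evaluates its performance on a held-out validation set. Finally, it uses the trained model to generate predictions on the test set and saves them to a CSV file for submission to the competition.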
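The training loop in the code above feeds the entire training set to SGD at every step, which makes it full-batch gradient descent. A minimal sketch of the more common mini-batch variant, using `TensorDataset` and `DataLoader` from `torch.utils.data`, is shown below. It runs on hypothetical synthetic data (`y = 3*x1 - 2*x2 + 5` plus small noise) so it works without the Kaggle CSV files; the batch size, learning rate and epoch count are assumptions chosen for this toy problem, not values tuned for the house price data.

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

torch.manual_seed(0)

# Hypothetical synthetic data: y = 3*x1 - 2*x2 + 5 plus small Gaussian noise
true_w = torch.tensor([[3.0], [-2.0]])
true_b = 5.0
X = torch.randn(1000, 2)
y = X @ true_w + true_b + 0.01 * torch.randn(1000, 1)

# Each epoch visits the data in shuffled mini-batches of 32 rows (assumed batch size)
loader = DataLoader(TensorDataset(X, y), batch_size=32, shuffle=True)

model = nn.Linear(2, 1)
criterion = nn.MSELoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.05)  # assumed learning rate

for epoch in range(20):
    for xb, yb in loader:
        # One SGD update per mini-batch instead of one per epoch
        loss = criterion(model(xb), yb)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

# MSE over the full dataset after training
with torch.no_grad():
    final_mse = criterion(model(X), y).item()

print(model.weight.data, model.bias.data, final_mse)
```

If training works, the learned weight and bias should come out close to `[3, -2]` and `5`. To use the same pattern in the Kaggle code, one would wrap `X_train` and `y_train` in a `TensorDataset` and replace the full-batch loop with the nested loop over the `DataLoader`.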