Could you provide a PyTorch code example, implemented in a Jupyter Notebook, that predicts Titanic passenger survival?
Posted: 2024-06-11 20:03:36 Views: 118
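Certainly. Here is a simple PyTorch example for predicting Titanic passenger survival, written to run in a Jupyter Notebook. First, make sure the required libraries are installed: `pandas`, `numpy`, `torch`, and `scikit-learn` (imported as `sklearn`). In the notebook, you can then write the code as follows: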
```python
# Import the required libraries
import torch
import torch.nn as nn
import torch.optim as optim
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Load the data
data_url = "https://raw.githubusercontent.com/datasciencedojo/datasets/master/titanic.csv"
titanic_data = pd.read_csv(data_url)

# Preprocess the data
titanic_data = titanic_data.drop(['PassengerId', 'Name', 'Ticket', 'Cabin'], axis=1)  # drop columns not used as features
titanic_data['Age'] = titanic_data['Age'].fillna(titanic_data['Age'].mean())  # fill missing ages with the mean
# mode() returns a Series, so take its first value as the fill value
titanic_data['Embarked'] = titanic_data['Embarked'].fillna(titanic_data['Embarked'].mode()[0])
# One-hot encode the categorical features as floats so they can become tensors
titanic_data = pd.get_dummies(titanic_data, columns=['Sex', 'Embarked'], dtype=float)

# Define the features and the target variable
features = titanic_data.drop('Survived', axis=1)
target = titanic_data['Survived']

# Split the data into training and test sets
X_train, X_test, y_train, y_test = train_test_split(features, target, test_size=0.2, random_state=42)

# Standardize the features (fit on the training set only)
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)

# Convert everything to float32 tensors; targets get shape (N, 1) to match the model output
X_train = torch.tensor(X_train, dtype=torch.float32)
X_test = torch.tensor(X_test, dtype=torch.float32)
y_train = torch.tensor(y_train.to_numpy(), dtype=torch.float32).view(-1, 1)
y_test = torch.tensor(y_test.to_numpy(), dtype=torch.float32).view(-1, 1)

# Define the model architecture
class TitanicSurvivalModel(nn.Module):
    def __init__(self, num_features):
        super().__init__()
        self.fc1 = nn.Linear(num_features, 64)  # hidden layer
        self.relu = nn.ReLU()
        self.dropout = nn.Dropout(0.2)
        self.fc2 = nn.Linear(64, 1)  # output layer (one raw logit per passenger)

    def forward(self, x):
        out = self.fc1(x)
        out = self.relu(out)
        out = self.dropout(out)
        out = self.fc2(out)
        return out

model = TitanicSurvivalModel(X_train.shape[1])  # shape[1] is the number of feature columns

# Define the loss function and the optimizer
criterion = nn.BCEWithLogitsLoss()  # applies the sigmoid internally, so the model outputs raw logits
optimizer = optim.Adam(model.parameters(), lr=0.001)

# Train the model (full-batch gradient descent: one update per epoch)
num_epochs = 10
for epoch in range(num_epochs):
    model.train()
    optimizer.zero_grad()
    outputs = model(X_train)
    loss = criterion(outputs, y_train)
    loss.backward()
    optimizer.step()

    # Re-evaluate on the training set with dropout disabled
    model.eval()
    with torch.no_grad():
        train_outputs = model(X_train)
        train_loss = criterion(train_outputs, y_train)
        train_predictions = (torch.sigmoid(train_outputs) > 0.5).float()
        train_accuracy = (train_predictions == y_train).float().mean().item()
    print(f'Epoch {epoch+1}/{num_epochs}, Train Loss: {train_loss.item():.4f}, Train Accuracy: {train_accuracy * 100:.2f}%')

# Test the model
model.eval()
with torch.no_grad():
    test_outputs = model(X_test)
    test_predictions = (torch.sigmoid(test_outputs) > 0.5).float()
    test_accuracy = (test_predictions == y_test).float().mean().item()
print(f'Test Accuracy: {test_accuracy * 100:.2f}%')

# Save the model weights
torch.save(model.state_dict(), 'titanic_survival_model.pth')
```
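A note on the loss: the code above uses `nn.BCEWithLogitsLoss`, which expects raw logits and applies the sigmoid itself, whereas `nn.BCELoss` expects probabilities that have already been passed through a sigmoid. The following minimal sketch, using made-up logits and labels rather than the Titanic data, illustrates that the two agree when used this way:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Made-up values for illustration only (not Titanic data)
logits = torch.randn(8, 1)                      # raw model outputs
targets = torch.randint(0, 2, (8, 1)).float()   # 0/1 labels

# BCEWithLogitsLoss takes logits directly
loss_from_logits = nn.BCEWithLogitsLoss()(logits, targets)

# BCELoss needs probabilities, so the sigmoid is applied by hand
loss_from_probs = nn.BCELoss()(torch.sigmoid(logits), targets)

losses_match = torch.allclose(loss_from_logits, loss_from_probs, atol=1e-6)
print(loss_from_logits.item(), loss_from_probs.item(), losses_match)
```

Because the sigmoid is folded into the loss, the model's final layer should not apply one, and predictions need an explicit `torch.sigmoid` before thresholding at 0.5.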
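In this example, we first load the data, preprocess it, and split it into training and test sets. We then define a simple neural network, train it, and evaluate it. Finally, we save the model weights so they can be reused later.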
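For later use, the saved weights can be loaded back into a freshly constructed model. The sketch below is only an illustration under several assumptions: it redefines a model with the same layers as the one above (here taking the input width as a hypothetical `num_features` argument, set to 10), uses randomly initialized weights as a stand-in for the trained ones, writes to a temporary directory, and feeds in a made-up feature row instead of a real, standardized passenger record.

```python
import os
import tempfile

import torch
import torch.nn as nn


class TitanicSurvivalModel(nn.Module):
    """Same layers as the model above; num_features is a hypothetical argument."""

    def __init__(self, num_features):
        super().__init__()
        self.fc1 = nn.Linear(num_features, 64)
        self.relu = nn.ReLU()
        self.dropout = nn.Dropout(0.2)
        self.fc2 = nn.Linear(64, 1)

    def forward(self, x):
        return self.fc2(self.dropout(self.relu(self.fc1(x))))


num_features = 10  # assumption: must equal the number of feature columns used in training
torch.manual_seed(0)
trained_model = TitanicSurvivalModel(num_features)  # stand-in for the trained model

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, 'titanic_survival_model.pth')
    torch.save(trained_model.state_dict(), path)

    # Rebuild the architecture first, then load the saved weights into it
    restored_model = TitanicSurvivalModel(num_features)
    restored_model.load_state_dict(torch.load(path))

restored_model.eval()  # disable dropout for inference

passenger = torch.randn(1, num_features)  # made-up, already-standardized feature row
with torch.no_grad():
    survival_probability = torch.sigmoid(restored_model(passenger)).item()
print(f'Predicted survival probability: {survival_probability:.3f}')
```

In a real notebook, a new passenger's features would need the same preprocessing as the training data, including the fitted `scaler`, before being passed to the model.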