How do I modify a single-machine single-GPU PyTorch script for single-machine multi-GPU training?
To convert a single-machine single-GPU PyTorch script to single-machine multi-GPU, follow these steps:
1. Import the required libraries and set the hyperparameters
```python
import torch
import torch.nn as nn
import torch.optim as optim
from torch.utils.data import DataLoader
from torchvision import datasets, transforms
# Hyperparameters (cuda:0 is the primary device; DataParallel fans out to the rest)
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
batch_size = 64
epochs = 10
learning_rate = 0.01
```
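As a side note, if you only want to use a subset of the machine's GPUs, a common approach (a minimal sketch, not part of the original script; the IDs `0,1` are just example values) is to restrict which devices are visible before CUDA is first used:
```python
import os

# Expose only GPU 0 and GPU 1 to this process (example IDs).
# This must run before the first CUDA call, so place it at the very
# top of the script.
os.environ["CUDA_VISIBLE_DEVICES"] = "0,1"
```
Alternatively, `nn.DataParallel` accepts a `device_ids` argument (e.g. `nn.DataParallel(model, device_ids=[0, 1])`) to make the same selection in code.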
2. Load the dataset and create the data loaders
```python
# Load the CIFAR-10 dataset (only ToTensor is applied here; add augmentation transforms if desired)
train_dataset = datasets.CIFAR10(root='./data', train=True, download=True, transform=transforms.ToTensor())
test_dataset = datasets.CIFAR10(root='./data', train=False, download=True, transform=transforms.ToTensor())
# Create the data loaders
train_loader = DataLoader(train_dataset, batch_size=batch_size, shuffle=True, num_workers=4, pin_memory=True)
test_loader = DataLoader(test_dataset, batch_size=batch_size, shuffle=False, num_workers=4, pin_memory=True)
```
3. Define the model and the loss function
```python
# Define the model (a small LeNet-style CNN for CIFAR-10)
class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.conv1 = nn.Conv2d(3, 6, 5)
        self.pool = nn.MaxPool2d(2, 2)
        self.conv2 = nn.Conv2d(6, 16, 5)
        self.fc1 = nn.Linear(16 * 5 * 5, 120)
        self.fc2 = nn.Linear(120, 84)
        self.fc3 = nn.Linear(84, 10)

    def forward(self, x):
        x = self.pool(nn.functional.relu(self.conv1(x)))
        x = self.pool(nn.functional.relu(self.conv2(x)))
        x = x.view(-1, 16 * 5 * 5)
        x = nn.functional.relu(self.fc1(x))
        x = nn.functional.relu(self.fc2(x))
        x = self.fc3(x)
        return x

# Define the loss function
criterion = nn.CrossEntropyLoss()
```
4. Initialize the model and optimizer (this is where the multi-GPU change goes)
```python
# Initialize the model and optimizer
model = Net().to(device)
# The key change: wrap the model in nn.DataParallel when more than one GPU is available
if torch.cuda.device_count() > 1:
    model = nn.DataParallel(model)
optimizer = optim.SGD(model.parameters(), lr=learning_rate, momentum=0.9)
```
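One detail to be aware of: `nn.DataParallel` splits each batch along dimension 0, so with N GPUs every card processes roughly `batch_size / N` samples. If you want each GPU to keep the original per-card workload, you can scale the global batch size, as in this sketch (assuming the GPUs have enough memory; a larger effective batch may also call for adjusting the learning rate):
```python
# Sketch: scale the global batch size by the GPU count so each card still
# sees batch_size samples per step (assumes sufficient GPU memory).
num_gpus = max(1, torch.cuda.device_count())
train_loader = DataLoader(train_dataset, batch_size=batch_size * num_gpus,
                          shuffle=True, num_workers=4, pin_memory=True)
```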
5. Train the model
```python
# Training loop: identical to the single-GPU version
for epoch in range(epochs):
    model.train()
    train_loss = 0.0
    for i, (inputs, labels) in enumerate(train_loader):
        inputs, labels = inputs.to(device), labels.to(device)
        optimizer.zero_grad()
        outputs = model(inputs)
        loss = criterion(outputs, labels)
        loss.backward()
        optimizer.step()
        train_loss += loss.item() * inputs.size(0)
    train_loss /= len(train_loader.dataset)
    print('Epoch: {} \tTraining Loss: {:.6f}'.format(epoch + 1, train_loss))
```
6. Evaluate the model
```python
# Evaluation loop: also identical to the single-GPU version
model.eval()
correct = 0
total = 0
with torch.no_grad():
    for inputs, labels in test_loader:
        inputs, labels = inputs.to(device), labels.to(device)
        outputs = model(inputs)
        _, predicted = torch.max(outputs.data, 1)
        total += labels.size(0)
        correct += (predicted == labels).sum().item()
accuracy = 100 * correct / total
print('Accuracy: %.2f%%' % accuracy)
```
The multi-GPU change is entirely in step 4: wrapping the model in `nn.DataParallel`. When several GPUs are available, PyTorch replicates the model onto each of them, splits every input batch across the replicas, runs the forward passes in parallel, and gathers the outputs back onto the primary device (`cuda:0` here), so the training and evaluation loops need no changes. Note that the model's parameters should already be on the primary GPU before wrapping, which is why `model = Net().to(device)` comes first. With a single GPU the wrapper adds nothing, hence the `torch.cuda.device_count() > 1` guard. For larger jobs, the PyTorch documentation recommends `DistributedDataParallel` even on a single machine because it is faster, but `nn.DataParallel` is the smallest change to an existing single-GPU script.
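A related pitfall when you later save the model: a `DataParallel`-wrapped model stores its parameters under a `module.` prefix, so a checkpoint saved from the wrapper will not load directly into a plain `Net`. A minimal sketch of the usual workaround (the filename is hypothetical):
```python
# Save the underlying model's weights rather than the DataParallel
# wrapper's, so the checkpoint loads cleanly into a plain Net() later.
to_save = model.module if isinstance(model, nn.DataParallel) else model
torch.save(to_save.state_dict(), 'net_cifar10.pth')  # hypothetical filename

# Restoring into an unwrapped, single-device model:
restored = Net().to(device)
restored.load_state_dict(torch.load('net_cifar10.pth', map_location=device))
```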