How to run a simple multi-GPU demo in Python
If you want to run Python code across multiple GPUs, you can use a deep learning framework such as PyTorch or TensorFlow. Below is a simple example of multi-GPU training with PyTorch:
1. Install PyTorch and torchvision
```bash
pip install torch torchvision
```
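Before running the demo, it can help to confirm that PyTorch actually sees your GPUs. A quick check, assuming a CUDA build of PyTorch is installed:
```python
import torch

# Should print True and the number of visible GPUs (>= 2 for the demo below)
print(torch.cuda.is_available())
print(torch.cuda.device_count())
```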
2. Write the multi-GPU training code
```python
import os
import torch
import torch.nn as nn
import torch.optim as optim
import torch.distributed as dist
import torch.multiprocessing as mp
from torch.nn.parallel import DistributedDataParallel as DDP

def setup(rank, world_size):
    os.environ['MASTER_ADDR'] = 'localhost'
    os.environ['MASTER_PORT'] = '12355'
    # Initialize the process group
    dist.init_process_group("nccl", rank=rank, world_size=world_size)

def cleanup():
    dist.destroy_process_group()

class SimpleNet(nn.Module):
    def __init__(self):
        super(SimpleNet, self).__init__()
        self.fc1 = nn.Linear(10, 5)
        self.fc2 = nn.Linear(5, 1)

    def forward(self, x):
        x = self.fc1(x)
        x = nn.functional.relu(x)
        x = self.fc2(x)
        return x

def demo_basic(rank, world_size):
    setup(rank, world_size)
    # Place the model on the GPU assigned to this process
    device = torch.device("cuda:%d" % rank)
    model = SimpleNet().to(device)
    # Wrap the model with DistributedDataParallel
    ddp_model = DDP(model, device_ids=[rank])
    # Dummy data and optimizer
    data = torch.randn(20, 10).to(device)
    optimizer = optim.SGD(ddp_model.parameters(), lr=0.001)
    for i in range(1000):
        optimizer.zero_grad()
        output = ddp_model(data)
        loss = output.mean()
        loss.backward()
        optimizer.step()
        if i % 100 == 0:
            print('Rank', rank, 'iter', i, 'loss', loss.item())
    cleanup()

if __name__ == '__main__':
    # Launch one training process per GPU
    world_size = 2
    mp.spawn(demo_basic, args=(world_size,), nprocs=world_size, join=True)
```
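Save the script (the filename `ddp_demo.py` below is just an example) and run it directly; `mp.spawn` will start `world_size` processes, one per GPU:
```bash
python ddp_demo.py
```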
This simple demo uses PyTorch's distributed data-parallel wrapper (`DistributedDataParallel`) to spread the model and data across multiple GPUs and compute in parallel. In multi-GPU training, the `torch.distributed` module is used to initialize a process group so that the processes can communicate and synchronize with each other. The example above uses the `nccl` backend for inter-process communication.
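In this demo every process trains on the same random tensor. With a real dataset, each GPU would typically get its own shard of the data via `DistributedSampler`. Below is a minimal sketch of how that could replace the `data = torch.randn(...)` line inside `demo_basic`; the `TensorDataset`, sample count, and batch size are placeholder choices, and the snippet assumes the `rank`, `world_size`, `device`, `ddp_model`, and `optimizer` variables from the demo above:
```python
import torch
from torch.utils.data import DataLoader, TensorDataset
from torch.utils.data.distributed import DistributedSampler

# Toy dataset: 200 samples with 10 features each (placeholder data)
dataset = TensorDataset(torch.randn(200, 10))
# DistributedSampler gives each rank a disjoint subset of the dataset
sampler = DistributedSampler(dataset, num_replicas=world_size, rank=rank)
loader = DataLoader(dataset, batch_size=16, sampler=sampler)

for epoch in range(5):
    sampler.set_epoch(epoch)  # reshuffle the shards differently each epoch
    for (batch,) in loader:
        batch = batch.to(device)
        optimizer.zero_grad()
        loss = ddp_model(batch).mean()
        loss.backward()
        optimizer.step()
```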