torch.distributed.elastic.multiprocessing.api:failed
Date: 2023-10-11 19:12:02
The torch.distributed.elastic.multiprocessing.api:failed error can occur during distributed training. A common cause is constructing a DataLoader that is given both a sampler and shuffle=True; these two options conflict. In distributed training, the sampler already shuffles the data, so the DataLoader does not need to shuffle it again. The fix is to set shuffle=False (the default) when creating the DataLoader. [1] [2] [3]
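A minimal sketch of the fix described above, using a toy in-memory dataset (the dataset and the explicit num_replicas/rank values are illustrative assumptions so the snippet runs without an initialized process group):

```python
import torch
from torch.utils.data import DataLoader, TensorDataset
from torch.utils.data.distributed import DistributedSampler

# Toy dataset of 8 samples; num_replicas/rank are passed explicitly here
# only so the example runs without torch.distributed.init_process_group.
dataset = TensorDataset(torch.arange(8).float())
sampler = DistributedSampler(dataset, num_replicas=2, rank=0, shuffle=True)

# Correct: let the sampler handle shuffling and leave shuffle=False
# (the DataLoader default).
loader = DataLoader(dataset, batch_size=2, sampler=sampler)

# Incorrect: passing both a sampler and shuffle=True raises a ValueError,
# because the two options are mutually exclusive.
try:
    DataLoader(dataset, batch_size=2, sampler=sampler, shuffle=True)
except ValueError as e:
    print("conflict:", e)
```

In a real training loop you would also call `sampler.set_epoch(epoch)` at the start of each epoch so the shuffling order differs across epochs.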
#### References
- [1] [2] [mmdetection3d报错问题解决汇总](https://blog.csdn.net/QLeelq/article/details/130404416)
- [3] [【解决】pytorch单机多卡问题：ERROR: torch.distributed.elastic.multiprocessing.api:failed](https://blog.csdn.net/Caesar6666/article/details/126893353)