torch.distributed.elastic
Date: 2023-10-15 16:23:43
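The cited sources describe PyTorch Elastic, a framework for launching distributed PyTorch jobs in a fault-tolerant and elastic way. It was originally distributed as the separate torchelastic library, which requires Python 3, torch, and a few other dependencies; since PyTorch 1.9 it ships inside PyTorch itself as torch.distributed.elastic. The error you mention, torch.distributed.elastic.multiprocessing.errors.ChildFailedError, is raised by the launcher when one or more worker processes fail. It therefore wraps the real failure rather than being the root cause, and the actual exception has to be found in the output of the failed worker. [1][2][3]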
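Because ChildFailedError only reports that a worker died, it often helps to make the worker record its own traceback. The sketch below is one way to do that, assuming a hypothetical training script named `train.py` started with `torchrun`. The `record` decorator is part of `torch.distributed.elastic.multiprocessing.errors`; the helper `read_worker_env`, the `main` body, and the no-op fallback used when PyTorch is not installed are illustrative assumptions, not part of the cited sources.

```python
import os

try:
    # `record` writes the worker's exception to the error file that the
    # elastic launcher reads when it builds the ChildFailedError summary.
    from torch.distributed.elastic.multiprocessing.errors import record
except ImportError:
    # Assumption for illustration: without PyTorch, fall back to a no-op
    # decorator so the script can still be imported and inspected.
    def record(fn):
        return fn


def read_worker_env(env=os.environ):
    """Return (rank, world_size) from the variables set by the launcher.

    Defaults to a single-process setup when the variables are absent.
    """
    return int(env.get("RANK", 0)), int(env.get("WORLD_SIZE", 1))


@record
def main():
    rank, world_size = read_worker_env()
    print(f"worker {rank} of {world_size} starting")
    # Hypothetical placeholder: the real training loop would go here.
    return rank, world_size


if __name__ == "__main__":
    main()
```

A typical single-node launch would be `torchrun --nproc_per_node=2 train.py`. With `record` in place, the launcher's failure report should include the traceback of the worker that failed first, which is usually the quickest route to the underlying bug.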
#### References
- [1] [PyTorch Elastic: a distributed training framework for PyTorch (Python)](https://download.csdn.net/download/weixin_42168341/19718612)
- [2] [Problems with multi-GPU parallelism in PyTorch](https://blog.csdn.net/rucieryi369/article/details/124703773)
- [3] [Summary of fixes for mmdetection3d errors](https://blog.csdn.net/QLeelq/article/details/130404416)