CUDA DETAIL
Date: 2023-11-06 08:21:52
RuntimeError: Expected to mark a variable ready only once. This error is caused by one of the following reasons: 1) Use of a module parameter outside the `forward` function. Please make sure model parameters are not shared across multiple concurrent forward-backward passes. or try to use _set_static_graph() as a workaround if this module graph does not change during the training loop. 2) Reused parameters in multiple reentrant backward passes. For example, if you use multiple `checkpoint` functions to wrap the same part of your model, it would result in the same set of parameters been used by different reentrant backward passes multiple times, and hence marking a variable ready multiple times. DDP does not support such use cases in default. You can try to use _set_static_graph() as a workaround if your module graph does not change over iterations.
This error has one of two causes. (1) A module parameter is used outside the `forward` function; model parameters must not be shared across multiple concurrent forward-backward passes. (2) Parameters are reused across multiple reentrant backward passes, for example when several `checkpoint` calls wrap the same part of the model, so the same set of parameters participates in different reentrant backward passes and is marked ready more than once. DDP does not support this by default. In either case, if the module graph does not change across training iterations, `_set_static_graph()` can be used as a workaround.
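To make cause (2) concrete, here is a minimal sketch, not taken from the original post: the same submodule is wrapped in `torch.utils.checkpoint.checkpoint` twice, and the static-graph workaround named in the error is applied via DDP's `static_graph` constructor argument. The module, tensor shapes, port, and single-process `gloo` group are illustrative assumptions.

```python
import torch
import torch.distributed as dist
import torch.nn as nn
from torch.nn.parallel import DistributedDataParallel as DDP
from torch.utils.checkpoint import checkpoint

# Single-process process group so DDP can be constructed locally
# (assumption: port 29500 is free on localhost).
dist.init_process_group(
    backend="gloo", init_method="tcp://127.0.0.1:29500", rank=0, world_size=1
)

class Net(nn.Module):  # illustrative model, not from the post
    def __init__(self):
        super().__init__()
        self.block = nn.Linear(16, 16)

    def forward(self, x):
        # The same submodule is checkpointed twice, so its parameters are
        # reused in multiple reentrant backward passes -- cause (2) above.
        x = checkpoint(self.block, x, use_reentrant=True)
        x = checkpoint(self.block, x, use_reentrant=True)
        return x

# static_graph=True (PyTorch >= 1.11) is the public form of the
# _set_static_graph() workaround mentioned in the error message.
model = DDP(Net(), static_graph=True)

inp = torch.randn(4, 16, requires_grad=True)
model(inp).sum().backward()

dist.destroy_process_group()
```

If reentrant checkpointing is not actually needed, passing `use_reentrant=False` to `checkpoint` is often enough to avoid this interaction with DDP, since the non-reentrant implementation does not run nested backward passes.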