tensorflow.python.framework.errors_impl.ResourceExhaustedError: OOM when allocating tensor with shape[200,1344,1344] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc [[{{node model/multi_head_attention_1/multi_head_attention_1-Attention/ArithmeticOptimizer/FoldTransposeIntoMatMul_MatMul}}]] Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info. [Op:__inference_train_function_6511] Function call stack: train_function
This error means that the memory requested on the GPU exceeded what the device can provide. Try one of the following:
1. Reduce the model's complexity, e.g. use fewer layers or fewer neurons per layer.
2. Reduce the batch size (the number of samples per step) so each step needs less memory.
3. Switch to a GPU with more memory.
4. Enable TensorFlow's GPU memory growth option so memory is allocated on demand instead of all at once.
In addition, add the `report_tensor_allocations_upon_oom` option mentioned in the error message when running the model, to see which tensors caused the out-of-memory condition; a minimal sketch of both follows.
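For reference, here is a minimal sketch of point 4 and the OOM-reporting hint, assuming TensorFlow 2.x; the `RunOptions` part only takes effect in TF1-style sessions:

```python
import tensorflow as tf

# Point 4: let TensorFlow allocate GPU memory on demand instead of
# reserving the whole device up front. Must run before any GPU op
# initializes the device, typically at the very top of the script.
for gpu in tf.config.list_physical_devices('GPU'):
    tf.config.experimental.set_memory_growth(gpu, True)

# report_tensor_allocations_upon_oom is a RunOptions field; in a
# TF1-style session it makes the OOM error list the tensors that
# were alive when allocation failed:
run_options = tf.compat.v1.RunOptions(report_tensor_allocations_upon_oom=True)
# sess.run(train_op, options=run_options)  # inside a tf.compat.v1.Session
```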
Related questions
Translate: tensorflow.python.framework.errors_impl.ResourceExhaustedError: OOM when allocating tensor with shape[60000,28,28,32] and type float on /job:localhost/replica:0/task:0/device:CPU:0 by allocator cpu [Op:BiasAdd] name: model/conv_1/BiasAdd/
tensorflow.python.framework.errors_impl.ResourceExhaustedError: an OOM error occurred while allocating a tensor with shape [60000,28,28,32] and type float, because the CPU allocator ran out of memory. [Op:BiasAdd] name: model/conv_1/BiasAdd/.
tensorflow.python.framework.errors_impl.ResourceExhaustedError: OOM when all available memory is already in use
This error occurs when TensorFlow is unable to allocate enough memory to complete a computation. This can happen when the input data or the model parameters are too large to fit into the available memory. It can also happen if the computation requires more memory than the system has available.
To resolve this issue, you can try the following steps:
1. Reduce the batch size: If you are using batch processing, try a smaller batch size. This reduces the amount of memory required to process each batch.
2. Reduce the size of the input data: If the input data is too large, you can try reducing its size by downsampling or cropping the data.
3. Use a smaller model: If the model itself is too large, try reducing the number of layers or the number of neurons in each layer, or switch to a more compact architecture.
4. Use a GPU with more memory: If you are using a GPU, try upgrading to a GPU with more memory.
5. Use distributed training: If you have multiple GPUs or machines available, you can spread the computation across multiple devices; see the sketch after this list.
6. Clear memory: You can also try clearing memory by restarting the kernel. In TensorFlow 1.x code this was done with `tf.reset_default_graph()`; in TensorFlow 2.x the equivalent compat call is `tf.compat.v1.reset_default_graph()`.
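As a rough illustration of steps 1 and 5, the sketch below combines a smaller per-replica batch with `tf.distribute.MirroredStrategy`; `build_model` and `dataset` are hypothetical placeholders for your own model and input pipeline:

```python
import tensorflow as tf

def build_model():
    # Hypothetical stand-in for your own architecture.
    return tf.keras.Sequential([
        tf.keras.layers.Flatten(input_shape=(28, 28)),
        tf.keras.layers.Dense(128, activation='relu'),
        tf.keras.layers.Dense(10),
    ])

# Step 5: MirroredStrategy replicates the model on every visible GPU
# and splits each global batch among the replicas.
strategy = tf.distribute.MirroredStrategy()
with strategy.scope():
    model = build_model()
    model.compile(
        optimizer='adam',
        loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True))

# Step 1: a smaller per-replica batch lowers peak activation memory;
# the global batch is per-replica batch * number of replicas.
per_replica_batch = 32
global_batch = per_replica_batch * strategy.num_replicas_in_sync
# model.fit(dataset.batch(global_batch), epochs=...)  # dataset is assumed
```

Because each replica only processes its share of the global batch, the per-device memory footprint shrinks even when the effective batch size stays the same.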