Why softmax divides by √dk
Date: 2023-11-26 08:50:51
In softmax attention, dk is the dimensionality of the query and key vectors. If the components of q and k are independent with zero mean and unit variance, their dot product q·k has variance dk, so for large dk the scores fed into softmax grow large in magnitude. Softmax then saturates toward a near one-hot distribution, and the gradients flowing back through it become vanishingly small. Scaling the dot products by 1/√dk brings their variance back to roughly 1, which keeps the softmax away from saturation and makes gradient backpropagation more stable. [1][2][3]
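The scaling described above can be sketched in NumPy. This is a minimal illustration, not a reference implementation; the function name and shapes are chosen for clarity:

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V."""
    d_k = Q.shape[-1]
    # Divide by sqrt(d_k) so the scores keep variance near 1;
    # without this, large d_k pushes softmax into saturation.
    scores = Q @ K.T / np.sqrt(d_k)
    # Numerically stable softmax over the key dimension.
    scores = scores - scores.max(axis=-1, keepdims=True)
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)
    return weights @ V

# Demonstrate why scaling matters: for random unit-variance vectors,
# the raw dot product has standard deviation ~sqrt(d_k).
rng = np.random.default_rng(0)
d_k = 512
dots = np.array([rng.standard_normal(d_k) @ rng.standard_normal(d_k)
                 for _ in range(1000)])
print(dots.std())                # close to sqrt(512) ≈ 22.6
print(dots.std() / np.sqrt(d_k)) # after scaling, close to 1
```

With scores this large, exp() in softmax makes the biggest score dominate completely; dividing by √dk keeps the attention weights soft.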
#### References
- [1][3] [Self-attention中为什么softmax要除d_k](https://blog.csdn.net/panxin801/article/details/120758904)
- [2] [attention is all you need](https://blog.csdn.net/qq_46539177/article/details/127740154)