Multi-head attention
Time: 2023-08-24 10:06:41
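Multi-head attention is an attention mechanism introduced with the Transformer model. It can be viewed as a combination of several self-attention operations running in parallel, loosely analogous to using multiple kernels in a CNN. Rather than looping over the heads one at a time, implementations typically compute all heads at once with matrix multiplication: the projected queries, keys, and values are reshaped and transposed so that the head index becomes its own dimension. Using multiple heads lets the model jointly attend to information from different representation subspaces and different positions, which increases its expressive power. Because each head is simply self-attention applied to its own projection of the input, understanding self-attention is the key to understanding the multi-head structure.\[1\]\[2\]\[3\]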
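The reshape-and-transpose flow described above can be sketched in NumPy. This is a minimal illustration under stated assumptions, not the implementation from any particular library: the function names (`split_heads`, `multi_head_attention`, `multi_head_attention_loop`), the weight names (`w_q`, `w_k`, `w_v`, `w_o`), and the absence of biases, masking, and dropout are all choices made for this example. The looped version is included only to show that the batched version computes the same result.

```python
import numpy as np

def softmax(x, axis=-1):
    # Subtract the max for numerical stability before exponentiating.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def split_heads(x, num_heads):
    # (batch, seq, d_model) -> (batch, heads, seq, d_head)
    b, s, d = x.shape
    return x.reshape(b, s, num_heads, d // num_heads).transpose(0, 2, 1, 3)

def multi_head_attention(x, w_q, w_k, w_v, w_o, num_heads):
    # Hypothetical helper: all heads computed at once via batched matmul.
    b, s, d = x.shape
    d_head = d // num_heads
    q = split_heads(x @ w_q, num_heads)
    k = split_heads(x @ w_k, num_heads)
    v = split_heads(x @ w_v, num_heads)
    # Scaled dot-product scores per head: (batch, heads, seq, seq)
    scores = q @ k.transpose(0, 1, 3, 2) / np.sqrt(d_head)
    weights = softmax(scores, axis=-1)
    context = weights @ v  # (batch, heads, seq, d_head)
    # Merge heads back: (batch, seq, d_model), then apply output projection.
    merged = context.transpose(0, 2, 1, 3).reshape(b, s, d)
    return merged @ w_o, weights

def multi_head_attention_loop(x, w_q, w_k, w_v, w_o, num_heads):
    # Reference version that loops over heads, for comparison only.
    d_head = x.shape[-1] // num_heads
    heads = []
    for h in range(num_heads):
        cols = slice(h * d_head, (h + 1) * d_head)
        q, k, v = x @ w_q[:, cols], x @ w_k[:, cols], x @ w_v[:, cols]
        w = softmax(q @ k.transpose(0, 2, 1) / np.sqrt(d_head))
        heads.append(w @ v)
    return np.concatenate(heads, axis=-1) @ w_o

rng = np.random.default_rng(0)
x = rng.standard_normal((2, 5, 8))  # batch=2, seq=5, d_model=8
w_q, w_k, w_v, w_o = (rng.standard_normal((8, 8)) for _ in range(4))
out, weights = multi_head_attention(x, w_q, w_k, w_v, w_o, num_heads=2)
ref = multi_head_attention_loop(x, w_q, w_k, w_v, w_o, num_heads=2)
print(out.shape, weights.shape, np.allclose(out, ref))
```

Splitting the last dimension into `(num_heads, d_head)` means head `h` uses columns `h * d_head` to `(h + 1) * d_head` of each projection matrix, which is why the looped version slices the weights that way.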
#### References
- *1* [自注意力(Self-Attention)与Multi-Head Attention机制详解](https://blog.csdn.net/weixin_60737527/article/details/127141542) (Self-Attention and Multi-Head Attention Mechanisms Explained in Detail)
- *2* [Multi-Head Attention的讲解](https://blog.csdn.net/qq_41980734/article/details/120842437) (An Explanation of Multi-Head Attention)
- *3* [详解Transformer中Self-Attention以及Multi-Head Attention](https://blog.csdn.net/qq_37541097/article/details/117691873) (A Detailed Explanation of Self-Attention and Multi-Head Attention in the Transformer)