Soft-attention 代码
时间: 2024-09-20 20:16:34 浏览: 46
Soft-attention是一种在深度学习特别是自然语言处理任务中常用的技术,它允许模型在处理输入序列时,对每个位置赋予不同的权重,重点聚焦于最重要的部分。以下是基于Transformer架构的一个简化的Soft-attention代码片段(假设使用PyTorch):
```python
import torch
from torch import nn
class MultiHeadAttention(nn.Module):
def __init__(self, embed_dim, num_heads):
super(MultiHeadAttention, self).__init__()
self.embed_dim = embed_dim
self.num_heads = num_heads
self.head_dim = embed_dim // num_heads
self.fc_q = nn.Linear(embed_dim, embed_dim) # Query linear layer
self.fc_k = nn.Linear(embed_dim, embed_dim) # Key linear layer
self.fc_v = nn.Linear(embed_dim, embed_dim) # Value linear layer
self.fc_out = nn.Linear(embed_dim, embed_dim)
def attention(self, q, k, v):
scores = torch.matmul(q, k.transpose(-2, -1)) / (self.head_dim ** 0.5)
attn_weights = nn.functional.softmax(scores, dim=-1)
output = torch.matmul(attn_weights, v)
return output, attn_weights
def forward(self, query, key, value):
batch_size = query.size(0)
# Linear projections
q = self.fc_q(query).view(batch_size, -1, self.num_heads, self.head_dim).transpose(1, 2) # (batch_size, heads, seq_len, head_dim)
k = self.fc_k(key).view(batch_size, -1, self.num_heads, self.head_dim).transpose(1, 2)
v = self.fc_v(value).view(batch_size, -1, self.num_heads, self.head_dim).transpose(1, 2)
# Attention and concatenation
outputs, attn_weights = self.attention(q, k, v)
outputs = outputs.transpose(1, 2).contiguous().view(batch_size, -1, self.embed_dim)
final_output = self.fc_out(outputs)
return final_output, attn_weights
```
这个代码定义了一个基本的多头注意力模块,其中`query`, `key`, 和 `value`分别代表输入序列的不同版本。`forward`函数执行了整个注意力过程。
阅读全文