Deformable Convolution and Multi-Head Attention
### Deformable Convolution and the Multi-Head Attention Mechanism
#### Deformable Convolutional Networks
Deformable convolution improves the standard convolution operation by strengthening a model's ability to capture spatial transformations. A conventional convolution kernel has a fixed geometric structure, which limits how well it handles scale changes, rotation, and translation in images.
To overcome these limitations, deformable convolution introduces an extra set of learned parameters, the offsets, which adjust the sampling positions. Concretely, for an output location centered at \(p_0\), each kernel sampling point \(p_k\) on the regular grid is shifted by a predicted offset \(\Delta p_k\), so the actual sampling position on the input feature map becomes:
\[p_k' = p_0 + p_k + \Delta p_k\]
where \(p_0\) is the center coordinate, \(p_k\) enumerates the fixed kernel grid (e.g. the 9 positions of a \(3\times3\) kernel), and \(\Delta p_k\) is a displacement vector predicted by the network[^1].
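Because the predicted offsets are generally fractional, the feature value at a deformed location is read off with bilinear interpolation over its neighboring integral positions, as in the original Deformable ConvNets formulation:
\[x(p) = \sum_{q} G(q, p)\, x(q)\]
where \(q\) ranges over the integral locations of the feature map and \(G\) is the bilinear interpolation kernel.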
This design lets the convolution filter adaptively change its shape to match the actual geometry of the target object, improving detection accuracy and robustness.
```python
import torch
import torch.nn as nn
from torchvision.ops import deform_conv2d


class DeformConv2d(nn.Module):
    def __init__(self, inc, outc, kernel_size=3, stride=1, padding=1, bias=False):
        super().__init__()
        self.stride = stride
        self.padding = padding
        N = kernel_size * kernel_size
        # Offset prediction layer: 2 offsets (x, y) per kernel sampling point
        self.offset_conv = nn.Conv2d(
            inc,
            2 * N,
            kernel_size=kernel_size,
            stride=stride,
            padding=padding,
            bias=True,
        )
        # Zero-initialize the offset branch so the layer initially behaves like
        # a standard convolution (identity deformation)
        nn.init.zeros_(self.offset_conv.weight)
        nn.init.zeros_(self.offset_conv.bias)
        # Regular conv whose weights are applied at the deformed sampling locations
        self.regular_conv = nn.Conv2d(
            inc,
            outc,
            kernel_size=kernel_size,
            stride=stride,
            padding=padding,
            bias=bias,
        )

    def forward(self, x):
        offset = self.offset_conv(x)
        # Apply deformable convolution via torchvision's built-in operator
        return deform_conv2d(
            x,
            offset,
            weight=self.regular_conv.weight,
            bias=self.regular_conv.bias,
            stride=self.stride,
            padding=self.padding,
        )
```
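A quick smoke test of the layer above (shapes chosen arbitrarily for illustration; assumes `torchvision.ops.deform_conv2d` is available, as imported in the sketch):
```python
import torch

# Random NCHW feature map: batch of 2, 16 channels, 32x32 spatial size
x = torch.randn(2, 16, 32, 32)
layer = DeformConv2d(inc=16, outc=32, kernel_size=3, stride=1, padding=1)
y = layer(x)
print(y.shape)  # torch.Size([2, 32, 32, 32])
```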
#### Multi-Head Attention Mechanism
The multi-head attention mechanism originates from the Transformer architecture. It runs several attention functions in parallel to capture dependencies in different representation subspaces. Compared with a single head, this extracts complex patterns more effectively and can also help mitigate vanishing gradients.
In a typical implementation, the queries (Q), keys (K), and values (V) are each projected into lower-dimensional per-head representations, and scaled dot-product attention then produces the attention weights:
\[\mathrm{Attention}(Q,K,V)=\mathrm{softmax}\left(\frac{QK^{T}}{\sqrt{d_k}}\right)V\]
where \(d_k\) is the dimensionality of the keys. Finally, the outputs of all heads are concatenated and passed through a linear projection to produce the final result[^2].
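Concretely, following the Transformer formulation, each head applies attention to its own learned projections of Q, K, and V, and the concatenated heads are mixed by an output projection \(W^{O}\):
\[\mathrm{MultiHead}(Q,K,V)=\mathrm{Concat}(\mathrm{head}_1,\dots,\mathrm{head}_h)W^{O},\quad \mathrm{head}_i=\mathrm{Attention}(QW_i^{Q},\,KW_i^{K},\,VW_i^{V})\]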
```python
import math
import torch
import torch.nn as nn
import torch.nn.functional as F
from torch import Tensor
from torch.nn import Linear, Parameter


def scaled_dot_product_attention(query: Tensor, key: Tensor, value: Tensor) -> Tensor:
    # query, key, value: (batch, seq_len, dim)
    dim_key = key.size(-1)
    scores = torch.bmm(query, key.transpose(1, 2)) / math.sqrt(dim_key)
    attn_weights = F.softmax(scores, dim=-1)
    context_vector = torch.bmm(attn_weights, value)
    return context_vector


class MultiheadAttention(nn.Module):
    def __init__(self, embed_dim, num_heads):
        super().__init__()
        head_dim = embed_dim // num_heads
        assert (
            head_dim * num_heads == embed_dim
        ), "Embedding dimension must be divisible by number of heads"
        self.num_heads = num_heads
        self.head_dim = head_dim
        self.q_proj_weight = Parameter(torch.empty((embed_dim, embed_dim)))
        self.k_proj_weight = Parameter(torch.empty((embed_dim, embed_dim)))
        self.v_proj_weight = Parameter(torch.empty((embed_dim, embed_dim)))
        for w in (self.q_proj_weight, self.k_proj_weight, self.v_proj_weight):
            nn.init.xavier_uniform_(w)
        self.out_proj = Linear(embed_dim, embed_dim)

    def forward(self, query: Tensor, key: Tensor, value: Tensor) -> Tensor:
        batch_size, seq_len_q, _ = query.shape
        _, seq_len_k, _ = key.shape
        _, seq_len_v, _ = value.shape
        # Project and split into heads: (batch, num_heads, seq_len, head_dim)
        q = F.linear(query, self.q_proj_weight).view(batch_size, seq_len_q, self.num_heads, self.head_dim).transpose(1, 2)
        k = F.linear(key, self.k_proj_weight).view(batch_size, seq_len_k, self.num_heads, self.head_dim).transpose(1, 2)
        v = F.linear(value, self.v_proj_weight).view(batch_size, seq_len_v, self.num_heads, self.head_dim).transpose(1, 2)
        # Run scaled dot-product attention independently for each head
        attended_values = []
        for i in range(self.num_heads):
            context_vector = scaled_dot_product_attention(q[:, i], k[:, i], v[:, i])
            attended_values.append(context_vector.unsqueeze(1))
        # (batch, num_heads, seq_len_q, head_dim) -> (batch, seq_len_q, embed_dim)
        concat_context_vectors = torch.cat(attended_values, dim=1)
        multi_head_output = concat_context_vectors.transpose(1, 2).reshape(batch_size, seq_len_q, -1)
        final_output = self.out_proj(multi_head_output)
        return final_output
```
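A minimal usage sketch of the module above (dimensions are arbitrary; in practice PyTorch's built-in `nn.MultiheadAttention` offers the same functionality with masking and dropout):
```python
import torch

embed_dim, num_heads = 64, 8
mha = MultiheadAttention(embed_dim, num_heads)

# Self-attention over a batch of 2 sequences of length 10
x = torch.randn(2, 10, embed_dim)
out = mha(x, x, x)
print(out.shape)  # torch.Size([2, 10, 64])
```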