Deformable Convolution DCNv4
Date: 2024-12-28 21:22:49
### Implementation and Applications of Deformable Convolution DCNv4
#### Implementation Details
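Deformable Convolution v4 (DCNv4) is an efficient, sparse dynamic operator that improves performance by rethinking the dynamic properties of deformable convolution and optimizing its memory access patterns[^1]. Compared with earlier versions, it keeps the flexibility of deformable sampling while improving computational efficiency.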
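For orientation, the modulated form of deformable convolution known from earlier DCN versions can be written as below. The notation ($K$ kernel taps, weight $w_k$, tap position $p_k$, learned offset $\Delta p_k$, modulation scalar $m_k$) is chosen here for illustration, and DCNv4's own formulation may differ in detail.

$$
y(p_0) = \sum_{k=1}^{K} w_k \, m_k \, x(p_0 + p_k + \Delta p_k)
$$

Because $p_0 + p_k + \Delta p_k$ is generally fractional, $x(\cdot)$ is evaluated by bilinear interpolation over the four neighbouring pixels.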
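The following is a simplified PyTorch sketch of a deformable convolution layer in this style. It illustrates the offset-based sampling logic in plain PyTorch and does not include the memory-access optimizations that DCNv4 relies on for speed: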
```python
import torch
from torch import nn


class DeformConv2d_v4(nn.Module):
    """Simplified deformable convolution in plain PyTorch (no custom kernels)."""

    def __init__(self, inc, outc, kernel_size=3, padding=1, stride=1, bias=None, modulation=False):
        super().__init__()
        self.kernel_size = kernel_size
        self.padding = padding
        self.stride = stride
        self.modulation = modulation
        # Offsets are predicted from the input: one (dy, dx) pair per kernel tap.
        # Zero init means sampling starts on the regular grid.
        self.offset_conv = nn.Conv2d(inc, 2 * kernel_size ** 2, kernel_size=kernel_size,
                                     stride=stride, padding=padding, bias=True)
        nn.init.constant_(self.offset_conv.weight, 0.)
        nn.init.constant_(self.offset_conv.bias, 0.)
        if modulation:
            # One modulation scalar per kernel tap
            self.m_conv = nn.Conv2d(inc, kernel_size ** 2, kernel_size=kernel_size,
                                    stride=stride, padding=padding, bias=True)
            nn.init.constant_(self.m_conv.weight, 0.)
            nn.init.constant_(self.m_conv.bias, 0.)
        # Runs on the resampled map, in which every output pixel owns one ks x ks block,
        # so stride and padding are already accounted for by the sampling positions.
        self.regular_conv = nn.Conv2d(inc, outc, kernel_size=kernel_size,
                                      stride=kernel_size, padding=0, bias=bool(bias))

    def forward(self, x):
        offset = self.offset_conv(x)                      # (b, 2N, h, w)
        b, _, h, w = offset.shape
        c, ks = x.size(1), self.kernel_size
        N = ks * ks
        # Channels [:N] hold row (y) positions, channels [N:] hold column (x) positions
        p = self._get_p(offset).permute(0, 2, 3, 1)       # (b, h, w, 2N)
        x_offset = bilinear_interpolate_torch(x, p[..., :N], p[..., N:])  # (b, c, h, w, N)
        if self.modulation:
            # Modulation scales the sampled values, not the offsets
            m = torch.sigmoid(self.m_conv(x))             # (b, N, h, w)
            x_offset = x_offset * m.permute(0, 2, 3, 1).unsqueeze(1)
        # (b, c, h, w, ks, ks) -> (b, c, h*ks, w*ks): lay out the taps of each pixel as a block
        x_offset = x_offset.reshape(b, c, h, w, ks, ks).permute(0, 1, 2, 4, 3, 5)
        x_offset = x_offset.reshape(b, c, h * ks, w * ks)
        return self.regular_conv(x_offset)

    def _get_p(self, offset):
        """Absolute sampling positions: output-pixel center + kernel tap + learned offset."""
        _, _, h, w = offset.shape
        ks, N = self.kernel_size, self.kernel_size ** 2
        kw = dict(device=offset.device, dtype=offset.dtype)
        half = (ks - 1) / 2
        r = torch.arange(ks, **kw) - half                 # e.g. [-1, 0, 1] for ks=3
        tap_y, tap_x = torch.meshgrid(r, r, indexing="ij")
        p_n = torch.cat([tap_y.flatten(), tap_x.flatten()]).view(1, 2 * N, 1, 1)
        cy = torch.arange(h, **kw) * self.stride - self.padding + half
        cx = torch.arange(w, **kw) * self.stride - self.padding + half
        c_y, c_x = torch.meshgrid(cy, cx, indexing="ij")
        p_0 = torch.cat([c_y.expand(N, h, w), c_x.expand(N, h, w)]).unsqueeze(0)
        return p_0 + p_n + offset


def bilinear_interpolate_torch(im, y, x):
    """Bilinearly sample im (b, c, H, W) at float coords y, x of shape (b, h, w, N).

    Taps that fall outside the image contribute zero (zero padding).
    Returns a tensor of shape (b, c, h, w, N).
    """
    b, c, H, W = im.shape
    x0 = torch.floor(x)
    x1 = x0 + 1
    y0 = torch.floor(y)
    y1 = y0 + 1
    wa = (x1 - x) * (y1 - y)   # weight of corner (y0, x0)
    wb = (x1 - x) * (y - y0)   # weight of corner (y1, x0)
    wc = (x - x0) * (y1 - y)   # weight of corner (y0, x1)
    wd = (x - x0) * (y - y0)   # weight of corner (y1, x1)
    flat = im.reshape(b, c, H * W)

    def gather(yi, xi):
        # Mask out-of-image taps, then clamp so the index itself stays valid
        valid = ((yi >= 0) & (yi < H) & (xi >= 0) & (xi < W)).to(im.dtype)
        idx = yi.clamp(0, H - 1).long() * W + xi.clamp(0, W - 1).long()
        idx = idx.reshape(b, 1, -1).expand(-1, c, -1)
        out = flat.gather(2, idx).reshape(b, c, *yi.shape[1:])
        return out * valid.unsqueeze(1)

    return (wa.unsqueeze(1) * gather(y0, x0) + wb.unsqueeze(1) * gather(y1, x0)
            + wc.unsqueeze(1) * gather(y0, x1) + wd.unsqueeze(1) * gather(y1, x1))
```
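This code shows how to build a deformable convolution layer in PyTorch and uses bilinear interpolation to sample at the non-integer positions produced by the offsets.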
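As a minimal sketch of sampling at a non-integer position, the snippet below reads a tiny feature map at a fractional pixel coordinate with `torch.nn.functional.grid_sample`. The helper name `sample_at` and the 2x2 example tensor are made up for illustration and are not taken from any DCNv4 codebase; the helper assumes a feature map with height and width greater than 1.

```python
import torch
import torch.nn.functional as F


def sample_at(feature, y, x):
    """Bilinearly sample feature (1, C, H, W) at one fractional pixel position (y, x)."""
    _, _, H, W = feature.shape
    # grid_sample expects (x, y) coordinates normalized to [-1, 1]. With
    # align_corners=True, -1 and 1 map to the centers of the corner pixels.
    gx = 2.0 * x / (W - 1) - 1.0
    gy = 2.0 * y / (H - 1) - 1.0
    grid = torch.tensor([[[[gx, gy]]]], dtype=feature.dtype)  # (1, 1, 1, 2)
    out = F.grid_sample(feature, grid, mode="bilinear",
                        padding_mode="zeros", align_corners=True)
    return out[0, :, 0, 0]  # one value per channel


# A 1x1x2x2 feature map with pixel values 0, 1, 2, 3
feature = torch.tensor([[[[0.0, 1.0],
                          [2.0, 3.0]]]])

center = sample_at(feature, y=0.5, x=0.5)   # 1.5, the mean of the four neighbours
edge = sample_at(feature, y=0.0, x=0.25)    # 0.25, a quarter of the way from 0 to 1
print(center, edge)
```

A learned offset simply shifts `(y, x)` away from the regular grid before this lookup, and because the interpolation weights depend smoothly on the position, gradients can flow back to the offsets.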
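#### Application Scenarios
In computer vision, and particularly in fine-grained action detection, locally consistent deformable convolution networks have been shown to capture changes in the key parts of a target effectively, which improves recognition accuracy[^2]. Because they can learn motion information in feature space, they are also well suited to spatio-temporal modeling in video analysis.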