Fully-Convolutional Siamese Networks: Improving Real-Time Object Tracking Performance


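This document discusses a method for tracking arbitrary objects: the fully-convolutional Siamese network (Fully-Convolutional Siamese Networks, FC-Siamese). Object tracking has traditionally relied on online learning, using the video itself as the only training data, which limits how rich and adaptable the learned model can be. The authors note that deep convolutional networks are far more expressive, but when the object to track is not known in advance, adjusting the network weights in real time with stochastic gradient descent slows the system down considerably.

FC-Siamese is proposed to overcome this limitation. It is trained end-to-end and follows the Siamese idea: two identical branches with shared weights process an exemplar image of the target and a larger search image, and their outputs are compared to measure similarity. Because the similarity function is learned offline, the network does not have to be retrained for each new frame, which greatly improves efficiency. The network is trained on the ILSVRC15 dataset for object detection in video. It runs at real-time frame rates and, even in a very simple configuration, achieves state-of-the-art tracking results on multiple benchmarks.

Because the approach does not need to know the target in advance, it is well suited to real-time applications such as autonomous driving, video surveillance, and motion analysis. The contribution is an efficient tracking strategy built on a deep convolutional network whose fully-convolutional architecture gains speed by replacing online weight updates with a single dense evaluation over the search image.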

L. Bertinetto, J. Valmadre, J. F. Henriques, A. Vedaldi, P. H. S. Torr

where $b\,\mathbb{1}$ denotes a signal which takes value $b \in \mathbb{R}$ in every location. The output of this network is not a single score but rather a score map defined on a finite grid $D \subset \mathbb{Z}^2$ as illustrated in Figure 1. Note that the output of the embedding function is a feature map with spatial support as opposed to a plain vector. The same technique has been applied in contemporary work on stereo matching [23].

During tracking, we use a search image centred at the previous position of the target. The position of the maximum score relative to the centre of the score map, multiplied by the stride of the network, gives the displacement of the target from frame to frame. Multiple scales are searched in a single forward-pass by assembling a mini-batch of scaled images.
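
The displacement rule described above can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions: the function name `displacement_from_score_map`, the 17x17 map size, and the stride of 8 are chosen for the example and are not taken from this section.

```python
import numpy as np

def displacement_from_score_map(score_map, stride):
    """Return the (dy, dx) displacement of the target in search-image pixels.

    The peak position is measured relative to the centre of the score map
    and then multiplied by the network stride.
    """
    score_map = np.asarray(score_map, dtype=float)
    # Row/column index of the maximum score.
    peak = np.unravel_index(np.argmax(score_map), score_map.shape)
    # Centre of the grid (a half-integer when a side length is even).
    centre = (np.array(score_map.shape) - 1) / 2.0
    return (np.array(peak) - centre) * stride

# Hypothetical 17x17 score map whose peak is one cell above and
# two cells to the right of the centre cell (8, 8).
scores = np.zeros((17, 17))
scores[7, 10] = 1.0
dy, dx = displacement_from_score_map(scores, stride=8)
```

With several scaled search images in one mini-batch, the same function would be applied to the score map of whichever scale is selected; how that scale is chosen is not specified in this passage.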

Combining feature maps using cross-correlation and evaluating the network once on the larger search image is mathematically equivalent to combining feature maps using the inner product and evaluating the network on each translated sub-window independently. However, the cross-correlation layer provides an incredibly simple method to implement this operation efficiently within the framework of existing conv-net libraries. While this is clearly useful during testing, it can also be exploited during training.
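
The equivalence at the level of feature maps can be checked numerically. The sketch below is an assumption-laden illustration: random arrays stand in for the embedded exemplar and search images, the sizes (6x6x4 and 22x22x4) are arbitrary, and the helper names are hypothetical. It compares one dense cross-correlation against an explicit loop that takes the inner product with every translated sub-window.

```python
import numpy as np

def cross_correlate(search_feat, exemplar_feat):
    """Dense 'valid' cross-correlation of an (H, W, C) map with an (h, w, C) template."""
    # All h-by-w windows of the search features: shape (H-h+1, W-w+1, C, h, w).
    windows = np.lib.stride_tricks.sliding_window_view(
        search_feat, exemplar_feat.shape[:2], axis=(0, 1)
    )
    # Sum over window rows, window columns and channels in one contraction.
    return np.einsum("ijchw,hwc->ij", windows, exemplar_feat)

def sliding_inner_product(search_feat, exemplar_feat):
    """Same scores, computed one translated sub-window at a time."""
    h, w, _ = exemplar_feat.shape
    H, W, _ = search_feat.shape
    out = np.empty((H - h + 1, W - w + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(search_feat[i:i + h, j:j + w] * exemplar_feat)
    return out

rng = np.random.default_rng(0)
z_feat = rng.standard_normal((6, 6, 4))    # stand-in for the exemplar embedding
x_feat = rng.standard_normal((22, 22, 4))  # stand-in for the search embedding
dense = cross_correlate(x_feat, z_feat)
looped = sliding_inner_product(x_feat, z_feat)
```

Both functions return the same score map; the dense version simply expresses the whole search as a single array operation, which is what makes it convenient inside a conv-net library. `sliding_window_view` requires NumPy 1.20 or later.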

2.2 Training with large search images

We employ a discriminative approach, training the network on positive and negative pairs and adopting the logistic loss

$$\ell(y, v) = \log(1 + \exp(-yv)) \tag{3}$$

where $v$ is the real-valued score of a single exemplar-candidate pair and $y \in \{+1, -1\}$ is its ground-truth label. We exploit the fully-convolutional nature of our network during training by using pairs that comprise an exemplar image and a larger search image. This will produce a map of scores $v : D \to \mathbb{R}$, effectively generating many examples per pair. We define the loss of a score map to be the mean of the individual losses

$$L(y, v) = \frac{1}{|D|} \sum_{u \in D} \ell(y[u], v[u]) , \tag{4}$$

requiring a true label $y[u] \in \{+1, -1\}$ for each position $u \in D$ in the score map.

The parameters of the conv-net $\theta$ are obtained by applying Stochastic Gradient Descent (SGD) to the problem

$$\arg\min_{\theta} \; \mathbb{E}_{(z,x,y)} \, L(y, f(z, x; \theta)) . \tag{5}$$
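
As a rough illustration of Eqs. (3) and (4), the sketch below computes the logistic loss per position, averages it over a score map, and gives the gradient with respect to the scores, which is the quantity SGD would back-propagate into $\theta$. The function names and the 2x2 toy arrays are assumptions made for the example. The use of `np.logaddexp` is a numerical-stability choice of this sketch, not something stated in the text.

```python
import numpy as np

def logistic_loss(y, v):
    """Eq. (3): log(1 + exp(-y * v)), element-wise and overflow-safe."""
    y = np.asarray(y, dtype=float)
    v = np.asarray(v, dtype=float)
    # logaddexp(0, t) equals log(1 + exp(t)) without overflowing for large t.
    return np.logaddexp(0.0, -y * v)

def score_map_loss(y_map, v_map):
    """Eq. (4): mean of the individual losses over all positions u in D."""
    return float(np.mean(logistic_loss(y_map, v_map)))

def score_map_loss_grad(y_map, v_map):
    """Gradient of Eq. (4) with respect to each score v[u]."""
    y = np.asarray(y_map, dtype=float)
    v = np.asarray(v_map, dtype=float)
    # d/dv log(1 + exp(-y v)) = -y / (1 + exp(y v)), divided by |D| for the mean.
    return -y * np.exp(-np.logaddexp(0.0, y * v)) / y.size

# Toy score map and labels (hypothetical values).
v = np.array([[2.0, -1.0], [0.0, -3.0]])
y = np.array([[1, -1], [-1, -1]])
loss = score_map_loss(y, v)
grad = score_map_loss_grad(y, v)
```

Positions where the sign of the score already agrees with the label contribute small losses and small gradients, while a position with score 0 contributes $\log 2$ regardless of its label.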

Pairs are obtained from a dataset of annotated videos by extracting exemplar and search images that are centred on the target, as shown in Figure 2. The images are extracted from two frames of a video that both contain the object and are at most $T$ frames apart. The class of the object is ignored during training. The scale of the object within each image is normalized without corrupting the
