RGB-D Video Object Segmentation: An Application of Recurrent Convolutional Neural Networks


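This resource is a research paper on RGB-D video object segmentation with recurrent convolutional neural networks, written by Mircea Serban Pavel, Hannes Schulz, and Sven Behnke of the Institute of Computer Science at the University of Bonn, Germany. The paper examines the limitations of deep convolutional neural networks (DNNs) on object-class segmentation and proposes a new recurrent network architecture for handling long-range dependencies, in particular the spatial and temporal long-range dependencies found in video sequences.

Summary: Object segmentation of RGB-D video with recurrent convolutional neural networks is an important computer vision task. It requires classifying every pixel of every frame by the object class it belongs to. Conventional DNNs perform well on image analysis because of their strong feature learning and their ability to capture local spatial correlations. Their fixed-size filters, however, limit how well they can learn long-range dependencies, which becomes a significant obstacle for video data with spatio-temporal continuity.

Recurrent neural networks (RNNs), by contrast, interpret their input iteratively and can model and propagate activity over long distances. This makes them well suited to sequence data such as video, where long-range dependencies exist in both space and time.

The authors propose a novel RNN architecture designed specifically for RGB-D video object segmentation. RGB-D video supplies both color (RGB) and depth (D) information, which makes more precise segmentation possible: by combining the two, the network can better understand the three-dimensional structure of the scene. The authors explore several approaches, including how to combine convolutional and recurrent operations and how to use depth information to improve performance. They may also discuss training strategies, such as the optimization of backpropagation and the choice of loss function, so that the network captures the spatio-temporal dynamics of the video sequence.

The paper may also include an experimental section that reports the model's performance on standard datasets and compares it with existing methods, showing how the model behaves in practice and how it adapts to different conditions.

In short, the paper offers a recurrent convolutional approach to RGB-D video object segmentation that addresses the limitations of conventional DNNs and improves the handling of spatio-temporal dependencies. This is relevant to applications that must understand and segment complex dynamic environments, such as real-time surveillance, autonomous driving, and robot navigation.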

combined with the image at time t, producing an output and a new state. Since the last output benefits from learning from the whole sequence, it is natural to place the frame that we want to evaluate at the end.
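The recurrence described above can be sketched in a few lines. This is only an illustration under loud assumptions: `run_sequence` and `toy_step` are hypothetical names, and the scalar update in `toy_step` stands in for the paper's convolutional layers, which are not reproduced here. The point is the data flow: each step combines the previous state with the current frame, and the frame to be evaluated is placed last so that its output has seen the whole sequence.

```python
def run_sequence(frames, step, initial_state):
    """Apply `step` to each frame in order, threading the state through time."""
    state = initial_state
    outputs = []
    for frame in frames:
        # Each step combines the previous state with the frame at time t
        # and yields an output plus a new state.
        output, state = step(state, frame)
        outputs.append(output)
    return outputs


def toy_step(state, frame):
    """Hypothetical stand-in for one recurrent step (not the paper's layers)."""
    new_state = 0.5 * state + frame
    return new_state, new_state


# The frame we want to evaluate (4.0 here) is placed at the end of the sequence.
frames = [1.0, 2.0, 3.0, 4.0]
outputs = run_sequence(frames, toy_step, initial_state=0.0)
final_output = outputs[-1]
print(final_output)  # 6.125
```

Only `final_output` depends on every frame in the sequence, which is why the evaluated frame goes last.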

The first temporal copy is special, since it contains regular feed-forward connections. This allows us to produce activations in each layer such that all connection types can be used in the transition from t to t + 1.

Network Depth. When processing input at time t, we allow L − 1 time steps for the information to reach the top level of the network and the same number for propagating back to the bottom layer, where the output corresponding to time t is evaluated. Note that the last temporal steps do not need all the hidden layers, since their activation would no longer propagate to the output.

Our RNN is trained with backpropagation through time (BPTT), and can be interpreted as a very deep non-recurrent net after unfolding in time. In this non-recurrent network, multiple paths lead to the output. The shortest path, from input t = T to the final output, has length 2L − 1, and the longest has length 2L + T, which amounts to a depth of 14 layers for our L = 3, T = 8 network.
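As a quick check of the arithmetic, the path lengths can be computed directly. The helper below is a hypothetical illustration, and it takes the longest path to be 2L + T (with T the total number of time steps), since that reading reproduces the 14 layers quoted for L = 3, T = 8.

```python
def unfolded_path_lengths(num_layers, num_steps):
    """Return (shortest, longest) path lengths of the network unfolded in time.

    Assumes shortest = 2L - 1 and longest = 2L + T, as read from the text.
    """
    shortest = 2 * num_layers - 1
    longest = 2 * num_layers + num_steps
    return shortest, longest


shortest, longest = unfolded_path_lengths(num_layers=3, num_steps=8)
print(shortest, longest)  # 5 14
```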

Weight Initialization and Optimization. We initialize the weights and biases from a Gaussian distribution. It is important to ensure that the activations do not explode or vanish early during training. Ideally, activations in the first forward pass should have similar magnitudes. This is difficult to control, however. Instead, we choose the standard deviation of the weights for each layer l according to the scheme proposed by He et al. (2015):

$$\sigma = \sqrt{\frac{2}{k_l^2 \cdot d_{l-1}}}, \qquad (1)$$

which takes into account the filter size $k_l$ and the number of filters of the last layer $d_{l-1}$. We determine the mean of the bias such that the average of the activations in every point of our network is positive and slightly decreasing over time. Liang and Hu (2015) use local contrast normalization at all layers to the same effect, which requires more GPU memory for the hidden layer activations. Due to our larger inputs and outputs and the increased number of time steps, current GPU memory restrictions prevent us from doing the same.
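The initialization scheme can be sketched as follows, assuming Equation (1) has the He et al. (2015) form σ = sqrt(2 / (k² · d_{l−1})) for a square k × k filter. The function names, the layer sizes, and the flat weight list are illustrative choices, not taken from the paper, and the bias-mean selection described above is not modelled.

```python
import math
import random


def he_std(filter_size, in_filters):
    """Standard deviation sqrt(2 / (k^2 * d_prev)) for a k x k filter."""
    return math.sqrt(2.0 / (filter_size ** 2 * in_filters))


def init_filter_weights(filter_size, in_filters, out_filters, seed=0):
    """Draw all weights of one layer from a zero-mean Gaussian with that std."""
    rng = random.Random(seed)
    sigma = he_std(filter_size, in_filters)
    count = out_filters * in_filters * filter_size ** 2
    return [rng.gauss(0.0, sigma) for _ in range(count)]


# Hypothetical layer: 3 x 3 filters, 32 input maps, 64 output maps.
sigma = he_std(filter_size=3, in_filters=32)  # sqrt(2 / 288) = 1/12
weights = init_filter_weights(3, 32, 64)
mean = sum(weights) / len(weights)
sample_std = math.sqrt(sum((w - mean) ** 2 for w in weights) / len(weights))
```

Larger filters or more input maps shrink σ, which keeps the variance of the summed inputs to each unit roughly constant from layer to layer.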

We learn the parameters of our network with backpropagation through time (BPTT) using RMSProp, which is a variant of resilient backpropagation (RPROP, Riedmiller and Braun 1993) suitable for mini-batch learning (Dauphin et al., 2015). RPROP and RMSProp to a large degree consider only the sign of the gradient, thus being robust against vanishing and exploding gradients, both common phenomena in RNN training.
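A minimal scalar sketch of the commonly used RMSProp update illustrates why the gradient magnitude largely cancels out. The hyperparameters (`lr`, `decay`, `eps`) and helper names are assumptions for illustration; the paper's actual settings are not given in this excerpt.

```python
import math


def rmsprop_step(weight, grad, mean_square, lr=0.01, decay=0.9, eps=1e-8):
    """One RMSProp update for a single scalar weight."""
    # Running average of the squared gradient.
    mean_square = decay * mean_square + (1.0 - decay) * grad * grad
    # Dividing by its root normalizes the step, leaving mostly the sign.
    weight = weight - lr * grad / (math.sqrt(mean_square) + eps)
    return weight, mean_square


def total_movement(grad_scale, steps=50):
    """Distance travelled under a constant gradient of the given size."""
    weight, mean_square = 0.0, 0.0
    for _ in range(steps):
        weight, mean_square = rmsprop_step(weight, grad_scale, mean_square)
    return abs(weight)


small = total_movement(1e-3)  # a "vanishing" gradient
large = total_movement(1e3)   # an "exploding" gradient
```

Although the two gradients differ by six orders of magnitude, `small` and `large` come out nearly identical, since each step has size close to `lr` divided by a factor that depends only on the step index.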

During learning, we apply dropout (Srivastava et al., 2014). Combining dropout with RNNs is delicate, however. If it affects recurrent connections, their ability to learn long-range dependencies suffers (Pham et al., 2014). Thus, we apply dropout only to the final convolution with non-shared weights that
