Object-Class Segmentation of RGB-D Video Using Recurrent Convolutional Neural Networks


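"This resource is a research paper on object-class segmentation of RGB-D video using recurrent convolutional neural networks, written by Mircea Serban Pavel, Hannes Schulz, and Sven Behnke of the Institute of Computer Science at the University of Bonn. The paper examines how deep neural networks can handle pixel-wise classification in video sequences, and in particular how introducing recurrent neural networks addresses the difficulty conventional convolutional neural networks have in capturing long-range dependencies."

In computer vision, object-class segmentation is a key task: every pixel of an image must be assigned to the class of the object it belongs to. Deep convolutional neural networks (DNNs) are widely used for this task because they learn and exploit local spatial correlations. Their fixed-size filters, however, limit their ability to handle long-range dependencies. Recurrent neural networks (RNNs) do not have this limitation: their iterative interpretation lets them model long-range dependencies by propagating activity. This is especially useful for video sequences, which contain long-range dependencies in both space and time.

The paper proposes a novel RNN architecture for object-class segmentation. The authors study several ways of combining the strengths of RNNs and convolutional neural networks to better predict pixel-level labels in video sequences. Specifically, the work may cover the following aspects:

1. **Fusion of RNN and CNN**: combining the sequence-processing ability of RNNs with the feature-extraction ability of CNNs to build a model that captures spatial and temporal continuity and segments video frames more accurately.
2. **Modeling long-term dependencies**: through its recurrent structure, the model can use contextual information across the time series, which is hard to achieve with a conventional convolutional network.
3. **Experiments and evaluation**: possibly an experimental comparison of different model variants, together with a performance evaluation on standard RGB-D video segmentation datasets to validate the new method.
4. **Application scenarios**: a discussion of potential applications in autonomous driving, robot navigation, video surveillance, and similar fields, where accurate real-time object-class segmentation is essential.

The paper contributes a new deep learning model that aims to overcome the limitations of deep convolutional neural networks in RGB-D video object-class segmentation. By adding recurrent connections, it improves the understanding of complex spatio-temporal patterns in video sequences and the resulting segmentation accuracy.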

combined with the image at time t, producing an output and a new state. Since the last output benefits from learning from the whole sequence, it is natural to place the frame that we want to evaluate at the end.

The first temporal copy is special, since it contains regular feed-forward connections. This allows us to produce activations in each layer such that all connection types can be used in the transition from t to t + 1.

Network Depth. When processing input at time t, we allow L − 1 time steps for the information to reach the top level of the network and the same amount for propagating back to the bottom layer, where the output corresponding to time t is evaluated. Note that the last temporal steps do not need all the hidden layers, since their activation would no longer propagate to the output.

Our RNN is trained with backpropagation through time (BPTT), and can be interpreted as a very deep non-recurrent net after unfolding in time. In this non-recurrent network, multiple paths lead to the output. The shortest path, from input t = T to the final output, has length only 2L − 1, and the longest has length 2L + T, which amounts to a depth of 14 layers for our L = 3, T = 8 network.
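The depth figures quoted above can be checked with a few lines of arithmetic. The sketch below is only an illustration: the function name `unrolled_path_lengths` is hypothetical, and the longest path is taken to be 2L + T, the reading that agrees with the stated depth of 14 for L = 3, T = 8.

```python
def unrolled_path_lengths(num_layers: int, num_steps: int) -> tuple[int, int]:
    """Return (shortest, longest) path length in the time-unfolded network.

    Uses the expressions quoted in the text: 2L - 1 for the path from the
    last input frame to the final output, and 2L + T for the longest path.
    """
    shortest = 2 * num_layers - 1
    longest = 2 * num_layers + num_steps
    return shortest, longest


# The configuration mentioned in the text: L = 3 layers, T = 8 time steps.
shortest, longest = unrolled_path_lengths(num_layers=3, num_steps=8)
print(shortest, longest)  # 5 14
```
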

Weight Initialization and Optimization. We initialize the weights and biases from a Gaussian distribution. It is important to ensure that the activations do not explode or vanish early during training. Ideally, activations in the first forward pass should have similar magnitudes. This is difficult to control, however. Instead, we choose the standard deviation of the weights for each layer l according to the scheme proposed by He et al. (2015):

$$\sigma = \sqrt{\frac{2}{k_l^{2} \cdot d_{l-1}}}, \tag{1}$$

which takes into account the filter size $k_l$ and the number of filters of the preceding layer $d_{l-1}$. We determine the mean of the bias such that the average of the activations in every point of our network is positive and slightly decreasing over time. Liang and Hu (2015) use local contrast normalization at all layers to the same effect, which requires more GPU memory for the hidden layer activations. Due to our larger inputs and outputs and the increased number of time steps, current GPU memory restrictions prevent us from doing the same.
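As a rough illustration of the initialization scheme of He et al. (2015), the sketch below computes the standard deviation sqrt(2 / (k² · d)) for a k × k filter over d input feature maps and draws Gaussian weights with it. The function names, the flat weight list, and the example sizes (3 × 3 filters, 16 input maps, 32 output maps) are assumptions made for the example, not values from the paper. The bias-mean adjustment described above is not modeled.

```python
import math
import random


def he_std(filter_size: int, in_filters: int) -> float:
    """Standard deviation sqrt(2 / (k^2 * d)) for a k x k filter over d input maps."""
    return math.sqrt(2.0 / (filter_size ** 2 * in_filters))


def init_filter_bank(out_filters: int, in_filters: int, filter_size: int,
                     seed: int = 0) -> list[float]:
    """Draw all weights of one convolutional layer from N(0, he_std^2).

    The weights are returned as a flat list for simplicity; a real
    implementation would reshape them to (out, in, k, k).
    """
    rng = random.Random(seed)
    std = he_std(filter_size, in_filters)
    count = out_filters * in_filters * filter_size ** 2
    return [rng.gauss(0.0, std) for _ in range(count)]


# Hypothetical layer: 32 output maps, 16 input maps, 3 x 3 filters.
weights = init_filter_bank(out_filters=32, in_filters=16, filter_size=3)
mean = sum(weights) / len(weights)
empirical_std = math.sqrt(sum((w - mean) ** 2 for w in weights) / len(weights))
print(f"target std {he_std(3, 16):.4f}, empirical std {empirical_std:.4f}")
```

The empirical standard deviation of the sampled weights should land close to the target value, which is what keeps activation magnitudes comparable across layers in the first forward pass.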

We learn the parameters of our network with backpropagation through time (BPTT) using RMSProp, which is a variant of resilient backpropagation (RPROP, Riedmiller and Braun 1993) suitable for mini-batch learning (Dauphin et al., 2015). RPROP and RMSProp to a large degree consider only the sign of the gradient, thus being robust against vanishing and exploding gradients, both common phenomena in RNN training.
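The claim that RMSProp largely follows the sign of the gradient can be seen in a minimal scalar sketch. It uses the commonly described RMSProp update (a running mean of squared gradients used to normalize the step). The learning rate, decay, and epsilon values are assumptions chosen for illustration, and the exact variant used by the authors may differ.

```python
import math


def rmsprop_step(w: float, grad: float, mean_sq: float,
                 lr: float = 0.01, decay: float = 0.9, eps: float = 1e-8):
    """One RMSProp update for a single scalar parameter.

    The gradient is divided by the root of a running mean of its square,
    so the step size depends mostly on the gradient's sign, not its scale.
    """
    mean_sq = decay * mean_sq + (1.0 - decay) * grad * grad
    w = w - lr * grad / (math.sqrt(mean_sq) + eps)
    return w, mean_sq


def last_step_size(grad: float, steps: int = 200) -> float:
    """Apply a constant gradient repeatedly and return the final step magnitude."""
    w, mean_sq = 0.0, 0.0
    w_prev = w
    for _ in range(steps):
        w_prev = w
        w, mean_sq = rmsprop_step(w, grad, mean_sq)
    return abs(w - w_prev)


# A "vanishing" and an "exploding" gradient lead to nearly the same step.
print(last_step_size(1e-3), last_step_size(1e3))
```

Both step sizes settle near the learning rate even though the gradients differ by six orders of magnitude, which is the robustness property the text refers to.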

During learning, we apply dropout (Srivastava et al., 2014). Combining dropout with RNNs is delicate, however. If it affects recurrent connections, their ability to learn long-range dependencies suffers (Pham et al., 2014). Thus, we apply dropout only to the final convolution with non-shared weights that
