Convolutional Neural Networks with Intra-layer
Recurrent Connections for Scene Labeling
Ming Liang Xiaolin Hu Bo Zhang
Tsinghua National Laboratory for Information Science and Technology (TNList)
Department of Computer Science and Technology
Center for Brain-Inspired Computing Research (CBICR)
Tsinghua University, Beijing 100084, China
liangm07@mails.tsinghua.edu.cn, {xlhu,dcszb}@tsinghua.edu.cn
Abstract
Scene labeling is a challenging computer vision task. It requires the use of both
local discriminative features and global context information. We adopt a deep
recurrent convolutional neural network (RCNN) for this task, which was originally
proposed for object recognition. Different from traditional convolutional neural
networks (CNN), this model has intra-layer recurrent connections in the convo-
lutional layers. Each convolutional layer therefore becomes a two-dimensional
recurrent neural network. The units receive constant feed-forward inputs from the
previous layer and recurrent inputs from their neighborhoods. As the recurrent
iterations proceed, the region of context captured by each unit expands. In this
way, feature extraction and context modulation are seamlessly integrated, in con-
trast to typical methods that entail separate modules for the two steps. To further
utilize the context, a multi-scale RCNN is proposed. On two benchmark datasets,
Stanford Background and SIFT Flow, the model outperforms many state-of-the-art
models in accuracy and efficiency.
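The recurrence described in the abstract can be sketched in code: each unit combines a constant feed-forward drive with a recurrent convolution over the layer's own previous state, so the effective receptive field grows with every iteration. The following is an illustrative single-channel NumPy sketch, not code from the paper; the kernel shapes, the ReLU nonlinearity, and the iteration count are assumptions for the sake of the example.

```python
import numpy as np

def conv2d_same(x, k):
    """Naive single-channel 2D convolution with 'same' zero padding."""
    kh, kw = k.shape
    ph, pw = kh // 2, kw // 2
    xp = np.pad(x, ((ph, ph), (pw, pw)))
    out = np.zeros_like(x, dtype=float)
    for i in range(x.shape[0]):
        for j in range(x.shape[1]):
            out[i, j] = np.sum(xp[i:i + kh, j:j + kw] * k)
    return out

def recurrent_conv_layer(u, w_ff, w_rec, n_iter=3):
    """Sketch of a recurrent convolutional layer (RCL).

    u      -- feed-forward input from the previous layer (held constant)
    w_ff   -- feed-forward convolution kernel
    w_rec  -- recurrent kernel applied to the layer's own state; each
              iteration lets a unit see a wider neighborhood, i.e. the
              region of context captured by each unit expands
    """
    ff = conv2d_same(u, w_ff)            # constant feed-forward drive
    x = np.maximum(ff, 0.0)              # initial state (iteration 0)
    for _ in range(n_iter):
        # state update: feed-forward drive plus recurrent neighborhood input
        x = np.maximum(ff + conv2d_same(x, w_rec), 0.0)
    return x
```

Feeding a single impulse through this layer makes the expanding context visible: with 3x3 kernels, the set of nonzero responses widens by two pixels per recurrent iteration, which is the mechanism that lets the model integrate context without a separate module.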
1 Introduction
Scene labeling (or scene parsing) is an important step towards high-level image interpretation. It
aims at fully parsing the input image by labeling the semantic category of each pixel. Compared
with image classification, scene labeling is more challenging as it simultaneously solves both seg-
mentation and recognition. The typical approach to scene labeling consists of two steps. First,
local handcrafted features are extracted [6, 15, 26, 23, 27]. Second, context information is integrated
using probabilistic graphical models [6, 5, 18] or other techniques [24, 21]. In recent years, motivated
by the success of deep neural networks in learning visual representations, CNN [12] has been incorpo-
rated into this framework for feature extraction. However, since CNN does not have an explicit mech-
anism to modulate its features with context, other methods such as conditional random fields (CRF)
[5] and recursive parsing trees [21] are still needed to integrate the context information and achieve
better results. It would be desirable to have a neural network capable of performing scene labeling
in an end-to-end manner.
A natural way to incorporate context modulation in neural networks is to introduce recurrent con-
nections. This has been extensively studied in sequence learning tasks such as online handwriting
recognition [8], speech recognition [9], and machine translation [25]. Sequential data exhibit strong
correlations along the time axis, and recurrent neural networks (RNN) are suitable for these tasks
because long-range context information can be captured by a fixed number of recurrent weights.
If scene labeling is treated as a two-dimensional analogue of sequence learning, RNN can also be
applied, but such studies are relatively scarce. Recently, a recurrent CNN (RCNN), in which the output
of the top layer of a CNN is integrated with the input at the bottom, was successfully applied to scene labeling