Recurrent Neural Network Models: A New Approach to Efficiently Processing Large Images


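"The recurrent attention model processes large images efficiently by adaptively selecting which regions to process at high resolution, which lowers computational cost while retaining a degree of translation invariance. The model is trained with reinforcement learning, adapts to the demands of different tasks, and performs well on image classification and dynamic visual control tasks."

In computer vision, convolutional neural networks (CNNs) have achieved remarkable results in image recognition and processing, but their computational cost grows linearly with image size, which limits their efficiency on large images. To address this, researchers proposed Recurrent Models of Visual Attention, a model designed to extract information efficiently from images or video.

The core idea is that the model adaptively selects a sequence of regions or locations to process. Instead of processing every pixel as a conventional CNN does, it attends selectively to regions at high resolution according to the needs of the task, which greatly reduces the amount of computation. The model nevertheless retains a degree of built-in translation invariance, an important property of CNNs that keeps the output stable under local shifts in the image.

Because the model is non-differentiable, it cannot be optimized directly with standard backpropagation. Instead, it is trained with reinforcement learning so that it learns task-specific policies. This lets the model adjust itself to different tasks, such as image classification and dynamic visual control, without an explicit training signal.

On several image classification tasks, the recurrent attention model outperforms a convolutional neural network baseline, particularly on images with cluttered backgrounds. This suggests that by adaptively attending to key regions, the model filters out noise more effectively and classifies more accurately.

The model also shows its potential on a dynamic visual control problem: without an explicit tracking signal, it learns to track a simple object. This indicates that the model is flexible and adaptable on problems that require real-time processing and decision making.

By introducing an attention mechanism, the recurrent attention model lowers the computational complexity of processing large images while maintaining good performance. It may prove especially useful in resource-constrained settings such as embedded systems and mobile devices.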

[Figure 1 diagram. Panels: A) Glimpse Sensor, B) Glimpse Network, C) Model Architecture]

Figure 1: A) Glimpse Sensor: Given the coordinates of the glimpse and an input image, the sensor extracts a retina-like representation $\rho(x_t, l_{t-1})$ centered at $l_{t-1}$ that contains multiple resolution patches. B) Glimpse Network: Given the location ($l_{t-1}$) and input image ($x_t$), uses the glimpse sensor to extract the retina representation $\rho(x_t, l_{t-1})$. The retina representation and glimpse location are then mapped into a hidden space using independent linear layers parameterized by $\theta_g^0$ and $\theta_g^1$ respectively, using rectified units, followed by another linear layer $\theta_g^2$ to combine the information from both components. The glimpse network $f_g(\cdot\,; \{\theta_g^0, \theta_g^1, \theta_g^2\})$ defines a trainable bandwidth-limited sensor for the attention network, producing the glimpse representation $g_t$. C) Model Architecture: Overall, the model is an RNN. The core network of the model $f_h(\cdot\,; \theta_h)$ takes the glimpse representation $g_t$ as input and, combining it with the internal representation at the previous time step $h_{t-1}$, produces the new internal state of the model $h_t$. The location network $f_l(\cdot\,; \theta_l)$ and the action network $f_a(\cdot\,; \theta_a)$ use the internal state $h_t$ of the model to produce the next location to attend to, $l_t$, and the action/classification $a_t$, respectively. This basic RNN iteration is repeated for a variable number of steps.
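The data flow in Figure 1 can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the layer sizes, the random untrained weights, the concatenate-then-linear combination in the glimpse network, the `tanh` squashing of the location, the deterministic location choice, and the single-crop stand-in sensor `crop_sensor` are all assumptions made here for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes, chosen only for illustration.
RETINA, HID_G, HID_H, N_ACTIONS = 64, 32, 48, 10

def init(n_in, n_out):
    """Small random weight matrix (untrained)."""
    return rng.normal(0.0, 0.1, size=(n_in, n_out))

params = {
    "g0": init(RETINA, HID_G),     # retina representation -> hidden
    "g1": init(2, HID_G),          # glimpse location -> hidden
    "g2": init(2 * HID_G, HID_H),  # combines both components
    "hh": init(HID_H, HID_H),      # core network: previous state
    "hg": init(HID_H, HID_H),      # core network: glimpse input
    "l": init(HID_H, 2),           # location network
    "a": init(HID_H, N_ACTIONS),   # action network
}

def relu(z):
    return np.maximum(z, 0.0)

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def crop_sensor(image, loc, size=8):
    """Stand-in for rho: one size x size crop, with loc in [-1, 1]^2 (assumption)."""
    h, w = image.shape
    top = int((loc[0] + 1.0) / 2.0 * (h - size))
    left = int((loc[1] + 1.0) / 2.0 * (w - size))
    return image[top:top + size, left:left + size].ravel()

def glimpse_network(retina, loc, p):
    """g_t: two independent rectified layers, then a combining layer."""
    h_rho = relu(retina @ p["g0"])
    h_loc = relu(loc @ p["g1"])
    return relu(np.concatenate([h_rho, h_loc]) @ p["g2"])

def core_network(g, h_prev, p):
    """h_t from the glimpse representation g_t and the previous state h_{t-1}."""
    return relu(h_prev @ p["hh"] + g @ p["hg"])

def run(image, num_steps=6):
    """Repeat the basic RNN iteration; return the last location and action probabilities."""
    h = np.zeros(HID_H)
    loc = np.zeros(2)  # start at the image center
    for _ in range(num_steps):
        g = glimpse_network(crop_sensor(image, loc), loc, params)
        h = core_network(g, h, params)
        loc = np.tanh(h @ params["l"])    # next location l_t
        probs = softmax(h @ params["a"])  # action / classification a_t
    return loc, probs
```

Calling `run(image)` on a 2-D array of at least 8 by 8 pixels returns a location in $[-1, 1]^2$ and a probability vector over the hypothetical `N_ACTIONS` classes. With untrained weights the outputs are meaningless; the sketch only shows how the four networks are wired together.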

information only in a local region or in a narrow frequency band. The agent can, however, actively control how to deploy its sensor resources (e.g. choose the sensor location). The agent can also affect the true state of the environment by executing actions. Since the environment is only partially observed the agent needs to integrate information over time in order to determine how to act and how to deploy its sensor most effectively. At each step, the agent receives a scalar reward (which depends on the actions the agent has executed and can be delayed), and the goal of the agent is to maximize the total sum of such rewards.
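In symbols, the objective just described can be written as follows, where $r_t$ denotes the scalar reward received at step $t$ and $T$ the number of steps in an episode (notation assumed here for illustration):

```latex
R = \sum_{t=1}^{T} r_t
```

Because a reward can be delayed, $r_t$ may be zero for most steps and nonzero only at the end of an episode.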

This formulation encompasses tasks as diverse as object detection in static images and control problems like playing a computer game from the image stream visible on the screen. For a game, the environment state would be the true state of the game engine and the agent's sensor would operate on the video frame shown on the screen. (Note that for most games, a single frame would not fully specify the game state). The environment actions here would correspond to joystick controls, and the reward would reflect points scored. For object detection in static images the state of the environment would be fixed and correspond to the true contents of the image. The environmental action would correspond to the classification decision (which may be executed only after a fixed number of fixations), and the reward would reflect if the decision is correct.

3.1 Model

The agent is built around a recurrent neural network as shown in Fig. 1. At each time step, it processes the sensor data, integrates information over time, and chooses how to act and how to deploy its sensor at the next time step:

Sensor: At each step $t$ the agent receives a (partial) observation of the environment in the form of an image $x_t$. The agent does not have full access to this image but rather can extract information from $x_t$ via its bandwidth-limited sensor $\rho$, e.g. by focusing the sensor on some region or frequency band of interest.
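As a concrete illustration of a bandwidth-limited sensor, the sketch below extracts square patches of increasing size around a location and downsamples each to the same small resolution, in the spirit of the multi-resolution glimpse sensor in Fig. 1. It is a sketch under stated assumptions, not the paper's code: the function name `glimpse_sensor`, the parameters `patch_size` and `num_scales`, the doubling of patch width per scale, average pooling, zero padding at the borders, and the $[-1, 1]$ coordinate convention are all choices made here for illustration.

```python
import numpy as np

def glimpse_sensor(image, loc, patch_size=8, num_scales=3):
    """Return a flat vector of `num_scales` patches centered at `loc`.

    Patch s covers (patch_size * 2**s) pixels per side and is average-pooled
    down to patch_size x patch_size, so resolution drops away from the center.
    `loc` is (row, col) in [-1, 1], with (0, 0) the image center (assumption).
    """
    h, w = image.shape
    # Map normalized coordinates to pixel indices.
    cy = int(round((loc[0] + 1.0) / 2.0 * (h - 1)))
    cx = int(round((loc[1] + 1.0) / 2.0 * (w - 1)))

    # Zero-pad so the largest patch never leaves the array.
    pad = (patch_size * 2 ** (num_scales - 1)) // 2
    padded = np.pad(image, pad, mode="constant")

    patches = []
    for s in range(num_scales):
        size = patch_size * 2 ** s
        top = cy + pad - size // 2
        left = cx + pad - size // 2
        crop = padded[top:top + size, left:left + size]
        # Average-pool by an integer factor down to patch_size x patch_size.
        factor = size // patch_size
        pooled = crop.reshape(patch_size, factor, patch_size, factor).mean(axis=(1, 3))
        patches.append(pooled)
    return np.stack(patches).ravel()
```

With the defaults the output has `3 * 8 * 8 = 192` entries regardless of image size, which is the point of the design: the cost of one glimpse depends on the sensor's bandwidth, not on the number of pixels in the image.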

In this paper we assume that the bandwidth-limited sensor extracts a retina-like representation $\rho(x_t, l_{t-1})$ around location $l_{t-1}$ from image $x_t$. It encodes the region around $l$ at a high-resolution but uses a progressively lower resolution for pixels further from $l$, resulting in a vector of much
