The Hippocampus as a Predictive Map: Spatial Memory and Adaptive Strategies from a Reinforcement Learning Perspective


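The hippocampus, a key structure of the brain, has long been regarded as the main center for spatial navigation. This view rests largely on studies of place cells, which are thought to encode the geometric structure of space. A recent DeepMind study published in Nature Neuroscience, however, suggests that hippocampal function goes beyond purely spatial coding and is, at a deeper level, predictive. The predictive map theory proposes that the hippocampus represents not only the current location but also the states the animal expects to visit in the future. This expectation-based framework helps the brain adapt quickly to changing environments, and in particular to changes in reward. It lets the brain guide decisions using predictions of likely outcomes, without having to simulate future scenarios in detail. The predictive map is closely tied to reinforcement learning, a machine learning approach that improves behavior by trial and error so as to maximize future reward.

The DeepMind researchers argue that, under the predictive map theory, the traditional understanding of spatial cognition needs updating. In their account, hippocampal place cells and other cells such as entorhinal grid cells may encode low-dimensional predictive information that reflects factors beyond spatial location, such as time, direction of travel, and the likely distribution of reward. The response patterns of these cells are not purely spatial; they behave more like a dynamic "predictive map" that helps the brain handle complex environments and decisions efficiently. Drawing on both model-based and model-free reinforcement learning algorithms, the team formalizes this predictive map as the successor representation (SR), which occupies a middle ground between the two: it keeps the computational efficiency of model-free methods while adapting quickly to changes in reward. This matters for artificial intelligence systems that must cope with uncertainty and dynamic environments, such as autonomous driving, game playing, and robot navigation.

The study offers a new view of the hippocampus as a predictive map: rather than being limited to spatial localization, it supports higher-level cognitive processes involving prediction, reward sensitivity, and policy dependence. This provides fresh insight into how the brain plans and makes decisions, and may inform the design of learning-based intelligent systems.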


each eigenvector go to zero and positive elements go to one. Edges that connect between these two labeled groups of states are “cut” by the partition, and nodes adjacent to these edges are a kind of bottleneck subgoal. The first subgoals that emerge will be the cut from the lowest-frequency eigenvector, and these subgoals will approximately lie between the two largest, most separable clusters in the partition (see Supplemental Methods for more detail). A prioritized sequence of subgoals is obtained by incorporating increasingly higher frequency eigenvectors that produce partition points nearer to the agent.
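
The partition step described above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: it assumes a hypothetical two-room gridworld joined by a single doorway, uses the unnormalized graph Laplacian D - A (the procedure actually used is detailed in the paper's Supplemental Methods, which are not reproduced here), and thresholds only the lowest-frequency non-constant eigenvector. The function names `two_room_adjacency` and `bottleneck_states` are invented for this example.

```python
import numpy as np

def two_room_adjacency(height=3, room_width=3):
    """Adjacency matrix of a hypothetical gridworld: two rooms, one doorway."""
    width = 2 * room_width + 1
    wall_col, door_row = room_width, height // 2
    # Every cell is a state except the wall column, which is open only at the door.
    cells = [(r, c) for r in range(height) for c in range(width)
             if c != wall_col or r == door_row]
    index = {cell: i for i, cell in enumerate(cells)}
    A = np.zeros((len(cells), len(cells)))
    for (r, c), i in index.items():
        for dr, dc in ((0, 1), (1, 0)):  # right and down neighbours
            j = index.get((r + dr, c + dc))
            if j is not None:
                A[i, j] = A[j, i] = 1.0
    return A, cells

def bottleneck_states(A):
    """Threshold the lowest-frequency non-constant eigenvector and find the cut."""
    L = np.diag(A.sum(axis=1)) - A            # unnormalized Laplacian (assumption)
    _, eigvecs = np.linalg.eigh(L)            # eigenvalues in ascending order
    labels = (eigvecs[:, 1] > 0).astype(int)  # negative -> 0, positive -> 1
    rows, cols = np.nonzero(np.triu(A))       # visit each undirected edge once
    cut = [(int(i), int(j)) for i, j in zip(rows, cols) if labels[i] != labels[j]]
    subgoals = sorted({n for edge in cut for n in edge})
    return labels, cut, subgoals

A, cells = two_room_adjacency()
labels, cut, subgoals = bottleneck_states(A)
door = cells.index((1, 3))
print("cut edges:", [(cells[i], cells[j]) for i, j in cut])
print("doorway is a subgoal:", door in subgoals)
```

In this symmetric layout the sign change of the eigenvector falls at the doorway, so the cut edge should touch the doorway cell, which is the kind of bottleneck the text describes.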

The SR shares its eigenvectors with the graph Laplacian (see Supplemental Methods), making SR eigenvectors equally suitable for this process of subgoal discovery. We show in Fig. S4 that the subgoals that emerge in a 2-step decision task and in a multicompartment environment tend to fall near doorways and decision points: natural subgoals for high-level planning. It is worth noting that SR matrices parameterized by larger discount factors will project predominantly on the large-spatial-scale grid components. The relationship between more temporally diffuse, abstract SRs, in which states in the same room are all encoded similarly (Fig. S2), and the subgoals that join those clusters can therefore be captured by which eigenvalues are large enough to consider.
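
The shared-eigenvector claim can be checked numerically on a toy problem. The sketch below assumes a random-walk policy on a hypothetical linear track, the closed-form SR M = (I - γT)^{-1}, and the random-walk normalized Laplacian I - T as the Laplacian in question; the paper's own derivation is in its Supplemental Methods. It also reports, for two discount factors, how much of the SR spectrum is carried by the three lowest-frequency eigenvectors. All function names are invented for the example.

```python
import numpy as np

def linear_track(n):
    """Adjacency matrix of a hypothetical 1-D track with n states."""
    A = np.zeros((n, n))
    idx = np.arange(n - 1)
    A[idx, idx + 1] = A[idx + 1, idx] = 1.0
    return A

def sr_and_laplacian_modes(A, gamma):
    """Return T, the SR, Laplacian eigenvalues/eigenvectors and SR eigenvalues."""
    n = len(A)
    deg = A.sum(axis=1)
    T = A / deg[:, None]                          # random-walk transitions
    M = np.linalg.inv(np.eye(n) - gamma * T)      # closed-form SR
    # Eigendecompose the symmetric normalized Laplacian (numerically stable),
    # then rescale by D^{-1/2} to obtain eigenvectors of I - T.
    d_inv_sqrt = 1.0 / np.sqrt(deg)
    L_sym = np.eye(n) - d_inv_sqrt[:, None] * A * d_inv_sqrt[None, :]
    mu, U = np.linalg.eigh(L_sym)                 # ascending frequency
    V = d_inv_sqrt[:, None] * U
    sr_eigvals = 1.0 / (1.0 - gamma * (1.0 - mu))
    return T, M, mu, V, sr_eigvals

def low_frequency_share(sr_eigvals, k=3):
    """Fraction of the SR spectrum carried by the k lowest-frequency modes."""
    return sr_eigvals[:k].sum() / sr_eigvals.sum()

A = linear_track(10)
for gamma in (0.5, 0.95):
    T, M, mu, V, sr_eigvals = sr_and_laplacian_modes(A, gamma)
    same_vectors = (np.allclose((np.eye(10) - T) @ V, V * mu)
                    and np.allclose(M @ V, V * sr_eigvals))
    print(gamma, same_vectors, round(low_frequency_share(sr_eigvals), 3))
```

Each column of `V` is an eigenvector of both I - T and M; only the eigenvalues differ. Because the SR eigenvalue for a mode with Laplacian eigenvalue μ is 1 / (1 - γ(1 - μ)), raising γ shifts weight toward the low-frequency, large-scale modes.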

The fact that large SR fields project predominantly onto eigenvectors with large spatial scales, whereas smaller SR fields project more strongly onto finer scale grid fields, is consistent with the smooth longitudinal gradient in connectivity between MEC and hippocampus. Hippocampal cells with larger place fields are more densely wired to the entorhinal cells with larger spatial scales in their grids, and vice versa. It has also been shown experimentally that entorhinal lesions impair performance on navigation tasks and disrupt the temporal ordering of sequential activations in hippocampus while leaving performance on location recognition tasks intact [45, 51]. This suggests a role of grid cells in spatial planning, and encourages us to speculate about a more general role for grid cells in hierarchical planning.

Discussion

The hippocampus has long been thought to encode a cognitive map, but the precise nature of this map is elusive. The traditional view that the map is essentially spatial [7, 8] is not sufficient to explain some of the most striking aspects of hippocampal representation, such as the dependence of place fields on an animal’s behavioral policy and the environment’s topology. We argue instead that the map is essentially predictive, encoding expectations about an animal’s future state. This view resonates with earlier ideas about the predictive function of the hippocampus [20, 52–54]. Our main contribution is a formalization of this predictive function in a reinforcement learning framework, offering a new perspective on how the hippocampus supports adaptive behavior.

Our theory is connected to earlier work by Gustafson and Daw showing how topologically-sensitive spatial representations recapitulate many aspects of place cells and grid cells that are difficult to reconcile with a purely Euclidean representation of space. They also showed how encoding topological structure greatly aids reinforcement learning in complex spatial environments. Earlier work by Foster and colleagues also used place cells as features for RL, although the spatial representation did not explicitly encode topological structure. While these theoretical precedents highlight the importance of spatial representation, they leave open the deeper question of why particular representations are better than others. We showed that the SR naturally encodes topological structure in a format that enables efficient RL.

The work is also related to work done by Dordek et al., who demonstrated that gridlike activity patterns emerge from principal components of the population activity of simulated Gaussian place cells. As we mentioned in the Results, one point of departure between empirically observed grid cell data and the SR eigenvector account is that in rectangular environments, SR eigenvector grid fields can have different spatial scales aligned to the horizontal and vertical axes (see Fig. S8). In grid cells, the spatial scales tend to be approximately constant in all directions unless the environment changes. The principal components of Gaussian place field activity are mathematically related to the SR eigenvectors, and naturally also have grid fields that scale independently along the perpendicular boundaries of a rectangular room. However, Dordek et al. found that when the components were constrained to have non-negative values and the constraint that components be orthogonal was relaxed, the scaling became uniform in all directions and the lattices became more hexagonal. This suggests that the difference between SR eigenvectors and recorded grid cells is not fundamental to the idea that grid cells are applying a spectral dimensionality reduction. Rather, additional constraints such as non-negativity are required.

bioRxiv preprint first posted online Dec. 28, 2016; doi: http://dx.doi.org/10.1101/097170. The copyright holder for this preprint (which was not peer-reviewed) is the author/funder. It is made available under a CC-BY-ND 4.0 International license.

The SR can be viewed as occupying a middle ground between model-free and model-based learning. Model-free learning requires storing a look-up table of cached values estimated from the reward history [1, 56]. Should the reward structure of the environment change, the entire look-up table must be re-estimated. By decomposing the value function into a predictive representation and a reward representation, the SR allows an agent to flexibly recompute values when rewards change, without sacrificing the computational efficiency of model-free methods. Model-based learning is robust to changes in the reward structure, but requires inefficient algorithms like tree search to compute values [1, 15].
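
The decomposition can be illustrated with a small numerical sketch. It assumes the standard closed-form SR under a fixed policy, M = (I - γT)^{-1}, so that values are the product V = M R; the linear-track environment, the reward vectors, and the function names are hypothetical. Changing the reward only requires a new matrix-vector product, and the result matches what iterating the Bellman equation from scratch would give.

```python
import numpy as np

def random_walk_on_track(n):
    """Transition matrix of a random walk on a hypothetical n-state track."""
    A = np.zeros((n, n))
    idx = np.arange(n - 1)
    A[idx, idx + 1] = A[idx + 1, idx] = 1.0
    return A / A.sum(axis=1, keepdims=True)

def values_from_sr(M, reward):
    """V = M R: expected discounted future occupancy weighted by reward."""
    return M @ reward

def values_by_iteration(T, reward, gamma, sweeps=500):
    """Reference answer: iterate V <- R + gamma * T V until it settles."""
    V = np.zeros(len(reward))
    for _ in range(sweeps):
        V = reward + gamma * (T @ V)
    return V

n, gamma = 6, 0.9
T = random_walk_on_track(n)
M = np.linalg.inv(np.eye(n) - gamma * T)   # computed once, independent of reward

reward_a = np.zeros(n)
reward_a[-1] = 1.0                         # reward at the far end of the track
reward_b = np.zeros(n)
reward_b[0] = 1.0                          # reward moved to the near end

for reward in (reward_a, reward_b):
    v_sr = values_from_sr(M, reward)
    v_ref = values_by_iteration(T, reward, gamma)
    print(np.round(v_sr, 3), np.allclose(v_sr, v_ref))
```

The predictive matrix `M` is reused unchanged for both reward vectors; only the reward representation differs, which is the flexibility the paragraph above attributes to the SR.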

Certain behaviors often attributed to a model-based system can be explained by a model in which predictions based on state dynamics and the reward function are learned separately. For instance, the context preexposure facilitation effect refers to the finding that contextual fear conditioning is acquired more rapidly if the animal has the chance to explore the environment for several minutes before the first shock. The facilitation effect is classically believed to arise from the development of a conjunctive representation of the context in the hippocampus, though areas outside the hippocampus may also develop a conjunctive representation in the absence of the hippocampus, albeit less efficiently. The SR provides a somewhat different interpretation: over the course of preexposure, the hippocampus develops a predictive representation of the context, such that subsequent learning is rapidly propagated across space. Figure S5 shows a simulation of this process and how it accounts for the facilitation effect.
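
The SR interpretation of preexposure can be sketched with a toy simulation. This is not the simulation behind Figure S5; it is a minimal illustration that assumes a random walk on a hypothetical 7-state track, a standard temporal-difference update for the SR, and a single aversive outcome whose magnitude is learned in one step. All names and parameter values are invented for the example.

```python
import numpy as np

def learn_sr_by_td(n_states, steps, gamma=0.9, alpha=0.1, seed=0):
    """Learn the SR from a reward-free random walk with a TD update."""
    rng = np.random.default_rng(seed)
    M = np.eye(n_states)          # before exploration each state predicts only itself
    s = n_states // 2
    for _ in range(steps):
        neighbours = [x for x in (s - 1, s + 1) if 0 <= x < n_states]
        s_next = int(rng.choice(neighbours))
        # TD target: occupancy of the current state plus discounted
        # predicted occupancy from the next state.
        target = np.eye(n_states)[s] + gamma * M[s_next]
        M[s] += alpha * (target - M[s])
        s = s_next
    return M

n_states, shock_state = 7, 3
M_preexposed = learn_sr_by_td(n_states, steps=2000)
M_naive = np.eye(n_states)        # no preexposure: no predictive structure yet

# One aversive outcome: only the reward estimate at the shock state changes.
reward = np.zeros(n_states)
reward[shock_state] = -1.0

print("with preexposure:   ", np.round(M_preexposed @ reward, 2))
print("without preexposure:", np.round(M_naive @ reward, 2))
```

With preexposure, the negative value should already extend to the states around the shock location after a single outcome, whereas without it only the shock state itself changes.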

Recent work has elucidated connections between models of episodic memory and the SR. Specifically, Gershman et al. demonstrated that the SR is closely related to the Temporal Context Model (TCM) of episodic memory [16, 19]. The core idea of TCM is that items are bound to their temporal context (a running average of recently experienced items), and the currently active temporal context is used to cue retrieval of other items, which in turn cause their temporal context to be retrieved. The SR can be seen as encoding a set of item-context associations. The connection to episodic memory is especially interesting given the crucial mnemonic role played by the hippocampus and entorhinal cortex in episodic memory. Howard and colleagues have laid out a detailed mapping between TCM and the medial temporal lobe (including entorhinal and hippocampal regions).

Spectral graph theory provides insight into the topological structure encoded by the SR. We showed specifically that eigenvectors of the SR can be used to discover a hierarchical decomposition of the environment for use in hierarchical RL. Mahadevan et al. demonstrated that the related Laplacian eigenvectors are useful as a representational basis for approximating value functions, dubbing these eigenvectors “protovalue functions”. Spectral analysis has frequently been invoked as a computational motivation for entorhinal grid cells (e.g., by Krupic and colleagues). The fact that any function can be reconstructed by sums of sinusoids suggests that the entorhinal cortex implements a kind of Fourier transform of space. However, Fourier analysis is not the right mathematical tool when dealing with spatial representations in a topologically structured environment, since we do not expect functions to be smooth over boundaries in the environment. This is precisely the purpose of spectral graph theory: Instead of being maximally smooth over Euclidean space, the eigenvectors of the graph Laplacian embed the smoothest
