perceptual data, while the symbolic front end must learn a mapping from the resulting symbolic
representation to actions that maximise expected reward over time.
In this paper we present one instantiation of this architecture as a proof-of-concept, and illustrate
its effectiveness on several variants of a simple video game. This demonstrator system has many
limitations and makes numerous simplifying assumptions that are not inherent in the larger proposal,
but it illustrates the four fundamental principles of our architectural manifesto. (For a related set of
desiderata see [25].)
1) Conceptual abstraction. Determining that a new situation is similar or analogous to one (or
several) encountered previously is an operation fundamental to general intelligence, and to reinforcement learning in particular. In a conventional DRL system, such as DQN [4], this is achieved
through the generalising capabilities of the neural network that approximates the Q function (or the
value function or policy function, depending on the style of reinforcement learning in question).
However, this low-level approach to establishing similarity relationships requires the gradual build-
up of a statistical picture of the state space. The upshot is that while a novice human player will
rapidly spot the high-level similarity between, say, the paddle and ball in Pong and the paddle and
ball in Breakout, a conventional DRL system is blind to this. By contrast, the present architecture
maps high-dimensional raw input into a lower-dimensional conceptual state space within which it is
possible to establish similarity between states using symbolic methods that operate at a higher level
of abstraction. This facilitates both data-efficient learning and transfer learning, and it provides a foundation for other high-level cognitive processes such as planning, innovative problem solving, and communication with other agents (including humans).
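To make this concrete, consider the following minimal sketch (in Python; the SymbolicState type, the predicate names, and the similarity measure are all illustrative assumptions, not components of the demonstrator system). Once both games are described in a shared conceptual vocabulary, establishing similarity between states reduces to a cheap symbolic computation:

from typing import FrozenSet, Tuple

# A conceptual state as a set of ground atoms: (predicate, arg1, arg2, ...).
# The predicate and object names below are illustrative assumptions.
SymbolicState = FrozenSet[Tuple[str, ...]]

def state_similarity(s1: SymbolicState, s2: SymbolicState) -> float:
    """Jaccard overlap of shared atoms; 1.0 means identical descriptions."""
    if not s1 and not s2:
        return 1.0
    return len(s1 & s2) / len(s1 | s2)

# Pong and Breakout frames reduced to the same low-dimensional vocabulary.
pong = frozenset({("object", "paddle"), ("object", "ball"),
                  ("moving", "ball"), ("approaching", "ball", "paddle")})
breakout = frozenset({("object", "paddle"), ("object", "ball"),
                      ("object", "brick"), ("moving", "ball"),
                      ("approaching", "ball", "paddle")})

print(state_similarity(pong, breakout))  # 0.8: high overlap despite disjoint pixels

Under this description the two games share four of five atoms, even though their pixel-level statistics, which are all a conventional DRL system can compare, have essentially nothing in common.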
2) Compositional structure. To enable this sort of conceptual abstraction, a representational
medium is required that has a compositional structure. That is to say, it should comprise a set of elements that can be combined and recombined in an open-ended way. Classically, the theoretical foundation for such a representational medium is first-order logic, whose underlying language comprises predicates, quantifiers, constant symbols, function symbols, and Boolean operators [17]. (It should be noted that a fixed-size vector representation is inadequate for such a representational medium, because it cannot encode formulae of arbitrary length.) But the binary nature of classical
logic makes it less well suited to dealing with the uncertainty inherent in real data than a Bayesian
approach. To handle uncertainty, we propose probabilistic first-order logic for the semantic underpinnings of the low-dimensional conceptual state space representation into which the neural back end must map the system’s high-dimensional raw input [26].
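As a rough indication of what such a medium might look like in code (the Atom type and the example predicates are our own illustrative assumptions, not a committed design), a state can be assembled from a small vocabulary of predicates and constant symbols, with each ground atom carrying a degree of belief:

from dataclasses import dataclass
from typing import Tuple

@dataclass(frozen=True)
class Atom:
    """A ground atom of a probabilistic first-order language."""
    predicate: str           # e.g. "colour" or "left_of"
    args: Tuple[str, ...]    # constant symbols, e.g. ("obj1", "red")
    prob: float              # degree of belief that the atom holds, in [0, 1]

# The same few symbols recombine into arbitrarily many, arbitrarily large
# descriptions, which is exactly what a fixed-size vector cannot accommodate.
state = [
    Atom("object", ("obj1",), 0.98),
    Atom("object", ("obj2",), 0.95),
    Atom("colour", ("obj1", "red"), 0.90),
    Atom("left_of", ("obj1", "obj2"), 0.85),
]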
3) Common sense priors. Although our target is general intelligence, meaning the ability to achieve
goals and perform tasks in a wide variety of domains, it is unrealistic to expect an end-to-end reinforcement learning system to succeed with no prior assumptions about the domain. For example, in most DRL systems that take visual input, spatial priors, such as the likelihood that similar 2D patterns will appear in different locations in the visual field, are implicit in the convolutional structure
of the network [27]. But the everyday physical world is structured according to many other common
sense priors [28, 27, 29, 25, 30]. Consisting mostly of empty space, it contains a variety of objects
that tend to persist over time and have various attributes such as shape, colour, and texture [31].
Objects frequently move, typically in continuous trajectories. Objects participate in a number of
stereotypical events, such as starting to move or coming to a halt, appearing or disappearing, and
coming into contact with other objects. These minimal assumptions and expectations can be built
into the system by grafting a suitable ontology onto the underlying representational language, greatly
reducing the learning workload and facilitating various forms of common sense reasoning.
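The following sketch indicates how such an ontology might be grafted onto the representational language; the class definitions and event vocabulary are illustrative assumptions rather than the system's actual ontology:

from dataclasses import dataclass
from enum import Enum, auto
from typing import Tuple

class EventType(Enum):
    """Stereotypical event categories built into the ontology."""
    START_MOVING = auto()
    HALT = auto()
    APPEAR = auto()
    DISAPPEAR = auto()
    CONTACT = auto()

@dataclass
class PersistentObject:
    """An object that persists over time and carries attributes."""
    obj_id: str
    shape: str
    colour: str
    position: Tuple[float, float]   # assumed to follow a continuous trajectory

@dataclass
class Event:
    """An occurrence of a stereotypical event involving some objects."""
    kind: EventType
    participants: Tuple[str, ...]   # obj_ids of the objects involved
    time: int                       # discrete time step of the occurrence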
4) Causal reasoning. The current generation of DRL architectures eschews model-based reinforcement learning, so the resulting systems are purely reactive. By contrast, the architecture
we propose attempts to discover the causal structure of the domain, and to encode this as a set of
symbolic causal rules expressed in terms of the common sense ontology described above. These
causal rules enable conceptual abstraction. As already mentioned, the key to general intelligence
is the ability to see that an ongoing situation is similar or analogous to a previously encountered
situation or set of situations. A deep neural network that approximates the Q function in reinforcement learning can be thought of as carrying out analogical inference of this kind, but only at the
most superficial, statistical level. To carry out analogical inference at a more abstract level, and
thereby facilitate the transfer of expertise from one domain to another, the narrative structure of the ongoing situation needs to be mapped to the causal structure of a set of previously encountered situations.
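A minimal sketch of what such symbolic causal rules might look like, using plain string atoms for events and a one-step forward-chaining predictor (both our own simplifications, not the proposed learning mechanism):

from dataclasses import dataclass
from typing import List, Set, Tuple

Ground = Tuple[str, ...]   # a ground event, e.g. ("contact", "ball", "paddle")

@dataclass(frozen=True)
class CausalRule:
    """When the cause event is observed, the effect event is predicted."""
    cause: Ground
    effect: Ground

rules = [
    CausalRule(("contact", "ball", "paddle"), ("start_moving", "ball")),
    CausalRule(("contact", "ball", "brick"), ("disappear", "brick")),
]

def predict(observed: Set[Ground], rules: List[CausalRule]) -> List[Ground]:
    """One step of forward chaining: effects of every rule whose cause matched."""
    return [r.effect for r in rules if r.cause in observed]

print(predict({("contact", "ball", "brick")}, rules))  # [('disappear', 'brick')]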