perceptual data, while the symbolic front end must learn a mapping from the resulting symbolic
representation to actions that maximise expected reward over time.
In this paper we present one instantiation of this architecture as a proof-of-concept, and illustrate
its effectiveness on several variants of a simple video game. This demonstrator system has many
limitations and makes numerous simplifying assumptions that are not inherent in the larger proposal,
but it illustrates the four fundamental principles of our architectural manifesto. (For a related set of
desiderata see [25].)
1) Conceptual abstraction. Determining that a new situation is similar or analogous to one (or
several) encountered previously is an operation fundamental to general intelligence, and to reinforcement learning in particular. In a conventional DRL system, such as DQN [4], this is achieved
through the generalising capabilities of the neural network that approximates the Q function (or the
value function or policy function, depending on the style of reinforcement learning in question).
However, this low-level approach to establishing similarity relationships requires the gradual build-
up of a statistical picture of the state space. The upshot is that while a novice human player will
rapidly spot the high-level similarity between, say, the paddle and ball in Pong and the paddle and
ball in Breakout, a conventional DRL system is blind to this. By contrast, the present architecture
maps high-dimensional raw input into a lower-dimensional conceptual state space within which it is
possible to establish similarity between states using symbolic methods that operate at a higher level
of abstraction. This facilitates both data-efficient learning and transfer learning, and it provides a foundation for other high-level cognitive processes such as planning, innovative problem solving, and communication with other agents (including humans).
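To make this concrete, consider the following minimal sketch (in Python; the SymbolicState type, the predicate names, and the similarity measure are all illustrative assumptions, not components of the demonstrator system). Once both games are described in a shared conceptual vocabulary, establishing similarity between states reduces to a cheap symbolic computation:

from typing import FrozenSet, Tuple

# A conceptual state as a set of ground atoms: (predicate, arg1, arg2, ...).
# The predicate and object names below are illustrative assumptions.
SymbolicState = FrozenSet[Tuple[str, ...]]

def state_similarity(s1: SymbolicState, s2: SymbolicState) -> float:
    """Jaccard overlap of shared atoms; 1.0 means identical descriptions."""
    if not s1 and not s2:
        return 1.0
    return len(s1 & s2) / len(s1 | s2)

# Pong and Breakout frames reduced to the same low-dimensional vocabulary.
pong = frozenset({("object", "paddle"), ("object", "ball"),
                  ("moving", "ball"), ("approaching", "ball", "paddle")})
breakout = frozenset({("object", "paddle"), ("object", "ball"),
                      ("object", "brick"), ("moving", "ball"),
                      ("approaching", "ball", "paddle")})

print(state_similarity(pong, breakout))  # 0.8: high overlap despite disjoint pixels

Under this description the two games share four of five atoms, even though their pixel-level statistics, which are all a conventional DRL system can compare, have essentially nothing in common.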
2) Compositional structure. To enable this sort of conceptual abstraction, a representational
medium is required that has a compositional structure. That is to say, it should comprise a set of elements that can be combined and recombined in an open-ended way. Classically, the theoretical foundation for such a representational medium is first-order logic, whose underlying language comprises predicates, quantifiers, constant symbols, function symbols, and Boolean operators [17]. (It should be noted that a fixed-size vector representation is inadequate for such a representational medium, because it cannot encode formulae of arbitrary length.) But the binary nature of classical
logic makes it less well suited to dealing with the uncertainty inherent in real data than a Bayesian
approach. To handle uncertainty, we propose probabilistic first-order logic for the semantic underpinnings of the low-dimensional conceptual state space representation into which the neural back end must map the system’s high-dimensional raw input [26].
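As a rough indication of what such a medium might look like in code (the Atom type and the example predicates are our own illustrative assumptions, not a committed design), a state can be assembled from a small vocabulary of predicates and constant symbols, with each ground atom carrying a degree of belief:

from dataclasses import dataclass
from typing import Tuple

@dataclass(frozen=True)
class Atom:
    """A ground atom of a probabilistic first-order language."""
    predicate: str           # e.g. "colour" or "left_of"
    args: Tuple[str, ...]    # constant symbols, e.g. ("obj1", "red")
    prob: float              # degree of belief that the atom holds, in [0, 1]

# The same few symbols recombine into arbitrarily many, arbitrarily large
# descriptions, which is exactly what a fixed-size vector cannot accommodate.
state = [
    Atom("object", ("obj1",), 0.98),
    Atom("object", ("obj2",), 0.95),
    Atom("colour", ("obj1", "red"), 0.90),
    Atom("left_of", ("obj1", "obj2"), 0.85),
]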
3) Common sense priors. Although our target is general intelligence, meaning the ability to achieve
goals and perform tasks in a wide variety of domains, it is unrealistic to expect an end-to-end reinforcement learning system to succeed with no prior assumptions about the domain. For example, in most DRL systems that take visual input, spatial priors, such as the likelihood that similar 2D patterns will appear in different locations in the visual field, are implicit in the convolutional structure
of the network [27]. But the everyday physical world is structured according to many other common
sense priors [28, 27, 29, 25, 30]. Consisting mostly of empty space, it contains a variety of objects
that tend to persist over time and have various attributes such as shape, colour, and texture [31].
Objects frequently move, typically in continuous trajectories. Objects participate in a number of
stereotypical events, such as starting to move or coming to a halt, appearing or disappearing, and
coming into contact with other objects. These minimal assumptions and expectations can be built
into the system by grafting a suitable ontology onto the underlying representational language, greatly
reducing the learning workload and facilitating various forms of common sense reasoning.
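The following sketch indicates how such an ontology might be grafted onto the representational language; the class definitions and event vocabulary are illustrative assumptions rather than the system's actual ontology:

from dataclasses import dataclass
from enum import Enum, auto
from typing import Tuple

class EventType(Enum):
    """Stereotypical event categories built into the ontology."""
    START_MOVING = auto()
    HALT = auto()
    APPEAR = auto()
    DISAPPEAR = auto()
    CONTACT = auto()

@dataclass
class PersistentObject:
    """An object that persists over time and carries attributes."""
    obj_id: str
    shape: str
    colour: str
    position: Tuple[float, float]   # assumed to follow a continuous trajectory

@dataclass
class Event:
    """An occurrence of a stereotypical event involving some objects."""
    kind: EventType
    participants: Tuple[str, ...]   # obj_ids of the objects involved
    time: int                       # discrete time step of the occurrence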
4) Causal reasoning. The current generation of DRL architectures eschews model-based reinforcement learning, so the resulting systems are purely reactive. By contrast, the architecture
we propose attempts to discover the causal structure of the domain, and to encode this as a set of
symbolic causal rules expressed in terms of the common sense ontology described above. These
causal rules enable conceptual abstraction. As already mentioned, the key to general intelligence
is the ability to see that an ongoing situation is similar or analogous to a previously encountered
situation or set of situations. A deep neural network that approximates the Q function in reinforcement learning can be thought of as carrying out analogical inference of this kind, but only at the
most superficial, statistical level. To carry out analogical inference at a more abstract level, and
thereby facilitate the transfer of expertise from one domain to another, the narrative structure of the ongoing situation needs to be mapped to the causal structure of a set of previously encountered situations.
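A minimal sketch of what such symbolic causal rules might look like, using plain string atoms for events and a one-step forward-chaining predictor (both our own simplifications, not the proposed learning mechanism):

from dataclasses import dataclass
from typing import List, Set, Tuple

Ground = Tuple[str, ...]   # a ground event, e.g. ("contact", "ball", "paddle")

@dataclass(frozen=True)
class CausalRule:
    """When the cause event is observed, the effect event is predicted."""
    cause: Ground
    effect: Ground

rules = [
    CausalRule(("contact", "ball", "paddle"), ("start_moving", "ball")),
    CausalRule(("contact", "ball", "brick"), ("disappear", "brick")),
]

def predict(observed: Set[Ground], rules: List[CausalRule]) -> List[Ground]:
    """One step of forward chaining: effects of every rule whose cause matched."""
    return [r.effect for r in rules if r.cause in observed]

print(predict({("contact", "ball", "brick")}, rules))  # [('disappear', 'brick')]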