A tensorized logic programming language for large-scale data
Ryosuke Kojima¹, Taisuke Sato²
¹Department of Biomedical Data Intelligence, Graduate School of Medicine,
Kyoto University, Kyoto, Japan.
²AI Research Center (AIRC),
National Institute of Advanced Industrial Science and Technology (AIST), Tokyo, Japan.
Abstract
We introduce a new logic programming language T-PRISM
based on tensor embeddings. Our embedding scheme is
a modification of the distribution semantics in PRISM,
one of the state-of-the-art probabilistic logic programming
languages, replacing distribution functions with multi-dimensional
arrays, i.e., tensors. T-PRISM consists of two parts: a logic
programming part and a numerical computation part. The former
provides flexible and interpretable modeling at the level of
first-order logic, and the latter provides scalable computation
utilizing parallelization and hardware acceleration with GPUs.
Combining these two parts enables a remarkably wide range of
high-level declarative modeling, from symbolic reasoning to
deep learning.
To embody this programming language, we also introduce a
new semantics, termed tensorized semantics, which combines
the traditional least model semantics in logic programming
with tensor embeddings. In T-PRISM, we first derive a set of
equations over tensors from a given program using logical
inference, i.e., Prolog execution in a symbolic space, and
then solve the derived equations in a continuous space using
TensorFlow.
Using our preliminary implementation of T-PRISM, we have
successfully handled a wide range of models, including
declarative modeling of real large-scale data. This paper
presents a DistMult model for knowledge graphs using the
FB15k and WN18 datasets.
1. Introduction
Logic programming provides concise expressions of knowledge
and has been proposed as a means of representing and modeling
various types of data for real-world AI systems. For example,
to deal with uncertain and noisy data, probabilistic logic
programming (PLP) has been extensively studied (Kimmig et al. 2011;
Wang, Mazaitis, and Cohen 2013; Sato and Kameya 2008).
PLP systems allow users to flexibly and clearly describe
stochastic dependencies and relations between entities using
logic programming. In a different context, to handle a wide
range of applications, the unification of neural networks and
approximate reasoning by embedding symbols into continuous
spaces has recently been
proposed (Manhaeve et al. 2018; Rocktäschel and Riedel 2017;
Evans and Grefenstette 2018).
In this paper, we tackle the task of combining symbolic
reasoning and multi-dimensional continuous-space embeddings
of logical constructs such as clauses, and explore a new
approach that compiles a program written in a declarative
language into a procedure of numerical calculation suitable
for large-scale data. Such languages are expected to be
interpretable as programming languages while being efficiently
executable at the level of numerical calculation, such as
vector computation. Aiming at this goal, we introduce tensorized
semantics, a novel algebraic semantics interfacing a symbolic
reasoning layer and a numerical computation layer, and propose
a new modeling language, “T-PRISM”. It is based on an
existing probabilistic logic programming language, PRISM
(Sato and Kameya 2008), and implements the tensorized
semantics for large-scale datasets.
Thus the first contribution of this paper is the introduction
of a new semantics, tensorized semantics. The current PRISM
has the distribution semantics (Sato 1995), which probabilistically
generalizes the least model semantics in logic programming to
coherently assign probabilities to logical constructs. Likewise,
tensorized semantics assigns tensors¹ to logical constructs
based on the least model semantics. Both PRISM and T-PRISM
programs are characterized by a set of equations for the
assigned quantities.
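For intuition, the following is a hedged sketch of what such equations look like; the tensorized form shown here is an illustration by analogy, not the formal definition. In PRISM, the probability of a goal $G$ with explanations $\mathrm{expl}(G)$ satisfies the sum-product equation

$$P(G) = \sum_{E \in \mathrm{expl}(G)} \; \prod_{\mathtt{msw}(i,v) \in E} \theta_{i,v},$$

where $\theta_{i,v}$ is the parameter of the probabilistic choice $\mathtt{msw}(i,v)$. Tensorized semantics keeps this sum-product structure but replaces the scalar parameters with tensors and the scalar product with tensor multiplication (contraction), so the quantity assigned to $G$ is itself a tensor.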
One may view T-PRISM as one approach to an equation-level
interface connecting logic programming and continuous-space
embeddings. Another approach is a predicate-level interface,
exemplified by DeepProblog (Manhaeve et al. 2018), which
provides special predicates connecting neural networks and
probabilistic logic programming via ProbLog. At the
implementation level, this approach separates neural networks
from probabilistic models but syntactically integrates them,
for example combining image recognition and logical reasoning
over estimated labels. Unlike our approach, it does not allow
constants and predicates to have corresponding vector (tensor)
representations in neural networks.
The second contribution of this paper is an implementation
methodology for T-PRISM's tensorized semantics.
¹ In this paper, the term “tensor” is used interchangeably with
“multi-dimensional array”.