Philosophers are Mortal: Inferring the Truth of Unseen Facts
Gabor Angeli
Stanford University
Stanford, CA 94305
angeli@stanford.edu
Christopher D. Manning
Stanford University
Stanford, CA 94305
manning@stanford.edu
Abstract
Large databases of facts are prevalent in
many applications. Such databases are
accurate, but as they broaden their scope
they become increasingly incomplete. In
contrast to extending such a database, we
present a system to query whether it con-
tains an arbitrary fact. This work can be
thought of as re-casting open domain in-
formation extraction: rather than growing
a database of known facts, we smooth this
data into a database in which any possible
fact has membership with some confidence.
We evaluate our system on predicting
held-out facts, achieving 74.2% accuracy
and outperforming multiple baselines. We
also evaluate the system as a common-
sense filter for the ReVerb Open IE sys-
tem, and as a method for answer validation
in a Question Answering task.
1 Introduction
Databases of facts, such as Freebase (Bollacker
et al., 2008) or Open Information Extraction
(Open IE) extractions, are useful for a range of
NLP applications from semantic parsing to infor-
mation extraction. However, as the domain of a
database grows, it becomes increasingly impracti-
cal to collect completely, and increasingly unlikely
that all the elements intended for the database are
explicitly mentioned in the source corpus. In par-
ticular, common-sense facts are rarely explicitly
mentioned, despite their abundance. It would be
useful to infer the truth of such unseen facts rather
than assuming them to be implicitly false.
A growing body of work has focused on auto-
matically extending large databases with a finite
set of additional facts. In contrast, we propose
a system to generate the (possibly infinite) com-
pletion of such a database, with a degree of con-
fidence for each unseen fact. This task can be
cast as querying whether an arbitrary element is
a member of the database, with an informative degree
of confidence. Since the facts in these databases
are often devoid of context, we refine our notion
of truth to reflect whether we would assume
a fact to be true without evidence to the contrary.
In this vein, we can further refine our task as de-
termining whether an arbitrary fact is plausible –
true in the absence of contradictory evidence.
In addition to general applications of such large
databases, our approach can further be integrated
into systems which can make use of probabilis-
tic membership. For example, certain machine
translation errors could be fixed by determining
that the target translation expresses an implausible
fact. Similarly, the system can be used as a soft
feature for semantic compatibility in coreference;
e.g., the types of phenomena expressed in Hobbs’
selectional constraints (Hobbs, 1978). Lastly, it is
useful as a common-sense filter; we evaluate the
system in this role by filtering implausible facts
from Open IE extractions, and filtering incorrect
responses for a question answering system.
Our approach generalizes word similarity met-
rics to a notion of fact similarity, and judges the
membership of an unseen fact based on the aggre-
gate similarity between it and existing members
of the database. For instance, if we have not seen
the fact that philosophers are mortal¹ but we know
that Greeks are mortal, and that philosophers and
Greeks are similar, we would like to infer that the
fact is nonetheless plausible.
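
As a rough illustration of this idea, the following minimal sketch scores an unseen fact by its similarity to known facts. Everything in it is an assumption made for illustration rather than the system described in this paper: the toy similarity table, the choice to multiply argument similarities, the requirement that relations match exactly, and the top-k mean used as the aggregate are all hypothetical.

```python
from typing import Dict, FrozenSet, List, Tuple

Fact = Tuple[str, str, str]  # (argument 1, relation, argument 2)

# Hypothetical word-similarity scores in [0, 1]. A real system would derive
# these from a thesaurus or from distributional statistics.
TOY_SIMILARITY: Dict[FrozenSet[str], float] = {
    frozenset({"philosopher", "Greek"}): 0.8,
    frozenset({"philosopher", "rock"}): 0.1,
}


def word_similarity(a: str, b: str) -> float:
    """Return 1.0 for identical words, else the toy score (0.0 if unknown)."""
    if a == b:
        return 1.0
    return TOY_SIMILARITY.get(frozenset({a, b}), 0.0)


def fact_similarity(query: Fact, known: Fact) -> float:
    """Assumed definition: 0 if the relations differ, otherwise the
    product of the two argument similarities."""
    if query[1] != known[1]:
        return 0.0
    return (word_similarity(query[0], known[0])
            * word_similarity(query[2], known[2]))


def plausibility(query: Fact, database: List[Fact], k: int = 2) -> float:
    """Assumed aggregation: mean similarity to the k most similar known facts."""
    scores = sorted((fact_similarity(query, f) for f in database), reverse=True)
    top = scores[:k]
    return sum(top) / len(top) if top else 0.0


database: List[Fact] = [("Greek", "be", "mortal"), ("rock", "be", "hard")]
unseen: Fact = ("philosopher", "be", "mortal")
score = plausibility(unseen, database, k=1)
```

Under these toy scores, the unseen fact about philosophers receives a high score because it closely matches the known fact about Greeks, whereas a query such as ("rock", "be", "mortal") has no similar neighbor in the database and scores zero.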
We implement our approach on both a large
open-domain database of facts extracted from the
Open IE system ReVerb (Fader et al., 2011), and
ConceptNet (Liu and Singh, 2004), a hand-curated
database of common-sense facts.
¹ This is an unseen fact in http://openie.cs.washington.edu.