2.1 History of OMCSNet
Building large-scale databases of commonsense knowledge is not a trivial task. One
problem is scale. It has been estimated that the scope of common sense may involve
many tens of millions of pieces of knowledge (Mueller, 2001). Unfortunately, common
sense cannot be easily mined from dictionaries, encyclopedias, the web, or other
corpora, because it consists largely of knowledge that is obvious to a reader and is
therefore left unstated. Indeed, it likely takes much common sense even to interpret
dictionaries and encyclopedias. Until recently, it seemed that the only way to build
a commonsense knowledge base was through the expensive process of hiring an army of
knowledge engineers to hand-code each and every fact.
However, in recent years we have been exploring a new approach. Inspired by
the success of distributed and collaborative projects on the Web, Singh et al. turned
to volunteers from the general public to massively distribute the problem of building
a commonsense knowledge base. Three years ago, the Open Mind Commonsense
(OMCS) website (Singh et al., 2002) was built as a collection of 30 different
activities, each of which elicits a different type of commonsense knowledge: simple
assertions, descriptions of typical situations, stories describing ordinary activities
and actions, and so forth. Since then the website has gathered over 675,000 items of
commonsense knowledge from over 13,000 contributors from around the world,
many with no special training in computer science. The OMCS corpus now covers
a wide range of different types of commonsense knowledge, expressed in
natural language.
The earliest applications of the OMCS corpus did not use its knowledge directly;
they first extracted into semantic networks only the types of knowledge
they needed. For example, the ARIA photo retrieval system (Liu & Lieberman,
2002) extracted taxonomic, spatial, functional, causal, and emotional knowledge
from OMCS to improve information retrieval. This suggested a new approach to
building a commonsense knowledge base. Rather than directly engineering the
knowledge structures used by the reasoning system, as is done in Cyc, OMCS
encourages people to state information clearly in natural language. From this
semi-structured corpus of English sentences, we can then extract knowledge
representations and generate usable knowledge bases. In OMCSNet,
we reformulated the knowledge in OMCS into a system of binary relations that
together constitute a semantic network. This allows us to apply graph-based methods
when reasoning about text.
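
To make the idea of binary relations forming a semantic network concrete, the
following is a minimal sketch in Python. It is not the OMCSNet implementation: the
class name SemanticNetwork, its methods, the relation labels ("EffectOf",
"LocationOf"), and the concept phrases are all assumptions chosen for illustration.
Each assertion is stored as a labeled, directed edge between two concepts, and a
breadth-first search stands in for the kind of graph-based method that such a
representation makes possible.

```python
from collections import defaultdict, deque


class SemanticNetwork:
    """Hypothetical store of binary relations as labeled, directed edges."""

    def __init__(self):
        # Maps a source concept to a list of (relation, target) pairs.
        self._out = defaultdict(list)

    def add(self, relation, source, target):
        """Record the binary relation relation(source, target)."""
        self._out[source].append((relation, target))

    def neighbors(self, concept):
        """Return the (relation, target) pairs leaving a concept."""
        return list(self._out.get(concept, []))

    def path(self, start, goal):
        """Breadth-first search from start to goal.

        Returns the chain of (source, relation, target) triples that
        connects the two concepts, or None if no chain exists.
        """
        queue = deque([(start, [])])
        seen = {start}
        while queue:
            node, trail = queue.popleft()
            if node == goal:
                return trail
            for relation, target in self._out.get(node, []):
                if target not in seen:
                    seen.add(target)
                    queue.append((target, trail + [(node, relation, target)]))
        return None


# Illustrative assertions only; these are not taken from the OMCS corpus.
net = SemanticNetwork()
net.add("EffectOf", "fall off bike", "get hurt")
net.add("EffectOf", "get hurt", "feel pain")
net.add("LocationOf", "bike", "garage")

chain = net.path("fall off bike", "feel pain")
```

Here chain holds the two EffectOf edges that link "fall off bike" to "feel pain".
Storing edges by source concept keeps neighbor lookup cheap, which is the operation
that traversal-style reasoning over a network of this kind would repeat most often.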
2.2 Generating OMCSNet from the OMCS corpus
The current OMCSNet is produced by an automatic process, which applies a set of
‘commonsense extraction rules’ to the semi-structured English sentences of the
OMCS corpus. This is possible because the OMCS website already
elicits knowledge in a semi-structured way by prompting users with fill-in-the-blank
templates (e.g., “The effect of [falling off a bike] is [you get hurt]”). A pattern
matching parser uses roughly 40 mapping rules to easily parse semi-structured sen-