MojiSem:
Varying linguistic purposes of emoji in (Twitter) context
Noa Na’aman, Hannah Provenza, Orion Montoya
Brandeis University
{nnaaman,hprovenza,obm}@brandeis.edu
Abstract
Early research into emoji in textual com-
munication has focused largely on high-
frequency usages and ambiguity of inter-
pretations. Investigation of a wide range of
emoji usage shows these glyphs serving at
least two very different purposes: as con-
tent and function words, or as multimodal
affective markers. Identifying where an
emoji replaces textual content allows NLP
tools to parse it like any other word or
phrase. Recognizing
the import of non-content emoji can be a
significant part of understanding a message
as well.
We report on an annotation task on En-
glish Twitter data with the goal of classify-
ing emoji uses by these categories, and on
the effectiveness of a classifier trained on
these annotations. We find that it is pos-
sible to train a classifier to tell the differ-
ence between those emoji used as linguis-
tic content words and those used as par-
alinguistic or affective multimodal mark-
ers even with a small amount of training
data, but that accurate sub-classification
of these multimodal emoji into specific
classes like attitude, topic, or gesture will
require more data and more feature engi-
neering.
1 Background
Emoji characters were first offered on Japanese
mobile phones around the turn of the 21st cen-
tury. These pictographic elements reached global
language communities after being added to Uni-
code 6.0 in 2010, and then being offered within
software keyboards on smartphones. In the ensu-
ing half-decade, digitally-mediated language users
have evolved diverse and novel linguistic uses for
emoji.
The expressive richness of emoji communica-
tion would, on its own, be sufficient reason to
seek a nuanced understanding of its usage. But
our initial survey of emoji on Twitter reveals many
cases where emoji serve direct semantic functions
in a tweet or where they are used as a grammat-
ical function such as a preposition or punctua-
tion. Early work on Twitter emoticons (Schnoe-
belen, 2012) pre-dated the wide spread of Uni-
code emoji on mobile and desktop devices. Recent
work (Miller et al., 2016) has explored the cross-
platform ambiguity of emoji renderings; (Eis-
ner et al., 2016) created word embeddings that
performed competitively on emoji analogy tasks;
(Ljubešić and Fišer, 2016) mapped global emoji
distributions by frequency; (Barbieri et al., 2017)
used LSTMs to predict them in context.
We feel that a lexical semantics of emoji char-
acters is implied in these studies without being di-
rectly addressed. Words are not used randomly,
and neither are emoji. But even when they replace
a word, emoji are used for different purposes than
words. We believe that work on emoji would be
better informed if there were an explicit typology
of the linguistic functions that emoji can serve in
expressive text. The current project offered anno-
tators a framework and heuristics to classify uses
of emoji by linguistic and discursive function. We
then used a model based on this corpus to pre-
dict the grammatical function of emoji characters
in novel contexts.
2 Annotation task
Although recognizing the presence of emoji char-
acters is trivial, the linguistic distinctions we
sought to annotate were ambiguous and seemed
prone to disagreement. Therefore in our annota-
tion guidelines we structured the process to mini-
mize cognitive load and lead the annotators to in-