mental Algorithm (Dale & Reiter, 1995), which would predict that color
should be preferred over size in our example.
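
To make the role of the preference order concrete, the following is a minimal, simplified sketch in the spirit of the Incremental Algorithm (Dale & Reiter, 1995). It is not the authors' full formulation: it omits, for instance, the special treatment of the type attribute and of basic-level values. The function name, the dictionary-based entity representation and the toy domain are assumptions made purely for illustration and are not the example domain used in this article.

```python
from typing import Dict, List, Tuple

# Hypothetical representation: an entity is a mapping from attributes to values.
Entity = Dict[str, str]


def incremental_algorithm(
    target: Entity,
    distractors: List[Entity],
    preference_order: List[str],
) -> Tuple[List[Tuple[str, str]], bool]:
    """Select properties of `target` by walking a fixed preference order.

    A property is kept only if it rules out at least one remaining
    distractor. Returns the selected (attribute, value) pairs and a flag
    saying whether all distractors were excluded.
    """
    description: List[Tuple[str, str]] = []
    remaining = list(distractors)
    for attribute in preference_order:
        if not remaining:
            break  # the target is already distinguished
        value = target.get(attribute)
        if value is None:
            continue  # the target has no value for this attribute
        survivors = [d for d in remaining if d.get(attribute) == value]
        if len(survivors) < len(remaining):
            # The property removes at least one distractor, so include it.
            description.append((attribute, value))
            remaining = survivors
    return description, not remaining


# Toy domain (illustrative assumption only).
target = {"type": "ball", "color": "red", "size": "large"}
distractors = [
    {"type": "ball", "color": "blue", "size": "small"},
    {"type": "cube", "color": "red", "size": "large"},
]

color_first = incremental_algorithm(target, distractors, ["type", "color", "size"])
size_first = incremental_algorithm(target, distractors, ["type", "size", "color"])
```

In this toy domain both color and size would distinguish the target from the other ball, so the property that ends up in the description is determined entirely by the preference order: color when color is ranked before size, and size otherwise.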
While these heuristics focus exclusively on the requirement that a referent
be unambiguously identified, research on reference in dialogue (e.g., Jordan
& Walker, 2005) has shown that under certain conditions, referring expressions
may also include ‘redundant’ properties in order to achieve other communicative
goals, such as confirmation of a previous utterance by an interlocutor. Similarly,
White et al. (2010) present a system which generates user-tailored descriptions
in spoken dialogue, arguing that, for example, a frequent flyer would prefer
different descriptions of flights than a student who only flies occasionally.
These various algorithms compute (possibly different) distinguishing descriptions
for target referents (more precisely: they select sets of properties that
distinguish the target, but that still need to be expressed in words; see Section 2.6
below). Various strands of more recent work can be distinguished (surveyed in
Krahmer & van Deemter, 2012). Some researchers have focussed on extending
the expressivity of the ‘classical’ algorithms, to include plurals (the two balls)
and relations (the ball in front of a cube) (e.g., Horacek, 1997; Stone, 2000;
Gardent, 2002; Kelleher & Kruijff, 2006; Viethen & Dale, 2008, among many
others). Other work has cast the problem in probabilist i c terms; for example,
FitzGerald et al. (2013) frame reg as a problem of estimati ng a log-linear distr i-
bution over a space of logical forms representing expr es si on s for sets of objects.
Other work has concentrated on evaluating the performance of different REG
algorithms, by collecting controlled human references and comparing these with
the references predicted by various algorithms (e.g., Belz, 2008; Gatt & Belz,
2010; Jordan & Walker, 2005, again among many others). In a similar vein,
researchers have also started exploring the relevance of REG algorithms as
psycholinguistic models of human language production (e.g., van Deemter et al.,
2012b).
A different line of work has moved away from the separation between content
selection and form, performing these tasks jointly. For example, Engonopou-
los and Koller (2014) use a synchronous grammar that directly relates surface
strings to target referents, using a chart to compute the possible expressions
for a given target. This work bears some relationship to planning-based ap-
proaches we discuss in Section 3.2 below, which exploit grammatical formalisms
as planning operators (e.g., Stone & Webber, 1998; Koller & Stone, 2007), solving
realisation and content determination problems in tandem (including REG
as a special case).
Finally, in earlier work visual information was typically ‘simplified’ into a
table (as we did above), but there has been substantial progress on REG in more
complex scenarios. For example, the GIVE challenge (Koller et al., 2010) provided
impetus for the exploration of situated reference to objects in a virtual
environment (see also Stoia & Shockley, 2006; Garoufi & Koller, 2013). More
recent work has started exploring the interface between computer vision and
REG to produce descriptions of objects in complex, realistic visual scenes, including
photographs (e.g., Mitchell et al., 2013; Kazemzadeh et al., 2014; Mao