A Real-Time Eye Tracking Based Query Expansion
Approach via Latent Topic Modeling
Yongqiang Chen¹, Peng Zhang¹, Dawei Song¹,², Benyou Wang¹
¹ Tianjin Key Laboratory of Cognitive Computing and Application, Tianjin University, China
² Department of Computing and Communications, The Open University, United Kingdom
{cyq, pzhang, waby}@tju.edu.cn, dawei.song2010@gmail.com
ABSTRACT
Formulating and reformulating reliable textual queries has
been recognized as a challenging task in Information
Retrieval (IR), even for experienced users. Most existing query
expansion methods, especially those based on implicit
relevance feedback, utilize the user’s historical interaction data,
such as clicks, scrolling and viewing time on documents, to
derive a refined query model. It is further expected that
the user’s search experience would be largely improved if we
could uncover the user’s latent query intention, in real time, by
capturing the user’s current interaction directly at the term
level. In this paper, we propose a real-time eye tracking
based query expansion method, which is able to: (1)
automatically capture the terms that the user is viewing by
utilizing eye tracking techniques; (2) derive the user’s latent
intent from the eye-tracked terms by using the
Latent Dirichlet Allocation (LDA) approach. A systematic
user study has been carried out and the experimental results
demonstrate the effectiveness of our proposed methods.
Category and Subject Descriptors: H.3.3 [Information
Search and Retrieval]
Keywords: Eye Tracking, Query Expansion, Real Time,
Implicit Relevance Feedback, LDA
1. INTRODUCTION
Query expansion based on relevance feedback has long
been studied for its ability to find more relevant
documents for the ambiguous queries that users might type in.
Compared with explicit relevance feedback, which often imposes
extra cognitive overhead on users, implicit relevance
feedback (IRF) has the advantage of obtaining useful
feedback information from the user interaction data to better
infer users’ search intention [6], without requiring
explicit relevance judgments from the users.
Traditional implicit feedback based query expansion
methods usually return static results according to the searchers’
historical log data, which cannot fully meet searchers’
dynamic information needs. Recently, various real-time IRF
approaches have been proposed [12] [7] [11] to
improve search performance and users’ search experience. For
example, Singh et al. [11] argued that a query expansion
framework should expand the user’s search query dynamically,
based on the implicit feedback the user provides at the time of
searching, in order to provide sufficient clues to reflect what
the user wants.
CIKM’15, October 19 - 23, 2015, Melbourne, VIC, Australia
Copyright is held by the owner/author(s). Publication rights licensed to ACM.
ACM 978-1-4503-3794-6/15/10 ... $15.00.
DOI: http://dx.doi.org/10.1145/2806416.2806602.
Eye tracking has been used in IR for its ability to record
users’ eye movement data, which can reveal the users’
cognitive processes as they go through the retrieved documents in
a natural way. Gwizdka et al. [5] have thoroughly examined,
using eye movement data, the relevance of a document and the
cognitive effort a user may expend. Ajanki et al. [1] designed an
eye tracking experiment in which participants were asked
to search for relevant documents given a topic, and the
gaze locations were then used to find relevant terms to
reformulate the queries. However, they had to avoid scrolling the
text because of the risk of missing some gaze location to word
mappings. Furthermore, this work did not address
real-time query expansion.
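The scrolling issue can be illustrated with a minimal sketch of a gaze location to word mapping. This is not the implementation of [1]; the `WordBox` type, the `word_at_gaze` function, the coordinates and the `scroll_y` parameter are all hypothetical names and values chosen for illustration. The sketch assumes that each word has a known bounding box in document coordinates and that the current scroll offset is available, so the same screen position resolves to a different word once the page has scrolled.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class WordBox:
    """A word and its bounding box in document coordinates (hypothetical layout data)."""
    word: str
    left: float
    top: float
    width: float
    height: float

    def contains(self, x: float, y: float) -> bool:
        return (self.left <= x <= self.left + self.width
                and self.top <= y <= self.top + self.height)


def word_at_gaze(boxes: List[WordBox], gaze_x: float, gaze_y: float,
                 scroll_y: float = 0.0) -> Optional[str]:
    """Return the word under a gaze point given in screen coordinates.

    The scroll offset is added to convert the screen position into a
    document position; ignoring it would map the gaze to the wrong word.
    """
    doc_y = gaze_y + scroll_y
    for box in boxes:
        if box.contains(gaze_x, doc_y):
            return box.word
    return None  # the gaze fell on whitespace or outside the text


# Illustrative layout: two words near the top of the page, one further down.
boxes = [
    WordBox("query", 10, 100, 50, 20),
    WordBox("expansion", 70, 100, 90, 20),
    WordBox("topic", 10, 700, 50, 20),
]
print(word_at_gaze(boxes, 80, 110))               # no scrolling
print(word_at_gaze(boxes, 20, 110, scroll_y=600)) # same screen area after scrolling
```

If the scroll offset is unknown or stale, the second lookup would return the word at the top of the page instead of the one actually being read, which is the kind of missed mapping that motivates avoiding scrolling.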
Buscher et al. [3] used eye tracking to extract words based
on optical character recognition (OCR) technology, and
assigned the words different weights according to whether
they were read or skipped. They showed that applying
eye tracking as a new data source for implicit feedback is
feasible. However, a limitation is that OCR-based
word extraction cannot operate in real time, i.e.,
the words cannot be grabbed immediately while users are
reading them. Furthermore, the number of words extracted
by using eye trackers may be too small to fully express the
users’ search intention.
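The idea of weighting words by whether they were read or skipped can be sketched as follows. This is only an illustration under stated assumptions: the weights `READ_WEIGHT` and `SKIPPED_WEIGHT`, the function names and the sample observations are hypothetical and are not taken from [3].

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# Hypothetical weights: a word that was read counts more than one that was skipped.
READ_WEIGHT = 1.0
SKIPPED_WEIGHT = 0.2


def weight_terms(observations: Iterable[Tuple[str, bool]]) -> Dict[str, float]:
    """Accumulate a score per lowercased word from (word, was_read) pairs."""
    scores: Dict[str, float] = defaultdict(float)
    for word, was_read in observations:
        scores[word.lower()] += READ_WEIGHT if was_read else SKIPPED_WEIGHT
    return dict(scores)


def top_terms(scores: Dict[str, float], k: int) -> List[str]:
    """Return the k highest-scoring words, breaking ties alphabetically."""
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return [word for word, _ in ranked[:k]]


# Illustrative gaze observations: (word, was_read).
observations = [("Eye", True), ("tracking", True), ("banner", False), ("eye", True)]
scores = weight_terms(observations)
print(top_terms(scores, 2))
```

The top-ranked words could then serve as candidate expansion terms. With only a handful of observed words the ranking is coarse, which reflects the limitation noted above that the extracted words may be too few to express the search intention.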
To tackle the problems described above, we propose a
real-time eye tracking based query expansion model via latent
topic modeling. Different from Buscher et al. [3], we use a
screen word-capturing technique to capture the words that the
searcher is reading in real time, from which we can infer
what the searcher is currently interested in. In our approach,
the result list has already been refreshed according to the
captured words by the time the searcher clicks on the refresh or the
next page button. The words captured by the eye tracker are
used to expand the original query. According to [2],
documents can be considered as being generated by different
latent topics, each of which is a probability distribution over
words. Based on this assumption, we apply LDA to further
derive the searcher’s latent information needs, which
naturally relate to the words the user pays attention to. Our
experiments show that combining the words captured by