CNU System in NTCIR-11 IMine Task
Global Semantic Expansion for Hierarchical Query Intent Identification
Wei Song, Wenbin Xu, Lizheng Liu, Hanshi Wang
College of Information Engineering
Capital Normal University, China
{wsong, wbxu, lzliu, hswang}@cnu.edu.cn
ABSTRACT
Understanding user intent is important for interactive and
personalized information retrieval. For ambiguous queries,
the user intent space forms a hierarchical, top-down
structure, from senses to subtopics, rather than a flat one.
This paper presents the CNU system in the NTCIR-11 IMine
task. Our method constructs the hierarchy by exploiting
global semantic representation and expansion. The highlights
include: 1) We use word semantic vectors and propose a
query-dependent semantic composition for representing query
aspect phrases. Our goal is to alleviate the term-mismatch
and data sparseness problems, which shallow lexical matching
and co-occurrence-based local semantics cannot overcome
effectively. 2) We expand query subtopics by introducing new
words according to global semantic relatedness, and cluster
these words for query sense induction. The evaluation
results, both at NTCIR-11 and afterwards, show that our
method can mine query subtopics and senses effectively.
Team Name
CNU
Subtask
Subtopic Mining
Keywords
Query Intent, Query Sense, Word Embeddings, Semantic
Expansion, Query Subtopic
1. INTRODUCTION
Inferring query intent is one of the most important tasks
for information retrieval. Web queries are often ambiguous
and multi-faceted, which makes it challenging to provide the
most relevant information to users. If the system could mine
all potential query intents and organize them in a proper
structure, it could provide more sophisticated services to
deal with such difficult queries. For
example, we could diversify the search results [9] or provide
query summarizations [11] according to multiple query
aspects. We could also present potential query intents to
users interactively and make decisions with the help of
user feedback.
In recent years, query intent mining has gained much at-
tention. Much work has been done using information from
different resources such as query logs or search results [1, 2,
3, 8]. However, most systems provide a flat list of phrases
to represent query subtopics. In fact, query intents are
hierarchical rather than flat. This is obvious for ambiguous
queries. The top layer covers different senses of an ambigu-
ous query, which usually refer to different meanings in re-
ality. The lower layer covers various facets related to the
meaning. For example, “apple” is an ambiguous query. It
has several meanings referring to different objects in real-
ity. Subtopics reveal different aspects of a specific object. A
flat list structure is not enough to represent the information
need space for a query. For example, “apple diet” and
“apple notebook sale” are intents for different meanings but
are mixed together in a flat list. Likewise, “apple price”
could be a subtopic of multiple meanings, but a flat list
cannot distinguish between them.
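The contrast between a flat list and a two-level hierarchy can be illustrated with a minimal sketch. The sense labels, the toy (sense, subtopic) pairs, and the helper names `build_hierarchy` and `flatten` are assumptions made for illustration only; they are not taken from the task data or from the CNU system.

```python
from collections import defaultdict

# Hypothetical toy data: (sense, subtopic) pairs for the query "apple".
# The sense labels are illustrative assumptions, not task data.
pairs = [
    ("apple (fruit)", "apple diet"),
    ("apple (company)", "apple notebook sale"),
    ("apple (fruit)", "apple price"),
    ("apple (company)", "apple price"),
]

def build_hierarchy(pairs):
    """Group subtopics under their senses (a two-level hierarchy)."""
    hierarchy = defaultdict(list)
    for sense, subtopic in pairs:
        hierarchy[sense].append(subtopic)
    return dict(hierarchy)

def flatten(hierarchy):
    """Collapse the hierarchy into a flat, de-duplicated subtopic list."""
    seen = []
    for subtopics in hierarchy.values():
        for subtopic in subtopics:
            if subtopic not in seen:
                seen.append(subtopic)
    return seen

hierarchy = build_hierarchy(pairs)
flat = flatten(hierarchy)

# In the hierarchy, "apple price" appears once under each sense;
# in the flat list it appears once, with no sense attached.
for sense, subtopics in hierarchy.items():
    print(sense, "->", subtopics)
print("flat:", flat)
```

In the hierarchy, “apple price” is kept under both senses, whereas the flat list holds a single entry that no longer says which meaning it belongs to.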
In addition, in most architectures of existing systems, clus-
tering text fragments is an important component [1, 2, 12].
Because text fragments are short, whether extracted
from query logs or documents, the data sparseness and
term-mismatch problems pose great challenges for clustering.
It is necessary to exploit new semantic representations to
overcome these problems.
This paper introduces our system in the NTCIR-11 IMine task,
which induces senses and subtopics of an ambiguous query
automatically. Specifically, we make use of global semantic
vector representations and expand query subtopics to bring
in global semantic information. The global information is
useful for bridging local subtopics that are difficult to
judge as similar using local semantic representations. Our
main contributions and preliminary findings include:
• We apply word semantic vectors and propose a semantic
composition method to represent query aspect phrases.
The experimental results show that such representa-
tions are more effective than traditional semantic rep-
resentations for clustering query reformulations to in-
fer query subtopics.
• We induce query senses by clustering expanded words
which are related to query subtopics in terms of global
semantics. The experiments show that global semantic
expansion is effective for query sense induction.
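To make the term-mismatch problem concrete, the following minimal sketch compares lexical overlap with vector-based similarity for short phrases. It composes a phrase vector by plain averaging, which is a stand-in chosen for illustration and not the query-dependent composition proposed in this paper. The three-dimensional vectors in `VECTORS` and the helper names `compose`, `cosine`, and `jaccard` are hypothetical.

```python
import math

# Hypothetical 3-dimensional word vectors, invented for illustration.
# Real word vectors are learned from a large corpus and are much larger.
VECTORS = {
    "apple": [0.9, 0.1, 0.0],
    "laptop": [0.1, 0.9, 0.1],
    "notebook": [0.2, 0.8, 0.1],
    "diet": [0.8, 0.0, 0.6],
}

def compose(phrase):
    """Average the vectors of the known words in a phrase.

    Plain averaging is an assumption of this sketch, not the
    query-dependent composition described in the paper.
    """
    vecs = [VECTORS[w] for w in phrase.split() if w in VECTORS]
    if not vecs:
        raise ValueError("no known words in phrase: " + phrase)
    return [sum(dim) / len(vecs) for dim in zip(*vecs)]

def cosine(u, v):
    """Cosine similarity between two vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm

def jaccard(p, q):
    """Lexical overlap between two phrases, measured on word sets."""
    a, b = set(p.split()), set(q.split())
    return len(a & b) / len(a | b)

query = "apple laptop"
for other in ("apple notebook", "apple diet"):
    print(other,
          "jaccard=%.2f" % jaccard(query, other),
          "cosine=%.2f" % cosine(compose(query), compose(other)))
```

With these toy vectors, lexical overlap scores both candidate phrases identically because each shares only the word “apple” with the query, while the vector-based similarity ranks “apple notebook” closer to “apple laptop” than “apple diet”.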
2. NTCIR-11 IMINE TASK
NTCIR-11 organizes the IMine task, short for search Intent
Mining [6]. The IMine task consists of two subtasks:
Subtopic Mining and Document Ranking. We participate in
the Subtopic Mining subtask on Chinese queries. This
year's Subtopic Mining subtask differs from previous ones.
It requires participants to submit a two-level hierarchy of
Proceedings of the 11th NTCIR Conference, December 9-12, 2014, Tokyo, Japan