RUCIR at the NTCIR-12 IMine-2 Task: Query Understanding and Vertical Incorporating Strategies

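Research paper

The RUCIR team took part in two subtasks of the NTCIR-12 IMine-2 task, Query Understanding and Vertical Incorporating, for both English and Chinese topics. In the Query Understanding subtask, they combined candidates extracted from search engine suggestions and Wikipedia, and classified the verticals of these candidates after clustering and ranking them. In the Vertical Incorporating subtask, they proposed a general method that adapts traditional diversity algorithms to predefined subtopics, using the classified verticals to diversify search results.

The paper describes work from the Beijing Key Laboratory of Big Data Management and Analysis Methods and the School of Information, Renmin University of China, for the NTCIR-12 IMine-2 task. NTCIR (NII Testbeds and Community for Information access Research) is an international evaluation forum that promotes research and development in information retrieval and natural language processing.

In the Query Understanding subtask, the problem is how to interpret the intent behind a user's query. The team combined search engine suggestions (typically driven by users' past search behavior and query popularity) with data from Wikipedia. These two sources yield multiple possible interpretations or extensions of a query. The candidates are then grouped with clustering algorithms to reveal latent topics, and the clusters are ranked to determine which are most relevant to the query.

In the Vertical Incorporating subtask, the starting point is that traditional retrieval systems tend to focus on returning the most relevant results and neglect result diversity. The team's strategy handles predefined subtopics and uses their classified verticals to increase the diversity of search results. This is most useful for complex queries, which may span several verticals or topics: by adapting traditional diversity algorithms, the method aims to cover different verticals while preserving relevance.

The paper also addresses both English and Chinese topics, which suggests that the approach applies across languages. Overall, the work shows how different data sources and existing algorithms can be combined to improve retrieval for complex queries and multi-vertical information needs, and it is relevant to both research and practical search engine development.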


RUCIR at NTCIR-12 IMINE-2 Task

Ming Yue, Zhicheng Dou, Sha Hu∗, Jinxiu Li, Xiaojie Wang, and Ji-Rong Wen

Beijing Key Laboratory of Big Data Management and Analysis Methods, China

School of Information, Renmin University of China

{yomin,dou,wangxiaojie}@ruc.edu.cn,

{sallyshahu,jinxiu2216,jirong.wen}@gmail.com

ABSTRACT

In this paper, we present our participation in the Query Understanding subtask and the Vertical Incorporating subtask of the NTCIR-12 IMine-2 task, for both English and Chinese topics. In the Query Understanding subtask, we combine the candidates extracted from search engine suggestions and Wikipedia, and classify their verticals after clustering and ranking them. In the Vertical Incorporating subtask, we provide a general method for adapting traditional diversity algorithms to deal with predefined subtopics with classified verticals in diversification.

Team Name

RUC IR

Subtasks

Query Understanding (Chinese, English)

Vertical Incorporating (Chinese, English)

1. INTRODUCTION

In modern information systems, users type in some keywords and search engines return matched results. However, given an ambiguous or broad query, a retrieval system or search engine may misunderstand users' interests if it simply compares the query with the corpus and returns a ranked result list. The goal of the NTCIR-12 IMine-2 task is to find potential intents for a query and classify each intent into one of six verticals. These verticals help us detect different user interests more precisely. The classified intents with their verticals can also be used to improve document ranking. The IMine-2 task consists of two subtasks: Query Understanding and Vertical Incorporating.

In the Query Understanding subtask, the system is required not only to return a ranked list of subtopic candidates for a given query, but also to identify the relevant vertical intent for each subtopic. A subtopic of a given query specializes or disambiguates the original query. These subtopics, together with their verticals, represent the information that users are interested in.

∗ Corresponding author

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.

We first extract candidates from disambiguation pages in Wikipedia [3, 4]. We use the official query suggestions as they are, without further processing, because they are already good candidates. Because candidates are usually short and carry little information, we further retrieve the top 300 results from the search engine and group them into clusters, using two different clustering algorithms, to find important candidates. After that, we rank the candidates by their relevance and diversity. Finally, we classify each subtopic to obtain its vertical.
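The excerpt does not say how relevance and diversity are combined when ranking candidates. As a rough illustration only, the sketch below uses a greedy Maximal Marginal Relevance (MMR) style selection, which is a swapped-in technique and not necessarily what the authors used. The function names, the `trade_off` parameter, the token-level Jaccard similarity, and the relevance scores are all assumptions made for this example.

```python
def jaccard(a: str, b: str) -> float:
    """Token-level Jaccard similarity between two candidate strings."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    if not ta or not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


def rank_candidates(relevance: dict, trade_off: float = 0.7) -> list:
    """Greedy MMR-style ranking (an assumed stand-in for the paper's ranker).

    relevance: hypothetical mapping candidate -> relevance score in [0, 1].
    trade_off: 1.0 ranks by relevance only; lower values penalise
               candidates that resemble ones already selected.
    """
    remaining = dict(relevance)
    ranked = []
    while remaining:
        def mmr(candidate: str) -> float:
            # Redundancy is the highest similarity to any selected candidate.
            redundancy = max((jaccard(candidate, r) for r in ranked), default=0.0)
            return trade_off * remaining[candidate] - (1 - trade_off) * redundancy

        best = max(remaining, key=mmr)
        ranked.append(best)
        del remaining[best]
    return ranked


# Made-up scores for illustration; not data from the paper.
scores = {
    "jaguar car price": 0.9,
    "jaguar car dealer": 0.85,
    "jaguar animal habitat": 0.6,
}
diverse = rank_candidates(scores, trade_off=0.5)
by_relevance = rank_candidates(scores, trade_off=1.0)
```

With `trade_off=0.5`, the less relevant but dissimilar candidate about the animal moves ahead of the second car-related candidate, which is the behaviour a relevance-plus-diversity ranking aims for.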

In the Vertical Incorporating (VI) subtask, our goal is to diversify search results in the top ranks, just like the Document Ranking subtask in IMine-1. The unique part of the VI subtask is that subtopics are classified into verticals to address the diversification problem. The algorithms have to consider additional virtual documents introduced by the verticals of a query's subtopics.

We provide a general method to adapt traditional diversification algorithms to deal with the VI subtask. The main differences from traditional models are that we (1) consider verticals and virtual documents in diversity, and (2) understand subtopics by fine-grained information. We have tried this method on several state-of-the-art models, and report the results of PM2 [6] as the basic method in this subtask.
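For readers unfamiliar with the base algorithm, the following is a minimal sketch of plain PM-2 as it is commonly described: aspects (subtopics) compete for ranking positions in proportion to their weights, in the manner of Sainte-Laguë seat allocation. It does not include the adaptation to verticals and virtual documents, which this excerpt does not detail. The function name `pm2`, the input layout, and the toy scores are assumptions made for illustration.

```python
def pm2(doc_scores: dict, aspect_weights: dict, k: int, lam: float = 0.5) -> list:
    """Plain PM-2 diversification (sketch; not the adapted version).

    doc_scores: hypothetical mapping doc -> {aspect: relevance of doc to aspect}.
    aspect_weights: mapping aspect -> weight (the "votes" of that aspect).
    k: number of documents to select.
    lam: balance between the winning aspect and all other aspects.
    """
    seats = {a: 0.0 for a in aspect_weights}
    remaining = list(doc_scores)
    ranking = []
    while remaining and len(ranking) < k:
        # Quotient for each aspect: weight divided by (2 * seats + 1).
        quotient = {a: w / (2 * seats[a] + 1) for a, w in aspect_weights.items()}
        top = max(quotient, key=quotient.get)

        def score(doc: str) -> float:
            rel = doc_scores[doc]
            own = quotient[top] * rel.get(top, 0.0)
            others = sum(q * rel.get(a, 0.0) for a, q in quotient.items() if a != top)
            return lam * own + (1 - lam) * others

        best = max(remaining, key=score)
        ranking.append(best)
        remaining.remove(best)

        # Split the occupied seat among aspects in proportion to relevance.
        total = sum(doc_scores[best].get(a, 0.0) for a in seats)
        if total > 0:
            for a in seats:
                seats[a] += doc_scores[best].get(a, 0.0) / total
    return ranking


# Toy input for illustration; not data from the paper.
weights = {"car": 0.6, "animal": 0.4}
docs = {
    "d1": {"car": 0.9},
    "d2": {"car": 0.8},
    "d3": {"animal": 0.7},
    "d4": {"car": 0.3, "animal": 0.3},
}
top3 = pm2(docs, weights, k=3)
```

In this toy run, the second position goes to the animal-related document even though a more relevant car-related document remains, because the "car" aspect has already been served by the first pick.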

2. QUERY UNDERSTANDING

We divide this subtask into two smaller tasks. The first is subtopic mining, similar to the corresponding subtask of IMine, the previous NTCIR task. The second is a classification task, which can be treated as a classic machine learning problem. In NTCIR-12, we use query suggestions and knowledge bases to mine subtopics, and then classify the vertical intent of each subtopic.

2.1 Methodology

Step 1. Extracting Subtopic Candidates From Various Resources. Query suggestions from search engines are one of the official data sets. In addition, we use Wikipedia as a knowledge base. In Wikipedia, a disambiguation page describes different aspects of a specific term. We check each query in the task: if a query has a disambiguation page, the terms on that page are considered as candidates.
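As a rough sketch of this step, the snippet below merges candidates from the two sources into one list with duplicates removed. It assumes the query suggestions and the disambiguation-page terms have already been obtained as lists of strings. The helper names, the normalisation rule, and the example strings are assumptions for illustration, not the authors' implementation.

```python
def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so near-identical strings match."""
    return " ".join(text.lower().split())


def merge_candidates(query: str, suggestions: list, disambiguation_terms: list) -> list:
    """Merge two candidate sources into (candidate, source) pairs.

    Keeps first-seen order, drops duplicates, and drops candidates that are
    identical to the query itself. Source labels are hypothetical.
    """
    seen = {normalize(query)}
    merged = []
    for source, items in (("suggestion", suggestions),
                          ("wikipedia", disambiguation_terms)):
        for item in items:
            key = normalize(item)
            if key and key not in seen:
                seen.add(key)
                merged.append((item.strip(), source))
    return merged


# Made-up inputs for illustration.
candidates = merge_candidates(
    "jaguar",
    ["jaguar car", "Jaguar  Car", "jaguar animal"],
    ["Jaguar (animal)", "jaguar car", "Jaguar"],
)
```

Keeping the source label with each candidate makes it possible to treat official suggestions and Wikipedia terms differently in later steps.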

IMine-1: http://www.thuir.org/IMine/

Proceedings of the 12th NTCIR Conference on Evaluation of Information Access Technologies, June 7-10, 2016 Tokyo Japan
