Structured keyword query: a semantic evaluation method for XML search results

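"Keyword query with structure: towards semantic scoring of XML search results" is a research paper on how keyword queries can be combined with structural information in XML (Extensible Markup Language) search to score results more precisely on semantic grounds. XML is an important format for storing and exchanging data and is widely used to manage large volumes of structured data. Keyword matching alone often fails to exploit the structure of XML, so search results may not fully match what the user actually needs.

The authors, Xiping Liu, Changxuan Wan and Dexi Liu, propose a new approach, the "keyword query with structure", which scores search results semantically by taking into account the hierarchy of the XML document and the relationships between its elements. Semantic scoring helps users locate the information they need more accurately, improving search efficiency and satisfaction. Traditional XML search typically relies on keyword-based queries that consider only the frequency and position of keywords and ignore context and structure. The paper may discuss how to bring this structural information into the scoring model, for example by analysing nesting relationships between elements, attribute values and path information in order to understand what a query keyword means within the document structure.

The paper may also explore semantic similarity computation, such as ontology-based or vocabulary-based similarity, to strengthen the association between a query and XML nodes. This could involve lexical resources such as WordNet, or the concepts and relations of an ontology, to compute the semantic distance between keywords and XML elements. To validate the method, the paper most likely includes an experimental section comparing keyword query with structure against traditional query methods in different scenarios, using metrics such as precision, recall and F1 score. The results may also show how structured queries improve the user experience on domain-specific datasets such as biomedical data or book metadata.

The paper proposes an innovative solution to a key problem in XML search: how to make better use of structural information to improve the relevance and quality of search results. This has theoretical and practical value for XML data management and information retrieval, and can help advance XML search engines and make information access more efficient and accurate.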

Keyword query with structure: towards semantic scoring of XML search results

Xiping Liu • Changxuan Wan • Dexi Liu

© Springer Science+Business Media New York 2015

Abstract Keyword search is an effective paradigm for information discovery and has recently been introduced to query XML documents. Scoring of XML search results is an important issue in XML keyword search. The traditional "bag-of-words" model cannot differentiate the roles of keywords or the relationships between keywords, and is thus not suitable for XML keyword queries. In this paper, we present a new scoring method based on a novel query model, called keyword query with structure (QWS), which is specially designed for XML keyword query. The method rests on the view, taken by the QWS model, that a keyword query is a composition of several query units, each representing a query condition. We believe that this method captures the semantic relevance of the search results. The paper first introduces an algorithm that reformulates a keyword query into a QWS. Then, a scoring method is presented which measures the relevance of search results according to how many of the query conditions are matched and how well they are matched. The scoring method is also extended to clusters of search results. Experimental results verify the effectiveness of our methods.

Keywords XML keyword search • Keyword query with structure • Query unit • Cluster

1 Introduction

Keyword search is an effective paradigm for information discovery that has been extensively studied for flat documents (text, HTML, etc.). As XML has been accepted as a standard for document mark-up and exchange, it is natural to extend keyword search techniques to support XML data [1, 2].

Scoring is at the core of keyword search. Scoring methods have been extensively studied in the traditional information retrieval (IR) field, and a number of scoring functions have been proposed [3]. Several scoring methods have also been proposed for XML keyword search [1, 2]. Existing XML scoring methods are based on the traditional "bag-of-words" model. In this model, a text (such as a sentence or a document) is represented as the bag (multiset) of its words, disregarding grammar and even word order but keeping multiplicity. Though simple, the model is not well suited to XML keyword search.
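The bag-of-words representation described above can be illustrated with a minimal Python sketch. The helper name `bag_of_words` and the whitespace tokenisation are assumptions made for illustration only; they are not taken from the paper.

```python
from collections import Counter


def bag_of_words(text: str) -> Counter:
    """Return the multiset of lowercase, whitespace-separated tokens.

    Hypothetical helper for illustration; real IR systems use richer
    tokenisation (stemming, stop-word removal, and so on).
    """
    return Counter(text.lower().split())


q1 = bag_of_words("journal database article transaction")
q2 = bag_of_words("article journal transaction database")

# Word order is discarded, so the two queries become indistinguishable.
print(q1 == q2)  # True

# Multiplicity is kept: "be" occurs twice in this text.
print(bag_of_words("to be or not to be")["be"])  # 2
```

Because both orderings collapse to the same multiset, nothing in the representation records which keyword was meant as an element label and which as a content term.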

Consider a query Q: "journal database article transaction". The query intends to search for articles about "transaction" in a journal named "database". Given an XML document, an ideal result of the query is a subtree rooted at an element labelled "article" containing "transaction" in its text content, which is nested in an element labelled "journal" with "database" in its content. Obviously, it is not proper to view the query as a bag of words. First, the keywords in the query differ in their roles. The keywords "article" and "journal" should be treated as tags of elements, while "database" and "transaction" are keywords appearing in text contents. Second, the relationships between keywords in the query are different. The keyword "transaction" is more closely related to "article" than to "database", and "database" has a closer relationship with "journal" than with "article".
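To make the example concrete, the following Python sketch contrasts a structure-aware reading of the query with a plain "all keywords occur somewhere" check. It is not the paper's QWS reformulation or scoring algorithm. The sample document, the function names `structured_matches` and `contains_all`, and the choice to test "database" only against a journal's non-article children are all assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample data, invented for this illustration.
DOC = """
<library>
  <journal>
    <name>Database Systems</name>
    <article><title>Transaction processing revisited</title></article>
    <article><title>Query optimization</title></article>
  </journal>
  <journal>
    <name>Transaction Economics</name>
    <article><title>Database of market prices</title></article>
  </journal>
</library>
"""


def text_of(elem):
    """Lowercased text content of an element and all its descendants."""
    return " ".join(elem.itertext()).lower()


def structured_matches(root):
    """Articles containing 'transaction', nested in a journal whose
    non-article children contain 'database' (an assumed reading of Q)."""
    results = []
    for journal in root.iter("journal"):
        journal_text = " ".join(
            text_of(child) for child in journal if child.tag != "article"
        )
        if "database" not in journal_text.split():
            continue
        for article in journal.iter("article"):
            if "transaction" in text_of(article).split():
                results.append(article)
    return results


def contains_all(elem, keywords):
    """Bag-style check: every keyword occurs as a tag or a text token."""
    tokens = set(text_of(elem).split()) | {e.tag for e in elem.iter()}
    return all(k in tokens for k in keywords)


root = ET.fromstring(DOC)
print([a.find("title").text for a in structured_matches(root)])
# ['Transaction processing revisited']

second_journal = root.findall("journal")[1]
print(contains_all(second_journal, ["journal", "database", "article", "transaction"]))
# True, although this journal holds no article about transactions
```

The second journal contains all four keywords when they are treated as an unordered bag, yet it is not what the query asks for. Only the reading that assigns "journal" and "article" to tags and ties "database" and "transaction" to the right elements excludes it.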

Xiping Liu, lewislxp@gmail.com

School of Information Technology, Jiangxi University of Finance and Economics, Nanchang 330013, People's Republic of China


Inf Technol Manag, DOI 10.1007/s10799-015-0247-z
