Interpreting XML Keyword Queries Using a Hidden Markov Model


"这篇研究论文探讨了如何使用隐马尔可夫模型(HMM)来解释XML关键字查询。在XML数据库的关键词搜索中，由于XML文档的结构特性与传统平坦文档不同，因此需要特殊的方法来处理。传统的词袋模型无法考虑关键词的角色和它们之间的关系，不适合用于XML关键词搜索。论文提出了一个新的模型——半结构化关键词查询(SSQ)模型，该模型将关键词查询视为由多个代表查询条件的单元组成。通过两步方法，论文首先引入基于HMM的概率方法来计算查询关键词与数据库术语的最佳映射，然后利用这些映射来解析查询并进行有效的搜索。" 在这篇研究中，作者Xiping Liu、Changxuan Wan和Dexi Liu深入研究了XML文档的关键词查询问题。他们指出，XML文档的结构化特性使得传统的关键词搜索方法效率低下，因为这些方法未能充分理解关键词之间的关系以及它们在文档结构中的位置。为了解决这个问题，他们提出了半结构化关键词查询模型(SSQ)。 SSQ模型的核心思想是将一个关键词查询分解为多个查询单元，每个单元都对应一个特定的查询条件。这种分解方式允许模型更细致地理解查询意图，同时考虑XML文档的结构信息。为了实现这个模型，研究者采用了一个基于隐马尔可夫模型的概率方法。隐马尔可夫模型(HMM)是一种统计建模技术，常用于处理序列数据，如自然语言处理中的词性标注和语音识别。在XML关键词查询中，HMM被用来计算查询关键词与XML文档中的元素或属性的最佳匹配，这涉及到对查询关键词的顺序和相关性的概率建模。具体来说，第一步是建立HMM来表示关键词和数据库术语之间的关系，其中隐藏状态代表数据库中的元素或属性，观察状态代表查询关键词。通过前向-后向算法或者维特比算法，可以找出最有可能的隐藏状态序列，即最佳的关键词到数据库术语的映射。第二步，使用这个映射来解析和执行查询。通过理解关键词在XML结构中的位置和上下文，可以更精确地定位和提取相关信息，从而提高查询的准确性和效率。这篇研究论文通过引入HMM到XML关键词查询中，提供了一种新的、更为有效的处理方法，这有助于提升XML数据库的检索性能，并为XML文档的复杂查询需求提供了理论支持。这一工作对于XML数据管理和信息检索领域具有重要的理论和实践意义。

Xiping Liu et al. Interpreting XML Keyword Query Using Hidden Markov Model

Tehnički vjesnik 23, 6(2016), 1649-1658

ISSN 1330-3651 (Print), ISSN 1848-6339 (Online)

DOI: 10.17559/TV-20150314113111

INTERPRETING XML KEYWORD QUERY USING HIDDEN MARKOV MODEL

Xiping Liu, Changxuan Wan, Dexi Liu

Original scientific paper

Keyword search on XML databases has attracted a lot of research interest. As XML documents are very different from flat documents, effective search of XML documents needs special consideration. The traditional bag-of-words model does not take the roles of keywords and the relationships between keywords into consideration, and thus is not suited for XML keyword search. In this paper, we present a novel model, called semi-structured keyword query (SSQ), which understands a keyword query in a different way: a keyword query is composed of several query units, where each unit represents a query condition. To interpret a keyword query under this model, we take two steps. First, we propose a probabilistic approach based on a Hidden Markov Model for computing the best mapping of the query keywords into the database terms, i.e., elements, attributes and values. Second, we generate SSQs based on the mapping. Experimental results verify the effectiveness of our methods.

Keywords: hidden Markov model (HMM); semi-structured keyword query (SSQ); XML keyword query
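Because one keyword can map to several database terms (an element name, an attribute name, or a word inside a text value), the mapping step first needs a candidate set for each keyword. The sketch below builds such candidate sets with Python's standard `xml.etree.ElementTree`. The sample document, the `("element" | "attribute" | "value", path)` term representation, and the lowercase whitespace tokenizer are assumptions for illustration; the paper's own term extraction is not described in this excerpt.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Hypothetical sample document, not from the paper's datasets.
DOC = """<bib>
  <journal name="Info System Journal">
    <article year="2015"><title>Expert recommendation system</title></article>
  </journal>
</bib>"""

def candidate_terms(root):
    """Map each lowercase word to the database terms it could denote.

    A term is ("element", path), ("attribute", path/@attr) or ("value", path of the text).
    Tail text of mixed content is ignored in this sketch.
    """
    index = defaultdict(set)

    def visit(elem, parent_path):
        path = f"{parent_path}/{elem.tag}"
        index[elem.tag.lower()].add(("element", path))            # schema vocabulary
        for attr, value in elem.attrib.items():
            index[attr.lower()].add(("attribute", f"{path}/@{attr}"))
            for word in value.lower().split():                    # words in attribute values
                index[word].add(("value", f"{path}/@{attr}"))
        for word in (elem.text or "").lower().split():             # words in element text
            index[word].add(("value", path))
        for child in elem:
            visit(child, path)

    visit(root, "")
    return index

index = candidate_terms(ET.fromstring(DOC))
for keyword in "journal info system article expert".split():
    print(keyword, sorted(index.get(keyword, set())))
```

Even in this tiny document, "system" has two candidate terms (the journal's name attribute and the article title) and "journal" is both an element label and a word in a value, which is the ambiguity the HMM is meant to resolve.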


1 Introduction

Keyword search, due to its simplicity and user-friendliness, has been widely used and extended to search a variety of information sources, such as relational databases and XML documents [1, 2]. An XML document is composed of nested elements. This nested structure poses great challenges to keyword search techniques, because users can search XML documents through both structure and text content.

The unique characteristics of XML documents call for a fresh look at, and a deeper understanding of, the keyword query. Existing XML keyword search methods are based on the "bag-of-words" model. In this model, a text unit (such as a paragraph or a document) is treated as a bag (multiset) of words, which means that the grammar and order of the words are not taken into consideration. However, this model is too simple for XML keyword search.
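A quick way to see what the bag-of-words view discards: two keyword queries with the same words in a different order become the same multiset, even though the order hints at different structure. This minimal sketch uses Python's `collections.Counter` as the bag; the second query is a made-up reordering.

```python
from collections import Counter

def bag_of_words(query):
    """Represent a query as a multiset of lowercase words; word order is discarded."""
    return Counter(query.lower().split())

q1 = "journal info system article expert"
q2 = "article info system journal expert"  # hypothetical reordering of the same words

print(bag_of_words(q1) == bag_of_words(q2))  # True
```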

Consider a query Q: "journal info system article expert". The query intention is to search for articles about "expert" in a journal named "info system". In an XML database, the answer may be an element labelled "article" nested in an element labelled "journal", where the "article" element contains "expert" in its text content and the "journal" element has "info system" in its content.
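The intended answer described above can be expressed as a structural check over a small document. The sketch uses Python's standard `xml.etree.ElementTree`; the document, the assumption that a journal's name lives in a `<name>` child, and the plain substring matching are illustrative choices, not part of the paper.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: each journal stores its name in a <name> child.
DOC = """<bib>
  <journal><name>Info System Review</name>
    <article><title>Expert finding in forums</title></article>
    <article><title>Query log mining</title></article>
  </journal>
  <journal><name>Data Engineering</name>
    <article><title>Expert systems revisited</title></article>
  </journal>
</bib>"""

def text_of(elem):
    """All text inside elem, including descendants, lowercased."""
    return " ".join(elem.itertext()).lower()

def answer(root):
    """Titles of articles mentioning 'expert' nested in a journal named 'info system'."""
    titles = []
    for journal in root.iter("journal"):
        if "info system" not in journal.findtext("name", default="").lower():
            continue  # the journal-level condition fails
        for article in journal.iter("article"):
            if "expert" in text_of(article):  # the article-level condition
                titles.append(article.findtext("title"))
    return titles

print(answer(ET.fromstring(DOC)))  # ['Expert finding in forums']
```

The second journal's article also mentions "expert", but it is rejected because its enclosing journal fails the "info system" condition: the structure, not mere word occurrence, decides the answer.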

Obviously, it is not natural to view the query as a bag of words. First, the keywords in the query have different roles. The keywords "article" and "journal" are labels of elements, while "expert" and "info system" are just keywords in texts. Second, there exist different relationships between keywords. The keyword "expert" is more closely related to "article" than to "info" and "system", and "info system" has a closer relationship with "journal" than with "article".

From this example we can see that the traditional bag-of-words model is not suitable for XML keyword search, because it discards the structure of the query that is hidden in an XML keyword query.

In this paper, we present a new model, called semi-structured keyword query (SSQ), to model a keyword query against an XML document. An SSQ differs from a keyword query in that it carries structural information, and it is less strict than a structured query. The SSQ model is special in that it makes the structure of the query explicit. However, it is not straightforward to transform a keyword query into SSQ form.

In this work, we perform the transformation in two steps. In the first step, we map the words in the keyword query into database terms, where each term comes either from the schema vocabulary or from the texts of the database. As each word can be mapped to many terms, we develop a Hidden Markov Model-based probabilistic approach for interpreting the query keywords in terms of database terms. In the second step, we design an algorithm which takes a sequence of database terms as input and outputs a set of SSQs. Once the SSQs are generated, it is possible to improve XML search results based on them, but that is beyond the scope of this paper.
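The SSQ-generation algorithm itself is not given in this excerpt. As a rough illustration only, the sketch below groups a decoded term sequence into query units with a simple hypothetical rule: each element label opens a new unit, and the value terms that follow become conditions of the unit for the same element. The `Term` and `QueryUnit` types, the term kinds, and the grouping rule are all assumptions, not the authors' method.

```python
from dataclasses import dataclass, field

@dataclass
class Term:
    keyword: str  # the query keyword
    kind: str     # "element" (schema label) or "value" (word from the texts)
    target: str   # the element the term belongs to

@dataclass
class QueryUnit:
    label: str                                      # element this unit constrains
    conditions: list = field(default_factory=list)  # keywords required in its content

def group_into_units(terms):
    """Hypothetical rule: an element term opens a unit; a value term joins the current
    unit if it targets the same element, otherwise it opens an implicit unit."""
    units = []
    for term in terms:
        if term.kind == "element":
            units.append(QueryUnit(term.target))
        elif units and units[-1].label == term.target:
            units[-1].conditions.append(term.keyword)
        else:
            units.append(QueryUnit(term.target, [term.keyword]))
    return units

# An assumed decoded mapping for "journal info system article expert" (not computed here).
decoded = [
    Term("journal", "element", "journal"),
    Term("info", "value", "journal"),
    Term("system", "value", "journal"),
    Term("article", "element", "article"),
    Term("expert", "value", "article"),
]
for unit in group_into_units(decoded):
    print(unit.label, unit.conditions)
# journal ['info', 'system']
# article ['expert']
```

The nesting between units (the article inside the journal), which an SSQ is meant to express, is not represented in this simplified sketch.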

To summarize, this paper makes the following contributions:

1) We propose a novel way to analyse and interpret an XML keyword query. The approach makes explicit the structural information hidden in the keyword query and transforms the query into a semi-structured keyword query (SSQ). The SSQ helps capture the semantics and intention of a keyword query.

