H-MRST: A New ELM-Driven Framework for Range Queries over Probabilistic Data


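"H-MRST: A Novel Framework for Supporting Range Queries over Uncertain Data Using ELM". This research paper examines how the Extreme Learning Machine (ELM) can be used effectively for range queries over uncertain data. ELM is a popular machine learning algorithm that is especially efficient for neural-network classification. Data classification is an important application in cognitive computing and has wide practical use.

The problem the paper addresses is how, for range queries over probabilistic data, to store the features of each uncertain object in a lightweight structure and use that structure for pruning or validation. For objects that can be neither pruned nor validated, traditional methods fall back on expensive integral computation, which greatly increases the cost. In addition, some structure-construction algorithms are not general. To address these problems, the paper proposes a new concept, the PDR (probability degree range) query, as a replacement for the traditional range query. A PDR query lets users define the query range in terms of an object's probability degree, which improves query efficiency and accuracy.

The authors first introduce the new ELM-based framework, H-MRST (possibly short for Hierarchical-Mining for Range Search Trees). The framework uses ELM's fast learning ability to reduce the complex computation needed for objects that cannot be pruned directly. H-MRST is organized hierarchically, so the data can be approximated at different levels to optimize query performance. Within H-MRST, ELM is used to build a decision boundary that determines whether an uncertain object lies in the query range. A trained ELM model predicts the probability that each uncertain object falls in the query range, which avoids a full integral computation for every object. This reduces computational overhead and adapts to data sets of different types and sizes.

The paper may also cover strategies for building and updating H-MRST, as well as a performance evaluation. The authors may have compared H-MRST with existing methods in terms of efficiency and accuracy on range queries over uncertain data to demonstrate the advantages of the new framework. In summary, the paper proposes H-MRST, a novel framework that uses ELM to process range queries over probabilistic data, addressing the high computational cost and lack of generality of traditional methods and improving query efficiency. The work is of theoretical and practical significance for big data analysis, fuzzy queries and uncertainty management.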

Background

In this section, we first review existing algorithms for prob-range queries and classification over uncertain data in the "Related Work" section, and then give a brief introduction to ELM in the "Extreme Learning Machine" section. At the end of this section, we formally define the PDR query over uncertain data. Table 1 summarizes the notation used in this paper.

Related Work

Recently, many indexes have been proposed for supporting prob-range queries over uncertain data. Among all these indexes, R-MRST is the most representative one. In the following, we first explain the key idea of R-MRST. We then explain the other relevant indexes.

The R-MRST

R-MRST was proposed by Zhu et al. [15]. Its key idea is to analyze the PDF of each uncertain object and use a lightweight structure named MRST to store and index the features of the PDF. To be more specific, given an uncertain object $o\langle r, PDF\rangle$, MRST uses a multi-resolution grid to partition $o.r$, based on the following rationale: each subregion $o_i.r$ of $o.r$ is partitioned in a fine-grained manner if $o.PDF$ changes dramatically within $o_i.r$; otherwise, $o_i.r$ is partitioned at a coarse resolution. After partitioning, part of the information of each subregion is stored in MRST, including $app(o, i)$, $lb(o, i)$ and $ub(o, i)$. Based on this information, the probability lower bound and upper bound can be computed according to Eqs. 1 and 2, respectively, as shown below:

$$lb_{app}(q, i) = lb(o, i) \cdot S(q, i) \qquad (1)$$

$$ub_{app}(q, i) = \min\bigl(ub(o, i) \cdot S(q, i),\ app(o, i)\bigr) \qquad (2)$$

$S(q, i)$ here refers to the intersection region between $o_i.r$ and $q.r$; $lb_{app}(q, i)$ and $ub_{app}(q, i)$ denote the probability lower bound and upper bound of $o$ lying in $o_i.r \cap q.r$. R-MRST then uses Property 1 to calculate the lower-bound and upper-bound probability that $o$ lies in $q.r$.

Property 1 Given an object $o$ and a query $q$, if $q$ overlaps with (or contains) $o$'s subregions $o_i$, $i = 1, \dots, n$, then $lb_{app}(q, o) = \sum_{i=1}^{n} lb_{app}(q, i)$ and $ub_{app}(q, o) = \sum_{i=1}^{n} ub_{app}(q, i)$.
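Eqs. 1 and 2 together with Property 1 can be sketched in a few lines of Python. This is a minimal illustration, not the R-MRST implementation. It assumes axis-aligned rectangular subregions, reads lb(o, i) and ub(o, i) as lower and upper bounds of o.PDF inside subregion i (which is what makes lb(o, i) times an area a probability), and assumes the usual prob-range semantics in which o qualifies when its probability of lying in q.r is at least the threshold. The names `Subregion`, `overlap_area`, `bounds` and `decide` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Subregion:
    """Summary kept for one subregion o_i (assumed to be a rectangle)."""
    x1: float
    y1: float
    x2: float
    y2: float
    lb: float   # lb(o, i): assumed lower bound of o.PDF inside o_i.r
    ub: float   # ub(o, i): assumed upper bound of o.PDF inside o_i.r
    app: float  # app(o, i): probability that o lies in o_i.r


def overlap_area(s, q):
    """S(q, i): area of the intersection of o_i.r and q.r = (x1, y1, x2, y2)."""
    w = min(s.x2, q[2]) - max(s.x1, q[0])
    h = min(s.y2, q[3]) - max(s.y1, q[1])
    return max(w, 0.0) * max(h, 0.0)


def bounds(subregions, q):
    """Property 1: sum the per-subregion bounds of Eqs. 1 and 2."""
    lb_total = ub_total = 0.0
    for s in subregions:
        area = overlap_area(s, q)
        lb_total += s.lb * area              # Eq. 1
        ub_total += min(s.ub * area, s.app)  # Eq. 2
    return lb_total, ub_total


def decide(subregions, q, theta):
    """Validate, prune, or fall back to exact integration (assumed semantics)."""
    lb, ub = bounds(subregions, q)
    if lb >= theta:
        return "validate"  # o certainly qualifies
    if ub < theta:
        return "prune"     # o certainly does not qualify
    return "refine"        # bounds are inconclusive; integrate o.PDF


# Toy object with two unit-square subregions whose app values sum to 1.
obj = [
    Subregion(0.0, 0.0, 1.0, 1.0, lb=0.3, ub=0.5, app=0.4),
    Subregion(1.0, 0.0, 2.0, 1.0, lb=0.5, ub=0.7, app=0.6),
]
query = (0.5, 0.0, 2.0, 1.0)
print(bounds(obj, query))
print(decide(obj, query, theta=0.7))
```

Only the "refine" case requires integrating the PDF, which is why tighter bounds translate directly into fewer expensive computations.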

The Other Indexes for Supporting Prob-Range Query

Besides R-MRST, many other indexes also support prob-range queries over uncertain data. As a representative one, the U-tree proposed by Tao et al. [14] constructs a finite set of probabilistically constrained regions (PCRs) for each uncertain object. The pruning rules of U-tree are adaptive, i.e., they depend on the topological relation between the PCRs and the query region. However, as discussed in [17], the pruning ability of U-tree is weak. Zhang et al. proposed UD-tree for indexing uncertain objects. Given an uncertain object $o\langle r, PDF\rangle$, their key idea is to partition $o.r$ into a group of subregions, and to pre-compute and store the appearance probability of $o$ in each subregion. According to [17], UD-tree is superior to U-tree in terms of pruning ability, but requires high space cost, leading to huge I/O overhead. U-grid [18], proposed by Dmitri V. Kalashnikov et al., is a two-layer index: the first layer stores the spatial information of the uncertain data, and the second layer stores the probability information. Similar to UD-tree, U-grid also incurs high storage cost; in addition, it cannot provide a lower bound for prob-range queries.
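The idea of partitioning o.r and pre-computing the appearance probability of each subregion can be illustrated with a uniform grid. The sketch below is a generic illustration under stated assumptions, not the actual UD-tree or U-grid layout: the grid is uniform, cell probabilities are approximated with the midpoint rule, and the example density 4xy on the unit square as well as the names `pdf_4xy`, `precompute_cells` and `query_bounds` are hypothetical.

```python
def pdf_4xy(x, y):
    """Example density on the unit square; it integrates to 1."""
    return 4.0 * x * y


def precompute_cells(pdf, region, n, samples=8):
    """Split region into n x n cells and store each cell's appearance
    probability, approximated by the midpoint rule on a samples x samples grid."""
    x1, y1, x2, y2 = region
    cw, ch = (x2 - x1) / n, (y2 - y1) / n
    sw, sh = cw / samples, ch / samples
    cells = []
    for i in range(n):
        for j in range(n):
            cx, cy = x1 + i * cw, y1 + j * ch
            mass = 0.0
            for a in range(samples):
                for b in range(samples):
                    mass += pdf(cx + (a + 0.5) * sw, cy + (b + 0.5) * sh)
            cells.append(((cx, cy, cx + cw, cy + ch), mass * sw * sh))
    return cells


def query_bounds(cells, q):
    """Bound the probability of lying in q = (x1, y1, x2, y2):
    cells fully inside q give a lower bound, all overlapping cells an upper bound."""
    lower = upper = 0.0
    for (x1, y1, x2, y2), p in cells:
        overlaps = x1 < q[2] and q[0] < x2 and y1 < q[3] and q[1] < y2
        inside = q[0] <= x1 and x2 <= q[2] and q[1] <= y1 and y2 <= q[3]
        if inside:
            lower += p
        if overlaps:
            upper += p
    return lower, upper


cells = precompute_cells(pdf_4xy, (0.0, 0.0, 1.0, 1.0), n=4)
print(query_bounds(cells, (0.0, 0.0, 0.6, 0.6)))
```

Storing one probability per cell is what drives the space cost mentioned above: a finer grid tightens the bounds but multiplies the number of stored values per object.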

Uncertain Data Classiﬁcation

Classification over uncertain data is another fundamental problem in uncertain data management and has received wide research attention. Cao et al. [1] proposed WEC-ELM, a classifier designed for classifying uncertain data streams. A favorable feature of WEC-ELM is its flexibility; to be more specific, WEC-ELM is self-adaptive and can handle concept drift. Later, in [19], Cao et al. proposed two classification algorithms with improved classification accuracy, regardless of data distribution (uniform or non-uniform). Apart from these, some works expanded the scope of uncertain data classification to uncertain graph data and XML data. For instance, Han et al. [4] extended the gSpan algorithm and provided an effective algorithm for classifying uncertain graph data. As for uncertain XML data, Zhao et al. [20] proposed two learning algorithms for uncertain XML data, as well as a novel classifier targeted at XML documents.

Table 1 The summary of notations

| Notation | Definition |
|---|---|
| $o.PDF$ | Probability density function of $o$ |
| $o.r$ | Probability region of $o$ |
| $q.r$ | The search region of the query |
| $q.\theta$ | The probability threshold of the query |
| $o_i$ | The subregion $i$ of $o$ |
| $lb_{app}(q, i)$ | The probability lower bound of $o$ lying in $o_i.r \cap q.r$ |
| $ub_{app}(q, i)$ | The probability upper bound of $o$ lying in $o_i.r \cap q.r$ |
| $S(q, i)$ | The intersection area between $o_i.r$ and $q.r$ |
| $app(o, i)$ | The probability of $o$ lying in $o_i.r$ |

Cogn Comput
