Background
In this section, we firstly review existing algo rithms aimed
at solving prob-range query and classification problem over
uncertain data in ‘‘Related Work’’ section and give a brief
introduction of ELM in ‘‘Extreme Learning Machine’’
section. At the end of this section, we formally define PDR
query over uncertain data. Table 1 summarizes the nota-
tions used in this paper.
Related Work
Recently, many indexes have been proposed for supporting
prob-range query over uncertain data. Among all these
indexes, R-MRST is the most representative one. In the
following, we first explain the key idea of R-MRST.We
then explain the other relevant indexes.
The R-MRST
R-MRST was proposed by Zhu et al. [15], whose key idea
is to analyze the PDF of each uncertain object and use a
lightweight structure named MRST to store and index the
feature of PDF. To be more specific, given an uncertain
object ohr; PDFi, MRST uses a multi-resolution grid to
partition o.r, based on the following rationale: for each
subregion o
i
:r in o:r; o
i
:r is partitioned in a fine-grained
manner, if o .PDF in o
i
:r changes dramatically; otherwise,
o
i
:r is partitioned with a coarse resolution. After parti-
tioning, for each subregion, part of its information is stored
in MRST, including app(o, i), lb(o, i) and ub( o , i). Based
on these information, the probability lower bound and
upper bound can be computed according to Eqs. 1 and 2,
respectively, as shown below:
S(q, i) here refers to the intersection region between o
i
:r
and q.r; lb
app
ðq; iÞ and ub
app
ðq; iÞ denote the probability
lower bound and upper bound of o locating in o
i
:r \ q:r.
Then R-MRST uses Proper ty 1 to calculate the lower-
bound and upper-bound probability that o locates in q.r
lb
app
ðq; iÞ¼lbðo; iÞSðq; iÞð1Þ
ub
app
ðq; iÞ¼minðub ðo; iÞSðq; iÞ; appðo; iÞÞ ð2Þ
Property 1 Given an object o and a query q,ifq
r
overlaps
with (or contains) o’s subregions
S
i¼n
1
i¼1
o
i
; lb
app
ðq; o Þ¼
P
i¼n
1
i¼1
lb
app
ðq; iÞ and ub
app
ðq; o Þ is
P
i¼n
1
i¼1
ub
app
ðq; iÞ.
The Other Indexes for Supporting Prob-Range Query
Besides R-MRST, many other indexes also support prob-
range query over uncertain data. As a representative one,
U-tree proposed by Tao et al. [ 14] constructs a finite set of
probabilistically constrained region (PCR) for each
uncertain object. The pruni ng rules of U-tree are adaptive,
i.e., it depends on the topological relation between PCR
and query region. However, as has been discussed in [17],
the pruning ability of U-tree is weak. Zhang et al. proposed
UD-tree for indexing uncertain objects. Given an uncertain
object ohr; PDFi, their key idea is to partition o.r into a
group of subregions, pre-compute and store the appearance
probability of o in each subregion. According to [17], UD-
tree is superior compared with U-tree in terms of pruning
ability, but requires high space cost, leading to huge I/O
overhead. U-grid [18] proposed by Dmitri V. Kalashnikov
et al. is a two-layer index, the first of which stores the
spatial information of uncertain data, and the second layer
stores the probability information. Similar to UD-tree,
U-grid also incurs high storage cost; in addition, it could
not provide a lower bound for prob-range query.
Uncertain Data Classification
Classification over uncertain data is another fundamental
problem in uncertain data management and has received
wide research attent ion. Cao et al. [1] proposed WEC-
ELM, a classifier designed for classifying uncertain data
stream. A favorable featur e of WEC-ELM is its flexibility;
to be more specific, WEC-ELM is self-adaptive and can
solve the problem of concept drift. Later in [19], Cao et al.
proposed two cla ssification algorithms with improved
classification accuracy, regardless of data distribution
(uniform or non-uniform). Apart from these, some works
expanded the scope of uncertain data classification to
uncertain graph data and XML data. For instance, Han
et al. [4] extended gSpan algorithm and provided an
effective algorithm for classifying uncertain graph data. As
for uncertain XML data, Zhao et al. [20] proposed two
learning algorithms for uncertain XML data, as well as a
novel classifier targeted to XML documents.
Table 1 The summary of notations
Notation Definition
o.PDF Probability density function of o
o.r Probability region of o
q.r The search region of the query
q:h The probability threshold of the query
o
i
The subregion i of o
lb
app
ðq; iÞ The probability lower bound of o lying in o
i
:r [ q:r
ub
app
ðq; iÞ The probability up-bound of o lying in o
i
:r [ q:r
S(q, i) The intersection area between o
i
:r and q.r
app(o, i) The probability of o lying in o
i
:r
Cogn Comput
123