没有合适的资源?快使用搜索试试~ 我知道了~
首页在线概率序列的窗口子序列匹配
在线概率序列的窗口子序列匹配
需积分: 9 0 下载量 45 浏览量
更新于2024-09-09
收藏 802KB PDF 举报
"这篇研究论文探讨了在线窗口子序列匹配在概率序列中的应用,主要关注数据挖掘、数据库和相关领域。传统的子序列匹配通常处理确定性的字符串,但实际应用中,如数据流监控、复杂事件处理和时间序列数据分析等场景,字符串往往具有噪声和不确定性,因此引入了概率序列的概念。论文在在线环境下对这个问题进行了研究,强调效率的重要性。 首先,作者定义了查询语义,并提出了一种精确算法来解决在线窗口子序列匹配问题。这种精确算法旨在处理这些带有噪声和不确定性的概率序列。然后,他们进一步提出了一个随机近似算法,该算法在保持较高准确性的前提下,运行速度更快。这个随机算法是经过理论证明其准确性的。 为了进一步提高效率,论文还设计了一个过滤算法,通过一种适应序列流内容的优化技术来减少计算量。此外,论文还特别关注了包含否定元素的模式,提出了相应的匹配算法,这对于排除特定模式或异常值的情况非常有用。 为了验证所提出的算法的有效性,作者进行了系统性的实证研究,使用了三个真实数据集和一些合成数据集进行测试。这些实验结果展示了算法在不同场景下的性能,为概率序列的在线窗口子序列匹配提供了有力的工具和理论支持。" 这篇研究工作对数据挖掘和数据库领域的实践者具有重要价值,它不仅扩展了传统的子序列匹配理论,还提供了处理现实世界中复杂、有噪声数据的有效方法。通过精确和近似算法的结合,以及适应性的过滤策略,研究人员和工程师可以更有效地分析和处理概率数据流,从而在诸如医疗信号分析(如图1中展示的心电图信号及其R-R间期)、金融交易监控、环境监测等多种场景中找到潜在的模式和异常。
资源详情
资源推荐
Note that the substring matching problem [1] is a special case of
subsequence matching where
=
+1, for 1≤≤−1.
There is a well-known algorithm to solve DWSM with time com-
plexity O(l) per data item in the sequence, and space complexity
O(l) overall. The algorithm is as follows [1]:
for each ∈[1…] do
[
]
←0 end for
for each ≥1 in increasing order do
if
[
]
=[1] then
[
1
]
← end if
for each ∈[2…] do
if
[
]
=[] then
[
]
←[−1] end if
end for
if −
[
]
< then report end if
end for
In this algorithm, s[j] is the starting position in the sequence X of
t
he most recently seen minimal substring containing p[1]…p[j].
Thus, in the second to last line, if the window size −
[
]
<,
we report the index i as a match (i.e., it is the end of a match win-
dow). Our exact algorithm (Section 3) substantially modifies and
extends this algorithm for probabilistic sequences. Furthermore,
we devise a number of novel techniques – a provably accurate
randomized approximation algorithm, an adaptive filtering optimi-
zation, and algorithms that handle negations – to meet the real-
time continuous monitoring needs.
Definition 2 (DWSM with Negation). We extend the alphabet of
a pattern to be
=⋃
, where ∈⟺ ∈
. Here, is called
the negation of c. We denote the subsequence of p that consists of
all characters in (called positive characters) as p
+
, and the sub-
sequence that consists of all characters in
(called negative char-
acters) as p
−
. Moreover, let l
+
= |p
+
| and l
−
= |p
−
| (hence, l
+
+ l
−
= l). The DWSM with negation is the DWSM problem of pattern
p
+
over string X with the following constraints:
(1) For a negative character (i.e., negation of ) in p that has
positive characters c
1
to the left and c
2
to the right, but no
other positive characters between c
1
and c
2
in p, there must
not be an occurrence of c between the matching position of
c
1
and that of c
2
in X. We say that is a surrounded nega-
tion.
(2) For a negative character that does not have a positive
character to the left in , it is required that the last charac-
ter of must be positive (i.e., []∈). Let []’s match in
X be [
], and let the first positive character in be
.
Then the matching dictates that between index
−+1
and the matching position of
in X, there is no occurrence
of . We say that is a front negation.
(3) Similarly, for a negative character that does not have a
positive character to the right in , it is required that the
first character of must be positive (i.e., [1]∈). Let
[1]’s match in X be [
], and let the last positive charac-
ter in be
. Then the matching dictates that between the
matching position of
and index
+−1 in X, there is
no occurrence of . We say that is a rear negation.
Example 1. Consider the pattern = 303
03
in Section 1 that
detects a too-long R-R interval. Let =360. Then
=300,
=3,
=3
3
, and
=2. The first 3
is a surrounded negation
requiring no 3’s between two matching 0’s around it. The second
3
is a rear negation, requiring that, between the matching position
of the previous positive character in (i.e., 0) and the end of the
window (i.e., distance = 360 from the matching position of the
first 3), there is no occurrence of 3’s. Therefore, the pattern de-
tects no two R peaks within a large window of size 360, which
implies that the R-R interval is too long.
We are now ready to define the subsequence matching problem
over a probabilistic sequence, which is the focus of this work.
Definition 3 (PWSM). Let each [] in string be an independ-
ent random variable that has a probability mass function (PMF)
over , for ≥1. The probabilistic windowed subsequence match-
ing (PWSM) problem (with or without negation) is the probabilis-
tic version of the corresponding DWSM (Definition 1 or 2) with an
extra parameter , called the probability threshold. An index posi-
tion is reported if and only if the probability that it is reported
over all (deterministic) possible worlds based on DWSM is at least
.
Our focus in this work is efficient continuous online monitoring
PWSM queries that detect subsequence patterns. Thus, high
throughput and low memory consumption are of paramount signif-
icance.
3. AN EXACT ALGORITHM
We start with the PWSM problem without negations; negations
will be discussed in Section 6. For an index in the sequence,
define a random variable [] to be the size of the minimal win-
dow to the left of that contains as a subsequence. In other
words, the window from index −
[
]
+1 to index should
contain as a subsequence, and is minimal. Then a straightfor-
ward approach for PWSM is to consider all possible worlds of a
sequence up to index and calculate Pr([] ≤ ) by summing
the probabilities of all possible worlds in which []≤. The
index is reported if Pr([] ≤ ), for each . However, this is
clearly infeasible due to the exponential number of possible
worlds.
It turns out that we can have a much more efficient algorithm that
takes time (∙) per data item and space (∙) overall. To
make the presentation of the algorithms more succinct, we start
with some notations.
3.1 Notations
We first define a data type called truncated window size distribu-
tion, denoted as . A type object d essentially describes a
PMF over values [0…w]. The PMF is a set of (value, probability)
pairs {(0, p
0
), (1, p
1
), …, (w, p
w
)}, denoting that the probability of
being is
, where
∑
≤1. When
∑
<1, with proba-
bility 1−
∑
, the value is greater than w (the exact values do
not matter in that case). For a (value, probability) pair (,
), we
write (,
)∈ if (,
) is an element of the PMF of d.
We now define a few operators over . Unless otherwise de-
fined, we let be a object that encodes a PMF {(,
), for
0≤≤}. For convenience of quick reference, we also summa-
rize these notations in Table 1 that follows.
We write ≽ if
∑
≥.
The operation ++d produces a object that has a PMF
{(+1,
), for 0≤≤−1}. That is, all values that are
less than w are increased by 1 and have the same probabili-
ties. If =⊥ (i.e., null), then ++=⊥.
279
剩余11页未读,继续阅读
myishh
- 粉丝: 0
- 资源: 6
上传资源 快速赚钱
- 我的内容管理 展开
- 我的资源 快来上传第一个资源
- 我的收益 登录查看自己的收益
- 我的积分 登录查看自己的积分
- 我的C币 登录后查看C币余额
- 我的收藏
- 我的下载
- 下载帮助
最新资源
- 批量文件重命名神器:HaoZipRename使用技巧
- 简洁注册登录界面设计与代码实现
- 掌握Python字符串处理与正则表达式技巧
- YOLOv5模块改进 - C3与RFAConv融合增强空间特征
- 基于EasyX的C语言打字小游戏开发教程
- 前端项目作业资源包:完整可复现的开发经验分享
- 三菱PLC与组态王实现加热炉温度智能控制
- 使用Go语言通过Consul实现Prometheus监控服务自动注册
- 深入解析Python进程与线程的并发机制
- 小波神经网络均衡算法:MATLAB仿真及信道模型对比
- PHP 8.3 中文版官方手册(CHM格式)
- SSM框架+Layuimini的酒店管理系统开发教程
- 基于SpringBoot和Vue的招聘平台完整设计与实现教程
- 移动商品推荐系统:APP设计与实现
- JAVA代码生成器:一站式后台系统快速搭建解决方案
- JSP驾校预约管理系统设计与SSM框架结合案例解析
资源上传下载、课程学习等过程中有任何疑问或建议,欢迎提出宝贵意见哦~我们会及时处理!
点击此处反馈
安全验证
文档复制为VIP权益,开通VIP直接复制
信息提交成功