Online Windowed Subsequence Matching over Probabilistic Sequences

Data mining

Updated 2024-09-09 · 802 KB PDF


Note that the substring matching problem [1] is a special case of subsequence matching where $i_{j+1} = i_j + 1$, for $1 \le j \le l-1$.

There is a well-known algorithm to solve DWSM with time complexity $O(l)$ per data item in the sequence, and space complexity $O(l)$ overall. The algorithm is as follows [1]:

    for each j ∈ [1…l] do s[j] ← 0 end for
    for each i ≥ 1 in increasing order do
        if x[i] = p[1] then s[1] ← i end if
        for each j ∈ [2…l] do
            if x[i] = p[j] then s[j] ← s[j−1] end if
        end for
        if i − s[l] < w then report i end if
    end for

In this algorithm, $s[j]$ is the starting position in the sequence $X$ of the most recently seen minimal substring containing $p[1] \dots p[j]$. Thus, in the second-to-last line, if the window size satisfies $i - s[l] < w$, we report the index $i$ as a match (i.e., it is the end of a match window). Our exact algorithm (Section 3) substantially modifies and extends this algorithm for probabilistic sequences. Furthermore, we devise a number of novel techniques (a provably accurate randomized approximation algorithm, an adaptive filtering optimization, and algorithms that handle negations) to meet the needs of real-time continuous monitoring.
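The deterministic algorithm above can be sketched in Python as follows. This is an illustrative sketch, not the authors' code: the function name `dwsm_matches` is hypothetical, and two details are assumptions of the sketch. First, `j` is visited from `l` down to 2 so that `s[j-1]` still holds its value from before position `i`, which keeps one input symbol from matching two pattern positions. Second, `s[l] = 0` is treated as "no match seen yet", so no index is reported before the pattern has actually occurred.

```python
def dwsm_matches(xs, p, w):
    """Yield 1-based indices i such that the window of size w ending
    at i contains p as a subsequence (deterministic case)."""
    l = len(p)
    # s[j] (1-based) = start of the most recent minimal substring that
    # contains p[1..j]; 0 means "not seen yet" (assumption of this sketch).
    s = [0] * (l + 1)
    for i, x in enumerate(xs, start=1):
        # Descending order: s[j-1] is still the value from before position i.
        for j in range(l, 1, -1):
            if x == p[j - 1]:
                s[j] = s[j - 1]
        if x == p[0]:
            s[1] = i
        if s[l] > 0 and i - s[l] < w:
            yield i


if __name__ == "__main__":
    print(list(dwsm_matches("abcab", "ab", 3)))
```

Each input symbol costs one pass over the pattern, and the only state is the array `s`, which matches the stated $O(l)$ time per item and $O(l)$ space.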

Definition 2 (DWSM with Negation). We extend the alphabet of a pattern to be $\Sigma \cup \bar{\Sigma}$, where $\bar{c} \in \bar{\Sigma} \iff c \in \Sigma$. Here, $\bar{c}$ is called the negation of $c$. We denote the subsequence of $p$ that consists of all characters in $\Sigma$ (called positive characters) as $p^+$, and the subsequence that consists of all characters in $\bar{\Sigma}$ (called negative characters) as $p^-$. Moreover, let $l^+ = |p^+|$ and $l^- = |p^-|$ (hence, $l^+ + l^- = l$). The DWSM with negation is the DWSM problem of pattern $p^+$ over string $X$ with the following constraints:

(1) For a negative character $\bar{c}$ (i.e., negation of $c$) in $p$ that has positive characters $c_1$ to the left and $c_2$ to the right, but no other positive characters between $c_1$ and $c_2$ in $p$, there must not be an occurrence of $c$ between the matching position of $c_1$ and that of $c_2$ in $X$. We say that $\bar{c}$ is a surrounded negation.

(2) For a negative character $\bar{c}$ that does not have a positive character to the left in $p$, it is required that the last character of $p$ must be positive (i.e., $p[l] \in \Sigma$). Let $p[l]$'s match in $X$ be $x[i_l]$, and let the first positive character in $p$ be $c_{\text{first}}$. Then the matching dictates that between index $i_l - w + 1$ and the matching position of $c_{\text{first}}$ in $X$, there is no occurrence of $c$. We say that $\bar{c}$ is a front negation.

(3) Similarly, for a negative character $\bar{c}$ that does not have a positive character to the right in $p$, it is required that the first character of $p$ must be positive (i.e., $p[1] \in \Sigma$). Let $p[1]$'s match in $X$ be $x[i_1]$, and let the last positive character in $p$ be $c_{\text{last}}$. Then the matching dictates that between the matching position of $c_{\text{last}}$ and index $i_1 + w - 1$ in $X$, there is no occurrence of $c$. We say that $\bar{c}$ is a rear negation.

Example 1. Consider the pattern $p = 3\,0\,\bar{3}\,0\,\bar{3}$ in Section 1 that detects a too-long R-R interval. Let $w = 360$. Then $p^+ = 300$, $l^+ = 3$, $p^- = \bar{3}\bar{3}$, and $l^- = 2$. The first $\bar{3}$ is a surrounded negation requiring no 3's between the two matching 0's around it. The second $\bar{3}$ is a rear negation, requiring that, between the matching position of the previous positive character in $p$ (i.e., 0) and the end of the window (i.e., distance $w = 360$ from the matching position of the first 3), there is no occurrence of 3's. Therefore, the pattern detects that no two R peaks occur within a large window of size 360, which implies that the R-R interval is too long.
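As a small illustration of Definition 2 and Example 1, the following Python sketch splits a pattern into its positive and negative parts and labels each negation as front, surrounded, or rear. The representation of a pattern as a list of `(symbol, negated)` pairs and the function names are assumptions made for this sketch; they do not come from the paper. The sketch only classifies negations and does not perform the matching itself.

```python
def split_pattern(p):
    """Return (positive symbols, negative symbols) of a pattern given
    as a list of (symbol, negated) pairs."""
    positive = [c for c, negated in p if not negated]
    negative = [c for c, negated in p if negated]
    return positive, negative


def classify_negations(p):
    """Return a list of (1-based position, kind) for each negative character."""
    positive_idx = [k for k, (_, negated) in enumerate(p) if not negated]
    kinds = []
    for k, (_, negated) in enumerate(p):
        if not negated:
            continue
        has_left = any(q < k for q in positive_idx)
        has_right = any(q > k for q in positive_idx)
        if has_left and has_right:
            kind = "surrounded"
        elif has_right:
            kind = "front"
        elif has_left:
            kind = "rear"
        else:
            kind = "undefined"  # no positive characters at all
        kinds.append((k + 1, kind))
    return kinds


if __name__ == "__main__":
    # Pattern of Example 1: 3 0 (not 3) 0 (not 3)
    pattern = [("3", False), ("0", False), ("3", True), ("0", False), ("3", True)]
    print(split_pattern(pattern))
    print(classify_negations(pattern))
```

For the pattern of Example 1 this gives three positive characters and two negative ones, with the negation at position 3 labeled surrounded and the one at position 5 labeled rear.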

We are now ready to define the subsequence matching problem over a probabilistic sequence, which is the focus of this work.

Definition 3 (PWSM). Let each $x[i]$ in string $X$ be an independent random variable that has a probability mass function (PMF) over $\Sigma$, for $i \ge 1$. The probabilistic windowed subsequence matching (PWSM) problem (with or without negation) is the probabilistic version of the corresponding DWSM (Definition 1 or 2) with an extra parameter $\tau$, called the probability threshold. An index position $i$ is reported if and only if the probability that it is reported over all (deterministic) possible worlds based on DWSM is at least $\tau$.

Our focus in this work is efficient, continuous online monitoring of PWSM queries that detect subsequence patterns. Thus, high throughput and low memory consumption are of paramount significance.

3. AN EXACT ALGORITHM

We start with the PWSM problem without negations; negations will be discussed in Section 6. For an index $i$ in the sequence, define a random variable $W[i]$ to be the size of the minimal window to the left of $i$ that contains $p$ as a subsequence. In other words, the window from index $i - W[i] + 1$ to index $i$ should contain $p$ as a subsequence, and is minimal. Then a straightforward approach for PWSM is to consider all possible worlds of a sequence up to index $i$ and calculate $\Pr(W[i] \le w)$ by summing the probabilities of all possible worlds in which $W[i] \le w$. The index $i$ is reported if $\Pr(W[i] \le w) \ge \tau$, for each $i$. However, this is clearly infeasible due to the exponential number of possible worlds.
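To make the cost of the straightforward approach concrete, here is a brute-force Python sketch that enumerates possible worlds. It is illustrative only: the function names and the representation of the sequence as a list of `{symbol: probability}` dictionaries are assumptions of the sketch. Because the positions are independent and the event "the window of size w ending at i contains p" depends only on the last w positions, the sketch enumerates worlds of that window alone, which yields the same probability as enumerating the whole prefix.

```python
from itertools import product


def contains_subsequence(window, p):
    """True if p is a subsequence of window."""
    it = iter(window)
    return all(c in it for c in p)  # `c in it` consumes the iterator


def match_probability(pmfs, p, w, i):
    """Probability that the window of size w ending at 1-based index i
    contains p as a subsequence; pmfs[k] is the PMF of position k+1."""
    lo = max(0, i - w)
    window_pmfs = pmfs[lo:i]
    prob = 0.0
    # One iteration per possible world of the window: exponential in w.
    for world in product(*(pmf.items() for pmf in window_pmfs)):
        symbols = [sym for sym, _ in world]
        if contains_subsequence(symbols, p):
            weight = 1.0
            for _, pr in world:
                weight *= pr
            prob += weight
    return prob


def pwsm_bruteforce(pmfs, p, w, tau):
    """Indices whose match probability is at least the threshold tau."""
    return [i for i in range(1, len(pmfs) + 1)
            if match_probability(pmfs, p, w, i) >= tau]


if __name__ == "__main__":
    pmfs = [{"a": 0.5, "b": 0.5}, {"a": 0.5, "b": 0.5}]
    print(match_probability(pmfs, "ab", 2, 2))
```

With an alphabet of size k, each index requires up to k^w world evaluations, which is why this approach is only usable as a reference on tiny inputs.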

It turns out that we can have a much more efficient algorithm that takes time $O(l \cdot w)$ per data item and space $O(l \cdot w)$ overall. To make the presentation of the algorithms more succinct, we start with some notations.

3.1 Notations

We first define a data type called truncated window size distribution. An object $d$ of this type essentially describes a PMF over values $[0 \dots w]$. The PMF is a set of (value, probability) pairs $\{(0, p_0), (1, p_1), \dots, (w, p_w)\}$, denoting that the probability of being $i$ is $p_i$, where $\sum_{i=0}^{w} p_i \le 1$. When $\sum_{i=0}^{w} p_i < 1$, with probability $1 - \sum_{i=0}^{w} p_i$, the value is greater than $w$ (the exact values do not matter in that case). For a (value, probability) pair $(i, p_i)$, we write $(i, p_i) \in d$ if $(i, p_i)$ is an element of the PMF of $d$.

We now define a few operators over this data type. Unless otherwise defined, we let $d$ be an object of this type that encodes a PMF $\{(i, p_i), \text{ for } 0 \le i \le w\}$. For convenience of quick reference, we also summarize these notations in Table 1 that follows.

• We write $d \succeq \tau$ if $\sum_{i=0}^{w} p_i \ge \tau$.

• The operation ++$d$ produces an object of the same type that has a PMF $\{(i+1, p_i), \text{ for } 0 \le i \le w-1\}$. That is, all values that are less than $w$ are increased by 1 and have the same probabilities. If $d = \bot$ (i.e., null), then ++$d = \bot$.
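The data type and the two operators defined so far can be sketched in Python as follows. This is a minimal sketch under stated assumptions: the class name `TruncatedDist`, the method names, and the sparse dictionary storage are hypothetical choices made here, and null (⊥) is represented by `None`.

```python
class TruncatedDist:
    """A PMF over window sizes 0..w; mass beyond w is not stored."""

    def __init__(self, w, probs=None):
        self.w = w
        # value -> probability, only for values in 0..w (sparse storage)
        self.probs = dict(probs or {})

    def mass(self):
        """Total probability that the value is at most w."""
        return sum(self.probs.values())

    def at_least(self, tau):
        """The comparison written d ≽ tau: stored mass is at least tau."""
        return self.mass() >= tau

    def incremented(self):
        """The ++d operator: shift every value below w up by one and
        drop the mass that was at w, since it moves beyond the window."""
        shifted = {v + 1: pr for v, pr in self.probs.items() if v <= self.w - 1}
        return TruncatedDist(self.w, shifted)


def increment(d):
    """++d with null handling: ++None is None."""
    return None if d is None else d.incremented()


if __name__ == "__main__":
    d = TruncatedDist(2, {0: 0.5, 1: 0.25, 2: 0.25})
    e = increment(d)
    print(e.probs, e.mass(), e.at_least(0.75))
```

Dropping the mass at value w during the increment is what makes the distribution "truncated": that probability now belongs to window sizes greater than w, whose exact values do not matter.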
