Precise-Positioning Episode Rules: A New Method for Event Sequence Mining


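"Mining Precise-Positioning Episode Rules from Event Sequences". This research paper studies how to mine precise-positioning episode rules from event sequences. Traditional episode rule mining methods can discover sequential associations between antecedent and consequent events, but they usually give only the probability that the consequent occurs within a given time interval and cannot pin down the exact response time of each event. This falls short in time-sensitive applications such as secure program trading and intelligent transportation management.

To address this problem, the authors propose the concept of a "fixed-gap episode". A fixed-gap episode is an ordered series of events in which the time span between each pair of consecutive events is fixed. This makes response times in an event sequence fine-grained, so the occurrence time of each event can be stated explicitly. Based on this concept, the paper defines a new problem: mining precise-positioning episode rules. The task is not only to find the sequential association between antecedent and consequent events, but also to determine the exact occurrence time of each consequent event.

To mine these rules efficiently, the authors design a trie-based data structure and combine it with several pruning strategies. Tries are commonly used to store and retrieve strings efficiently; here the trie is presumably used to locate and match event sequence patterns quickly. The pruning strategies shrink the search space, improve mining efficiency, and avoid unnecessary computation.

The paper may also evaluate the validity of the mined rules and analyze performance in practical applications, possibly through experiments on real or synthetic event sequence datasets that compare the traditional and new methods in precision, recall, and running time. Overall, the paper offers a more precise episode rule mining technique for time-sensitive applications. By combining fixed-gap episodes with an optimized mining algorithm, it sharpens the understanding of temporal relationships in event sequences and is expected to improve prediction and decision support in areas such as intelligent systems and monitoring.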

pattern, the time interval between two consecutive items in a pattern is also considered. The constraint usually specifies the minimum and maximum time intervals between two adjacent items. We may extend such methods to our problem by setting multiple gap values to discover all fixed-gap patterns in the sequence, but such an approach is clearly time-consuming because of the exponential search space. Though [18] is able to discover rigid wild-card patterns with fixed gap constraints in biological sequences, it can only generate long patterns by convolving known elementary patterns. Our algorithms, however, can discover all fixed-gap episodes and valid precise-positioning episode rules without any prior knowledge.

Frequent Episode Mining. Mining frequent episodes from event sequences was first introduced by Mannila et al. [1], where episodes are defined as directed acyclic graphs and two kinds of support counting are considered, i.e., sliding windows and minimal occurrences. After that, various frequency measures were defined to discover different kinds of episodes for different applications, and minimal occurrence is one of the most widely used measures [4], [7], [14], [20], [21]. Mining general episodes can be intricate and computationally intensive; for example, deciding whether a sequence covers a general episode is NP-hard [22]. Existing algorithms can be categorized into two types, namely breadth-first enumeration [1], [20], [23] and depth-first enumeration methods [4], [7], [24]. Among them, the depth-first enumeration methods can be used to discover minimal occurrences of episodes. However, most of these algorithms require a post-processing step to verify detected occurrences [9], which leaves significant room for improvement. On the other hand, research has also focused on mining subclasses of episodes, for example, serial episodes [25], closed episodes [6], [22], maximal episodes [21], and episodes with unique labels [26], [27].

Episode Rule Mining. In early work, episode rules were considered a "second-stage" output derived from frequent episodes [1], [7]. Episode rules usually specify a time range in which the consequent will happen. Meger et al. [13], [28] constructed episode rules from gap-constrained episodes and proposed an algorithm to find the optimal window size. Fournier-Viger et al. [29] mined partially-ordered sequential rules in which items are unordered in both the antecedent and the consequent. Such rules may improve prediction accuracy in some applications. Lin et al. [30] focused on the utility of episode rules and proposed an algorithm to directly mine high-utility episode rules. Our work focuses on mining precise-positioning episode rules, motivated by critical applications in which the right response must be triggered at a more fine-grained, precise time.

3 PRELIMINARIES AND PROBLEM STATEMENT

In this section, we first give some preliminary definitions in frequent episode mining (Definitions 1, 2, 3, 4, and 5) [1], [7], [9], [20]. Then, we propose some new concepts about precise-positioning episode rules (Definitions 6, 7, 8, 9, 10, 11, and 12), and finally formulate the mining problem.

3.1 Preliminaries

Definition 1 (Event Sequence). Let $\mathcal{E}$ be a finite set of events. An event sequence, denoted $S = \langle (E_1, t_1), (E_2, t_2), \ldots, (E_n, t_n) \rangle$, is an ordered sequence of events, where each $E_i \neq \emptyset$ and $E_i \subseteq \mathcal{E}$ consists of all events associated with timestamp $t_i$, and $t_j < t_k$ for any $1 \le j < k \le n$.

For example, Fig. 1 shows an event sequence $S = \langle (\{D\}, 1), (\{A, D\}, 3), (\{A, B\}, 4), (\{E\}, 5), (\{B, D, E\}, 6), (\{A\}, 7), (\{B\}, 8), (\{E, F\}, 9), (\{C\}, 10), (\{A, F\}, 11), (\{F\}, 12) \rangle$.
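As an illustration of Definition 1, the sequence of Fig. 1 can be held in a small Python structure. This is only a sketch: the names `make_sequence` and `times_of` are hypothetical helpers introduced here, not part of the paper, and single-letter strings stand in for event sets.

```python
def make_sequence(pairs):
    """Build an event sequence as a list of (event_set, timestamp) pairs.

    Checks the two conditions of Definition 1: every event set is
    non-empty, and timestamps are strictly increasing.
    """
    seq = [(frozenset(events), t) for events, t in pairs]
    for events, _ in seq:
        if not events:
            raise ValueError("every event set must be non-empty")
    times = [t for _, t in seq]
    if any(a >= b for a, b in zip(times, times[1:])):
        raise ValueError("timestamps must be strictly increasing")
    return seq


def times_of(seq, event):
    """Return all timestamps at which `event` occurs (hypothetical helper)."""
    return [t for events, t in seq if event in events]


# The sequence of Fig. 1; each string is read as a set of one-letter events.
S = make_sequence([
    ("D", 1), ("AD", 3), ("AB", 4), ("E", 5), ("BDE", 6), ("A", 7),
    ("B", 8), ("EF", 9), ("C", 10), ("AF", 11), ("F", 12),
])

print(times_of(S, "D"))  # [1, 3, 6]
print(times_of(S, "A"))  # [3, 4, 7, 11]
```

Timestamps 2 is simply absent, which the definition allows: only the ordering of timestamps is constrained, not their spacing.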

Definition 2 (Episode). An episode $\alpha$ is defined as a non-empty totally ordered set of events of the form $\langle e_1, \ldots, e_i, \ldots, e_k \rangle$, where $e_i \in \mathcal{E}$ for all $i \in [1, k]$ and the event $e_i$ occurs before the event $e_j$ for any $1 \le i < j \le k$. An episode $\alpha$ of length $k$ is referred to as a $k$-episode.

Definition 3 (Episode Occurrence). Given an episode $\alpha = \langle e_1, \ldots, e_k \rangle$ and a sequence $S$, $[t_1, \ldots, t_k]$ is an occurrence of $\alpha$ if and only if (1) $e_i$ is an element of the event set $E_{t_i}$ at time $t_i$ for all $i \in [1, k]$; (2) $t_1 < \cdots < t_k$. The time window $[t_1, t_k]$ is called an occurrence window of $\alpha$. In this study, we only consider the episode occurrences whose window size is smaller than a user-specified threshold $\delta$, namely $t_k - t_1 < \delta$. The set of all occurrences of $\alpha$ in the sequence $S$ is denoted by $ocSet(\alpha)$.

For example, if $\delta = 6$ in Fig. 1, $ocSet(\langle D, A, B \rangle) = \{[1, 3, 4], [3, 4, 6], [3, 4, 8], [3, 7, 8], [6, 7, 8]\}$.

Definition 4 (Minimal Episode Occurrence (MEO)). Consider two time windows $[t_1, t_k]$ and $[t'_1, t'_k]$. $[t'_1, t'_k]$ is subsumed by $[t_1, t_k]$ if $t_1 \le t'_1$ and $t'_k \le t_k$. An occurrence window $[t_1, t_k]$ of an episode $\alpha$ is a minimal episode occurrence of $\alpha$ if no other occurrence window $[t'_1, t'_k]$ of $\alpha$ is subsumed by $[t_1, t_k]$.

For example, $moSet(\langle D, A, B \rangle) = \{[1, 4], [3, 6], [6, 8]\}$ when $\delta = 6$ for the sequence in Fig. 1. The time window $[3, 8]$ contains occurrences of $\langle D, A, B \rangle$, but it is not a minimal occurrence since $\langle D, A, B \rangle$ also occurs in $[3, 6]$.
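The minimal-occurrence example can be checked with a brute-force sketch. The code below is not the paper's algorithm: it simply enumerates every occurrence window with $t_k - t_1 < \delta$ and keeps those that subsume no other window. The function names and the dictionary layout (timestamp mapped to event set) are assumptions made for illustration.

```python
from itertools import product

# The sequence of Fig. 1 as a mapping from timestamp to event set.
S = {
    1: {"D"}, 3: {"A", "D"}, 4: {"A", "B"}, 5: {"E"}, 6: {"B", "D", "E"},
    7: {"A"}, 8: {"B"}, 9: {"E", "F"}, 10: {"C"}, 11: {"A", "F"}, 12: {"F"},
}


def occurrence_windows(seq, episode, delta):
    """Windows (t_1, t_k) of all occurrences with t_k - t_1 < delta.

    Brute force: try every combination of candidate timestamps,
    one per event of the episode, and keep the strictly increasing ones.
    """
    candidates = [[t for t in sorted(seq) if e in seq[t]] for e in episode]
    windows = set()
    for combo in product(*candidates):
        increasing = all(a < b for a, b in zip(combo, combo[1:]))
        if increasing and combo[-1] - combo[0] < delta:
            windows.add((combo[0], combo[-1]))
    return windows


def minimal_occurrences(seq, episode, delta):
    """Occurrence windows that do not subsume any other occurrence window."""
    windows = occurrence_windows(seq, episode, delta)

    def subsumes(outer, inner):
        return outer[0] <= inner[0] and inner[1] <= outer[1]

    return sorted(
        w for w in windows
        if not any(v != w and subsumes(w, v) for v in windows)
    )


mo_set = minimal_occurrences(S, ["D", "A", "B"], delta=6)
print(mo_set)       # [(1, 4), (3, 6), (6, 8)]
print(len(mo_set))  # 3
```

The enumeration is exponential in the episode length, so it is suitable only for checking small examples by hand.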

Definition 5 (Support of Episode). The support of an episode $\alpha$, denoted as $sp(\alpha)$, is defined as the number of its distinct MEOs, i.e., $sp(\alpha) = |moSet(\alpha)|$. An episode is frequent if and only if its support is not less than a user-specified parameter $min\_sup$.

For example, with $\delta = 6$ the episode $\langle D, A, B \rangle$ has three MEOs in Fig. 1, so it is frequent when $min\_sup = 3$.

3.2 Definitions and Problem Statement

Definition 6 (Fixed-Gap Episode). A fixed-gap episode $\beta$ is defined as a tuple in the form $(\langle e_1, \ldots, e_k \rangle, \langle \Delta t_1, \ldots, \Delta t_{k-1} \rangle)$, where $e_i \in \mathcal{E}$ for $i \in [1, k]$ and the event $e_i$ occurs before the event $e_j$ for any $1 \le i < j \le k$. Additionally, the time span between the occurrence times of event $e_{j+1}$ and event $e_j$ is $\Delta t_j$, $j \in [1, k-1]$. We denote a fixed-gap episode with length $k$ as a fixed-gap $k$-episode.

For example, in Fig. 1, $(\langle E, A \rangle, \langle 2 \rangle)$ is a fixed-gap 2-episode. The time span between $E$ and $A$ is 2.

Definition 7 (Fixed-Gap Episode Occurrence (FEO)). Given a fixed-gap episode $\beta = (\langle e_1, \ldots, e_k \rangle, \langle \Delta t_1, \ldots, \Delta t_j, \ldots, \Delta t_{k-1} \rangle)$, $[t_1, \ldots, t_k]$ is an occurrence (FEO) of $\beta$ if and only if (1) $e_i$ is an element of event set $E_{t_i}$ at time $t_i$ for all $i \in [1, k]$; (2) $t_j < t_{j+1}$ and $t_{j+1} - t_j = \Delta t_j$ for all $j \in [1, k-1]$. Similar to Definition 3, $t_1$ and $t_k$ constitute an occurrence window of $\beta$, which is denoted as $[t_1, t_k]$.

1. A single event $e$ is a special kind of fixed-gap episode, and we denote it as $(\langle e \rangle, null)$.

532 IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, VOL. 30, NO. 3, MARCH 2018
