Scalable Pattern Mining: Temporal-Logic-Based Data Stream Analysis


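"Mining Scalable Patterns Based on Temporal Logic over Data Streams." This paper examines how to mine complex patterns with temporal and sequential characteristics in data stream applications, with particular emphasis on their importance in medical investigation. The authors, Yan Tang, Feifei Li and Hong Yan Li, propose a scalable pattern mining method for complex data sequences that carry rich semantics yet are built from basic units, and the method takes temporal logic into account.

In data stream analysis, continuous and intricately varying data segments often carry key domain-specific information. In medical research, for example, such sequence data may record changes in a patient's physiological indicators, which can reveal the patterns and progression of a disease. Mining these segments in depth can help physicians make more accurate diagnoses and predictions. Although the segments carry rich semantic information, the study finds that they are usually composed of a few basic units. These units form various complex patterns through different combinations, repetitions, or absences at particular time positions. Temporal logic (the order and relative relationships of data along the time axis) determines which positions may be present or absent, and this makes pattern mining harder.

To address this problem, the paper proposes a structure called the Scalable Pattern Tree (SPTree), designed to express patterns with scalable semantics and to mine them efficiently. Building an SPTree captures pattern changes in the data stream while accounting for time, so that the mined patterns are relevant both temporally and to the domain. Experimental results on real datasets show that the SPTree method is feasible and efficient. The approach is intended to improve mining efficiency and adaptability in dynamic data environments and to uncover potentially valuable patterns in large-scale data streams. The work offers a new perspective on data stream analysis. It has theoretical and practical significance for fields that depend on time-series analysis, such as medical research, financial market forecasting, and real-time processing of data from IoT devices.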
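The summary describes SPTree only at a high level. Its actual definition lies in the part of the paper not reproduced here. The sketch below is a loose illustration of the general idea of storing many symbolized periods in a tree so that shared prefixes are stored only once. It builds a plain prefix tree (trie) with support counts and is not the authors' SPTree. The class names `Node` and `PrefixTree`, the unit labels, and the `min_support` threshold are all hypothetical. The sketch also assumes that each ECG period has already been converted into a list of unit labels.

```python
class Node:
    """One prefix-tree node: children keyed by unit label, plus support counts."""

    def __init__(self):
        self.children = {}
        self.count = 0  # periods whose unit sequence passes through this node
        self.ends = 0   # periods whose unit sequence ends exactly at this node


class PrefixTree:
    """Hypothetical trie of symbolized periods; NOT the paper's SPTree."""

    def __init__(self):
        self.root = Node()

    def insert(self, period):
        node = self.root
        for unit in period:
            node = node.children.setdefault(unit, Node())
            node.count += 1
        node.ends += 1

    def frequent(self, min_support):
        """Yield (sequence, support) for complete periods seen >= min_support times."""
        stack = [(self.root, ())]
        while stack:
            node, path = stack.pop()
            if path and node.ends >= min_support:
                yield path, node.ends
            for unit, child in node.children.items():
                # Prune: no period ending below `child` can occur more than child.count times.
                if child.count >= min_support:
                    stack.append((child, path + (unit,)))


tree = PrefixTree()
for period in (["P", "QRS", "T", "U"], ["P", "QRS", "T", "U"],
               ["P", "QRS", "T", "T", "U"], ["P", "T", "U"]):
    tree.insert(period)
print(list(tree.frequent(2)))  # [(('P', 'QRS', 'T', 'U'), 2)]
```

A standard cycle and a cycle with a repeated T wave share the `P → QRS → T` path. The `count` field lets the search skip whole subtrees that cannot reach the support threshold.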

2012 9th International Conference on Fuzzy Systems and Knowledge Discovery (FSKD 2012)

Mining Scalable Pattern Based on Temporal Logic over Data Streams∗

Yan Tang, Feifei Li, HongYan Li

†

Key Laboratory of Machine Perception (Peking University), Ministry of Education

School of Electronics Engineering and Computer Science, Peking University

Beijing 100871, P. R. China

{tangyan,liff,lihy}@cis.pku.edu.cn

Abstract—In many data stream applications, sequential data segments with complicated variation often carry great domain-specific value. In the field of medical survey in particular, mining such sequential data segments can help in making diagnoses. Based on extensive analysis, we discovered that although these data segments contain rich semantics, they are actually composed of certain basic units. Under a temporal logic, these units can form different kinds of complex patterns through duplication or absence at certain positions. We therefore present a scalable pattern mining method, in which the Scalable Pattern Tree (SPTree) structure is designed to support both the expression of scalable semantics and efficient mining. Finally, experimental results on real datasets show that our method is feasible and efficient.

Keywords-Data Stream; Pattern Mining; Scalable Pattern; Temporal Logic
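The phrase "duplication or absence at certain positions under a temporal logic" can be made concrete in one hedged way. A scalable pattern can be written as a list of units in temporal order, each with a minimum and maximum number of occurrences, and then compiled into a regular expression. This swaps in a different technique (Python's standard `re` module) and is not the SPTree method. `UNIT_CODES`, `compile_scalable` and `matches` are hypothetical names, and grouping Q, R and S into one `QRS` unit is an assumption.

```python
import re

# Hypothetical single-character codes so that a multi-letter unit such as
# "QRS" cannot be confused with other units inside the regular expression.
UNIT_CODES = {"P": "p", "QRS": "q", "T": "t", "U": "u"}


def compile_scalable(pattern):
    """Compile [(unit, min, max), ...] in temporal order into a regex.

    max=None means the unit may repeat without an upper bound;
    min=0 means the unit may be absent.
    """
    parts = []
    for unit, lo, hi in pattern:
        bound = f"{{{lo},}}" if hi is None else f"{{{lo},{hi}}}"
        parts.append(re.escape(UNIT_CODES[unit]) + bound)
    return re.compile("".join(parts))


def matches(pattern_re, period):
    """True if the whole symbolized period fits the scalable pattern."""
    encoded = "".join(UNIT_CODES[u] for u in period)
    return pattern_re.fullmatch(encoded) is not None


# Hypothetical pattern: one P, one QRS, two or more T waves, optional U.
repeated_t = compile_scalable([("P", 1, 1), ("QRS", 1, 1), ("T", 2, None), ("U", 0, 1)])
print(matches(repeated_t, ["P", "QRS", "T", "T", "U"]))  # True
print(matches(repeated_t, ["P", "QRS", "T", "U"]))       # False
```

The fixed unit order in the regex encodes the temporal logic, and the repetition bounds encode where duplication or absence is allowed.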

I. INTRODUCTION

With the boom in telecommunications and networking, applications based on data streams have become much more widely used, and mining potential, valuable knowledge from streams is a hotspot in the field of data mining. In many data stream applications, data segments made up of consecutive data points carry much richer information than any single data point, and this information is both valuable and changeable. If such data segments can be mined from streams, they can make a real difference in practice.

Take the bio-medical signals gathered by medical monitoring as an example (e.g., electrocardiograph (ECG)). The data gathered from ECG-monitoring systems are consecutive, discrete data points that trace heart-wave activity, and their complicated variation carries various messages. Fig. 1 shows four ECG waves from different people. Every ECG period contains different waves, which show the different phases of a heartbeat. A healthy person's ECG, called the Standard Cycle, is displayed in Fig. 1(A). The other three are ECG waves gathered from three different patients.

† Corresponding author.

∗ This work was supported by Natural Science Foundation of China (No.60973002 and No.61170003), the National High Technology Research and Development Program of China (Grant No. 2012AA011002), National Science and Technology Major Program (Grant No. 2010ZX01042-002-002-02, 2010ZX01042-001-003-05), National Science & Technology Pillar Program (Grant No. 2009BAH44B03), the Cultivation Fund of the Key Scientific and Technical Innovation Project, Ministry of Education of China (Grant No. 708001) and the Shenzhen-Hong Kong Innovation Cooperation Project (No. JSE201007160004A).

Fig. 1. ECG waves, each showing a condition of some potential disease: panel (A) is the standard cycle and panels (B)-(D) are from patients; the QRS and T waves are labeled.

In Fig. 1(B) the T wave occurs more than once, meaning that Acute Renal Failure (ARF) may be imminent. In Fig. 1(C) the QRS wave is missing, indicating a deadly arrhythmia. This must be detected in time so that the patient can be rescued by cardiac massage, mouth-to-mouth resuscitation or ventricular pacing. In Fig. 1(D) the QRS wave appears four times, indicating ventricular arrhythmia with an irregular rhythm.
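Once each period has been symbolized, the deviations described for Fig. 1(B)-(D) can be checked mechanically. The sketch below compares one period against the standard order. It reports duplicated units, missing units, and whether the temporal order is respected. It assumes an upstream wave detector (not shown) has already labeled the waves. `STANDARD_ORDER`, `compare_to_standard`, and treating QRS as a single unit are illustrative assumptions, not the paper's definitions.

```python
from collections import Counter

# Assumed unit inventory in standard temporal order; Q, R and S are grouped
# into one QRS unit, as the text does when it speaks of "the QRS wave".
STANDARD_ORDER = ["P", "QRS", "T", "U"]


def compare_to_standard(period):
    """Compare one symbolized ECG period with the standard cycle.

    `period` is a list of unit labels in time order; every label must appear
    in STANDARD_ORDER (otherwise list.index raises ValueError).
    """
    counts = Counter(period)
    # Scalability: units that occur more than once, and units that never occur.
    duplicated = {u: counts[u] for u in STANDARD_ORDER if counts[u] > 1}
    missing = [u for u in STANDARD_ORDER if counts[u] == 0]
    # Temporal logic: labels may repeat but must never step backwards in order.
    ranks = [STANDARD_ORDER.index(u) for u in period]
    in_order = all(a <= b for a, b in zip(ranks, ranks[1:]))
    return {"duplicated": duplicated, "missing": missing, "in_order": in_order}


# Fig. 1(B)-style period: the T wave occurs twice.
print(compare_to_standard(["P", "QRS", "T", "T", "U"]))
# → {'duplicated': {'T': 2}, 'missing': [], 'in_order': True}
```

A Fig. 1(C)-style period such as `["P", "T", "U"]` would report `QRS` as missing, and a Fig. 1(D)-style period would report `QRS` with a count of 4.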

In fact, when a patient's ECG wave differs from the standard one, some disease may be developing. If we can mine these abnormal waves from ECG streams and raise an alarm in time, we can help patients receive appropriate and timely treatment. But how can such complex waves be mined? Through extensive analysis of ECG data, we have discovered that:

∙ These data segments vary in very complex ways, but their elements are constant: P, Q, R, S, T and U are the six elements that make up an ECG period (shown in Fig. 1(A)).

∙ The elements occur under a temporal logic: the six elements appear in an ECG period in the order P, Q, R, S, T, U (we call this the temporal logic of the ECG pattern). Individual elements may also be duplicated or missing (we call this the scalability of the ECG pattern). The T wave occurring repeatedly in Fig. 1(B) is a duplication in the ECG pattern, and the missing QRS wave in Fig. 1(C) is an absence.

∙ The scalability has stratified semantics, i.e., except for
