978-1-4673-0024-7/10/$26.00 ©2012 IEEE 1162
2012 9th International Conference on Fuzzy Systems and Knowledge Discovery (FSKD 2012)
Mining Scalable Pattern Based on Temporal Logic
over Data Streams
∗
Yan Tang, Feifei Li, HongYan Li
†
Key Laboratory of Machine Perception (Peking University), Ministry of Education
School of Electronics Engineering and Computer Science, Peking University
Beijing 100871, P. R. China
{tangyan,liff,lihy}@cis.pku.edu.cn
Abstract—In many data stream applications, data segments
which are sequential and complicatedly changeable always imply
great domain specific value. Especially in the field of medical
survey, mining such sequential data segments will help making
diagnosis. We discovered that, based on extensive analysis,
although containing rich semantics, these data segments are
actually composed of some certain basic units, and these units
can form different kinds of complex patterns with duplication or
lack in certain positions considering a temporal logic. Therefore,
we present a scalable pattern mining method. With this method,
the Scalable Pattern Tree (SPTree) structure is designed to
support teh expression of scalable semantics and efficient
mining. At last, the experimental results on real datasets prove
that our method is feasible and efficient.
Keywords-Data Stream; Pattern Mining; Scalable Pattern; Tem-
poral Logic
I. INTRODUCTION
Along with the booming of the telecommunications and
network, applications based on data s tream have become much
more widely used. And mining out potential and valuable
knowledge from streams is a hotspot in the field of data
mining. In many data stream applications, data segments which
contain continuous data points always indicate much richer
information than a single data point, and the information is
valuable and changeable. And if we can mine out such data
segments from streams, we can make a difference in reality.
Take the bio-medical signals gathered from medical mon-
itoring as an example (e.g., electrocardiograph (ECG)). The
data which we gather from ECG-monitoring systems are
consecutive and discrete data points which constitute the
heart-wave activities and have complicated variation implying
various messages. Fig. 1 shows four ECG waves from different
people. Every ECG period contains different waves which
show different phases of heart beating. A health person’s ECG,
†
Corresponding author.
∗
This work was supported by Natural Science Foundation of China
(No.60973002 and No.61170003), the National High Technology Research
and Development Program of China (Grant No. 2012AA011002), National
Science and Technology Major Program (Grant No. 2010ZX01042-002-
002-02, 2010ZX01042-001-003-05), National Science & Technology Pillar
Program (Grant No. 2009BAH44B03), the Cultivation Fund of the Key
Scientific and Technical Innovation Project, Ministry of Education of China
(Grant No. 708001) and the Shenzhen-Hong Kong Innovation Cooperation
Project (No. JSE201007160004A).
(D)
(A)
(C)
(B)
PQ
R
S
T
U
QRS
QRSQRS QRS
P
T
U
TTT
Fig. 1. ECG waves, each show a condition of some potential disease.
called Standard Cycles, is displayed in Fig. 1(A). The rest
three are ECG waves gathered from three different patients.
T-wave occurs more than once in Fig. 1(B), meaning that
Acute Renal Failure (ARF) is around the corner. QRS-wave
is missing in Fig. 1(C) indicating deadly arrhythmia which
should be found out in time and the patient should be given
cardiac massage, mouth-to-mouth resuscitation or ventricular
pacing to be rescued. The last waves in Fig. 1(D) with QRS-
wave appearing four times, indicates ventricular arrhythmias
with an irregular rhythm.
In fact, when the ECG wave of a patient shows differences
from the standard one, it means there may be some diseases
around the corner. If we can mine out these abnormal waves
from ECG streams and give an alarm in time, we can make
a contribution in helping give appropriate and timely treat-
ment. However, how can we mine out these complex waves?
Through plenty of analysis in ECG data, we have discovered
that:
∙ These data segments have very complex variation, but
the elements are constant, i.e., P, Q, R, S, T and U are
the six elements consisting of a ECG period (details show
in Fig. 1(A)).
∙ The elements occur with temporal logic and there may
be duplication and lack (We call this the scalability of
the ECG pattern.) of these elements The six elements
show in a ECG period with the order of P, Q, R, S, T, U
(We call this temporal logic of the ECG pattern). T-wave
occurring repeatedly in Fig. 1(B) means a duplication
of the ECG pattern and QRS-wave missing in Fig. 1(C)
indicates a lack of the ECG pattern.
∙ The scalability has Stratified semantics, i.e., except for