A Hybrid Time Series Matching Algorithm Based on Feature-Points and DTW
Xi Wang
∗
, Mingxing Jiang
†
, Sheng Chen
‡
, Chao Yang
§
, Wei Jing
∗
and Zhongwen Guo
∗
∗
Department of Computer Science and Technology, Ocean University of China, Qingdao, China
Email: guozhw@ouc.edu.cn
†
Department of Computer Foundation, Ocean University of China, Qingdao, China
Email: jiangmx@ouc.edu.cn
‡
University of Washington, Seattle, WA, USA
Email: shengc5@uw.edu
§
College of Communication Engineering, Chongqing University, Chongqing, China
Email: yang249@163.com
Abstract—Feature-points based time series approximation
representation utilizes the tendency information, but lacks
consideration of the details. DTW (Dynamic Time Warping)
based similarity measurement eliminates the time line warp,
but computation complexity is high. Based on feature-points
and DTW, this paper proposed a hybrid time series matching
algorithm. Firstly, we extracted the feature-points of time series
as the coarse-grained representation, calculated the DTW dis-
tance between feature-points; then applied uniform sampling
on feature-points segmentations as the fine-grained representa-
tion, calculated the Euclidean distance between corresponding
segmentations; at last, we summed the two distances as the final
distance. The algorithm achieved a high matching accuracy
while lowered the computation overhead. This paper used
several time series data sets from UCR to do the experiments,
verified the effectiveness of the proposed algorithm.
Keywords-Time Series; DTW; Feature Points; Hybrid Match-
ing
I. INTRODUCTION
A time series is a series of data points listed in time
order, for example curves of temperature and stock price.
Time series are applied in many areas such as financial
[1], sensor networks [2] and energy industry [3]. As the
development of social economy and technologies in IOT and
big data, data volume of time series increases rapidly and
time series mining is gaining more and more attentions from
researchers. Time series matching is the basis of time series
mining and becomes one of the most studied areas in the
time series mining literature [12].
Time series matching includes approximation representa-
tion and similarity measure [7]. Approximation representa-
tion is to extract features by some means and the major
reason for approximation representation is to reduce the
dimension of the original data. Common methods for repre-
sentation are piecewise linear representation [4], frequency
domain representation [5], symbolic representation [6], 𝑒𝑡𝑐.
Among them piecewise linear representation performs better
in approximation with the original data by separating a
time series into several parts which are expressed by linear
formulation.
Based on the main idea in piecewise linear representation,
some feature-points based methods are proposed [8] [9].
These approaches utilize the feature-points as the cut-off
points and make further approximation with the original
data. Piecewise linear representation reflects the overall trend
but in the meantime some details are missing because of the
sampling.
For the similarity measure, the most commonly used
measures are Euclidean distance and dynamic time warping
(DTW) distance [10]. Euclidean distance is relatively simple
but the time line shifting problem is not considered. DTW
solves the shifting problem but dynamic programming is
adopted which leads to a high computation complexity.
This paper is organized as follows. Firstly, we present
problem statement in section II. The approximation repre-
sentation will be presented in section III. In section IV, we
give the similarity measure used in this paper. In section
V, we propose our hybrid matching algorithm. Section VI
investigates the effectiveness of the proposed algorithm.
Finally, we draw some conclusions in section VII.
II. P
ROBLEM STATEMENT
Definition 1 (Time Series): Time series 𝑇 is a set of
observation data listed in time order.
𝑇 = {(𝑝
1
,𝑡
1
), (𝑝
2
,𝑡
2
), ..., (𝑝
𝑖
,𝑡
𝑖
)} (𝑡
1
<𝑡2 < ... < 𝑡
𝑖
)
(1)
Where, 𝑝
𝑖
is the 𝑖𝑡ℎ data point, 𝑡
𝑖
is the corresponding
time stamp, 𝑖 =1, 2, ..., 𝑛, 𝑛 is called the length of time
series 𝑇 . Fig. 1 shows an example of temperature time series.
Definition 2 (Time Series Matching): Given two time se-
ries 𝑇
1
and 𝑇
2
, 𝑇
′
1
and 𝑇
′
2
are the approximation represen-
tation, distance 𝐷 is computed by some similarity measure,
if 𝐷<𝜖(𝜖 is a predefined distance threshold), then we
consider that 𝑇
1
and 𝑇
2
are matched.
III. A
PPROXIMATION REPRESENTATION
The nature of time series data includes: large in data
size, high dimensionality. If matching is carried out on the
2016 9th International Symposium on Computational Intelligence and Design
2473-3547/16 $31.00 © 2016 IEEE
DOI 10.1109/ISCID.2016.153
171