The search algorithm leverages on two steps: dimension-
ality reduction [8], [9], [10], [1], [3], [11] and data
representation in the transformed space.
Various dimensionality reduction techniques have been
proposed for time series data transformation. These
includes: Discrete Fourier Transform (DFT), Singular Value
Decomposition (SVD) [12], [8] , [13], Disc rete Wavelet
Transform (DWT) [9], [10], and Piec ewise Aggregate
Approximation (PAA) [8]. Another approach for dimen-
sionality reduction is to make use of time series segmenta-
tion [1], [3], [11].
Besides dimensionality reduction, the choice of time
series representation is also important. Two types of time
series representations are commonly used: numeric and
symbolic. One of the commonly used numeric representa-
tion is the real sequence [6]. Symbolic representation is
generated by a symbol table that reflects each data vector of
sequence into symbols [5]. The symbol table can be either
predefined or built from data sets.
Trajectory data can be considered as a specific form of
time series, which has been applied in moving object search
or video retrieval fields [3], [4], [11], [14], [6]. In these works,
the trajectories first are segmented by their control points
[4], [11], [14], or inflextions [3], and then different
representation methods, i.e., either a scalable numeric
representation in [14], [6] or symbolic representations in
[3], [4], [11], [5] have been adopted for similarity measure.
2.2 Time Series Mining
Multiple time series mining has also been recently explored.
Existing multiple time series research have focused on
pattern mining and finding correlation between multiple
time series, over patterns and observed values from group
of individual time series. Papadimitriou et al. [15] proposed
the SPIRIT system. SPIRIT performs incremental Principal
Component Analysis (PCA) over stream data, and delivers
results in real time. SPIRIT discovers the hidden variables
among n input streams and automatically determines the
number of hidden variables that will be used. The observed
values of the hidden variables present the general pattern of
multiple input series according to their distributions and
correlations. BRAID [16] addressed the problem of dis-
covering lag correlations between data steams. BRAID
focuses on a time and space efficient method for finding the
earliest and highest peak in the cross-correlation functions
between all pairs of streams.
Another closely related area of work is research on
computing the group nearest neighbor query [17]. The
query of Group KNN is a set of high-dimensional data
points. The group nearest neighbor query returns a group of
data points in the database which are similar to the set of
query based on the patterns and relations. The data set used
in the Group KNN problem consists of independent points,
and does not consider whether natural relations exist
between these points.
These existing approaches cannot be easily extended for
the TSC similarity search problem because of the lack of a
clearly defined similarity measure. Most importantly,
existing approaches are unable to deal with the natural
relations that exist between the multiple time series.
2.3 Representation of Time Series Relations
Several approaches for capturing natural relations among
multiple time series have been proposed. Allen [18] defined
13 interval relations that exist between multiple time series.
These include: before, overlaps, during, etc. These relations
are used to describe the relative position of two intervals.
Fleischamn et al. [19] deployed Allen’s descriptor to capture
temporal information by representing the events in video
data based on a lexicon of hierarchical patterns of move-
ments. An improved interval relation description method
for local temporal relations, i.e., Time Series Knowledge
Representation (TSKR) was proposed by Mo
¨
rchen and
Ultsch in [20]. TSKR expresses temporal knowledge in time
series data. In addition, the temporal relation, 3D Z-String
[21] was used to represent moving objects’ spatiotemporal
relations. In 3D Z-string, the objects in a video are projected
onto the x-, y-, and time-axis to form three strings
representing the relations and relative positions. The
temporal overlapping and spatial position are well defined
in 3D Z-String.
In summary, none of the above methods can address the
challenges of effective and efficient TSC similarity search.
The existing methods lack a compact, powerful, and
measurable representation for the patterns and the natural
relations that are inherent in the TSCs. In addition, it is hard
to identify a generic relation descriptor that can be applied
to various application domains.
3BRUTE-FORCE TSC MATCHING
In this section, we first present a straightforward approach
for solving the TSC matching problem, which exhaustively
compares the time series in TSCs. In such a Brute-Force
approach for TSC similarity matching, all the possible
matches between two TSCs are identified. Once the possible
matches are found, the similarity between all the time series
in each matching result are computed. The maximum
similarity (i.e., minimum distance) is then used to represent
the similarity between two TSCs.
Let us revisit the example shown in Fig. 1. We can
observe that there are two similar TSCs, i.e., TSC
1
(Figs. 1a
and 1b) and TSC
2
(Figs. 1c and 1d). These TSCs are
extracted from a computer simulation of a hockey ball game
[22]. Specifically, Figs. 1a and 1c are the trajectories of
players in the TSC
1
and TSC
2
, Figs. 1b and 1d are the time
intervals of respective trajectories in the TSCs.
To measure the similarity between the above TSCs, i.e.,
TSC
1
and TSC
2
, the Brute-Force approach is to find all the
corresponding players in TSC
2
for all the players in the
TSC
1
, which are the players whose trajectories and
intervals in Figs. 1b and 1d are of the same color as the
players’ in Figs. 1a and 1c. After finding the matched
players, we can measure the similarity of each pair by
distance function between trajectories, and calculate the
sum of their distance as the value of distðTSC
1
;TSC
2
Þ.
The above approach can be easily generalized to find
the matching result between the time series in TSC
1
and
TSC
2
for different application do mains. A distance
function DISðT
i
;T
j
Þ, where T
i
2 TSC
1
and T
j
2 TSC
2
,
can be used for comparing the similarities between two
CUI ET AL.: A FRAMEWORK FOR SIMILARITY SEARCH OF TIME SERIES CLIQUES WITH NATURAL RELATIONS 387