An Efficient Similarity Search Framework for Time Series Cliques with Natural Relations

Research paper


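This paper presents a framework for similarity search over groups of time series that have natural relations. The authors, Bin Cui, Zhe Zhao, and Wee Hyong Tok, described here as senior members of the IEEE, propose a new solution to an important problem in time series data. Conventional time series retrieval usually ignores the natural relations among the data, such as the spatial associations between different time series in video trajectories. These inherent relations, however, are essential for understanding data patterns and discovering latent correlations.

The core contribution of the work is a new framework for efficient similarity search in databases of Time Series Cliques (TSCs), that is, groups of time series with natural relations. First, the framework provides a compact data representation, so that large volumes of TSC data can be managed in less storage space, which both saves storage and improves query efficiency. Second, it introduces multidimensional relation vectors to capture the complex natural relations among the time series within a TSC. These vectors capture static relational properties as well as dynamically changing trends and associations, which gives a better picture of the internal structure of the data. Finally, the framework proposes a novel similarity measure that combines the compact representation with the relation vectors. The measure is intended to assess the similarity between two TSCs as a whole: rather than matching the features of individual time series only, it takes into account the consistency of the overall structure and relations.

To validate the effectiveness and performance of the framework, the researchers carried out an extensive performance evaluation on both real and synthetic data sets. The results show that, compared with conventional methods, the framework has clear advantages for similarity search over time series with natural relations, including faster queries and higher accuracy.

In summary, the paper provides a powerful tool for handling time series data with natural relations, with theoretical value and practical potential for time series analysis, pattern recognition, and large-scale data mining. It helps existing techniques capture and exploit the inherent relations in complex time series data.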


The search algorithm relies on two steps: dimensionality reduction [8], [9], [10], [1], [3], [11] and data representation in the transformed space.

Various dimensionality reduction techniques have been proposed for time series data transformation. These include the Discrete Fourier Transform (DFT), Singular Value Decomposition (SVD) [12], [8], [13], the Discrete Wavelet Transform (DWT) [9], [10], and Piecewise Aggregate Approximation (PAA) [8]. Another approach to dimensionality reduction is to make use of time series segmentation [1], [3], [11].
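As an illustration of one of the techniques listed above, the following is a minimal sketch of PAA in Python. It assumes a series stored as a plain list of numbers and uses the standard definition of PAA (the mean of each equal-width frame). The function name `paa` and the sample data are illustrative and do not come from the paper.

```python
def paa(series, n_segments):
    """Piecewise Aggregate Approximation: replace each of n_segments
    (nearly) equal-width frames of the series by its mean value."""
    n = len(series)
    if not 1 <= n_segments <= n:
        raise ValueError("n_segments must be between 1 and len(series)")
    reduced = []
    for k in range(n_segments):
        # Integer frame boundaries; frames differ by at most one element
        # when n is not a multiple of n_segments.
        start = k * n // n_segments
        end = (k + 1) * n // n_segments
        frame = series[start:end]
        reduced.append(sum(frame) / len(frame))
    return reduced


series = [1.0, 3.0, 2.0, 4.0, 10.0, 12.0, 7.0, 9.0]
print(paa(series, 4))  # [2.0, 3.0, 11.0, 8.0]
```

Here an 8-point series is reduced to 4 values, which is the kind of reduction that makes indexing in the transformed space cheaper.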

Besides dimensionality reduction, the choice of time series representation is also important. Two types of time series representations are commonly used: numeric and symbolic. One commonly used numeric representation is the real sequence [6]. A symbolic representation is generated by a symbol table that maps each data vector of a sequence to a symbol [5]. The symbol table can be either predefined or built from data sets.
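To make the idea of a symbol table built from data concrete, here is a minimal Python sketch. It is a simplification under stated assumptions: it symbolizes scalar values rather than data vectors, and it derives equal-frequency breakpoints from a training sample. The cited work [5] may build its table differently, and the names `build_breakpoints` and `symbolize` are hypothetical.

```python
from bisect import bisect_right


def build_breakpoints(values, alphabet_size):
    """Derive breakpoints from sample data so that each symbol covers
    roughly the same number of sample values (equal-frequency binning)."""
    ordered = sorted(values)
    n = len(ordered)
    return [ordered[k * n // alphabet_size] for k in range(1, alphabet_size)]


def symbolize(series, breakpoints, alphabet="abcdefghijklmnopqrstuvwxyz"):
    """Map each value to the symbol of the bin it falls into."""
    return "".join(alphabet[bisect_right(breakpoints, v)] for v in series)


training = [0.1, 0.4, 0.5, 0.9, 1.2, 1.8, 2.5, 3.0, 3.3]
breakpoints = build_breakpoints(training, 3)
print(breakpoints)                                   # [0.9, 2.5]
print(symbolize([0.2, 1.5, 3.1, 0.9], breakpoints))  # abcb
```

A predefined table would simply supply `breakpoints` directly instead of calling `build_breakpoints`.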

Trajectory data can be considered a specific form of time series, and has been applied in moving object search and video retrieval [3], [4], [11], [14], [6]. In these works, the trajectories are first segmented at their control points [4], [11], [14] or inflections [3], and then different representation methods, i.e., either a scalable numeric representation in [14], [6] or symbolic representations in [3], [4], [11], [5], are adopted for similarity measurement.

2.2 Time Series Mining

Mining of multiple time series has also been explored recently. Existing research on multiple time series has focused on pattern mining and on finding correlations between multiple time series, over patterns and observed values from a group of individual time series. Papadimitriou et al. [15] proposed the SPIRIT system. SPIRIT performs incremental Principal Component Analysis (PCA) over stream data and delivers results in real time. SPIRIT discovers the hidden variables among n input streams and automatically determines the number of hidden variables that will be used. The observed values of the hidden variables present the general pattern of the multiple input series according to their distributions and correlations. BRAID [16] addressed the problem of discovering lag correlations between data streams. BRAID focuses on a time- and space-efficient method for finding the earliest and highest peak in the cross-correlation functions between all pairs of streams.
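The notion of a lag correlation can be illustrated with a naive computation. The sketch below is not BRAID: it assumes two equal-length, fully stored series and simply evaluates the Pearson correlation at every lag up to `max_lag`, whereas BRAID is described as a time- and space-efficient streaming method. The helper names `pearson` and `best_lag` are hypothetical.

```python
from math import sqrt


def pearson(a, b):
    """Pearson correlation of two equal-length sequences (0.0 if one is constant)."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0
    return cov / sqrt(var_a * var_b)


def best_lag(x, y, max_lag):
    """Return (lag, correlation) for the lag in 0..max_lag that maximizes the
    correlation between x[t] and y[t + lag]; ties keep the earliest lag."""
    best = (0, pearson(x, y))
    for lag in range(1, max_lag + 1):
        r = pearson(x[:len(x) - lag], y[lag:])
        if r > best[1]:  # strict comparison keeps the earliest peak
            best = (lag, r)
    return best


x = [0, 1, 3, 2, 0, -1, -3, -2, 0, 1, 3, 2]
y = [5, 5] + x[:10]  # y repeats x with a delay of two time steps
lag, r = best_lag(x, y, max_lag=4)
print(lag)  # 2
```

Running this for every pair of streams costs time proportional to the number of pairs times `max_lag` times the series length, which is the cost an efficient method has to avoid.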

Another closely related area of work is research on computing the group nearest neighbor query [17]. The query of Group KNN is a set of high-dimensional data points. The group nearest neighbor query returns a group of data points in the database that are similar to the query set based on the patterns and relations. The data set used in the Group KNN problem consists of independent points, and the problem does not consider whether natural relations exist between these points.
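A minimal sketch of a group nearest neighbor query is given below. It assumes that a database point is ranked by the sum of its Euclidean distances to all query points, which is a common aggregate for this kind of query but is an assumption here, since the text above does not spell out the measure used in [17]. The function name is hypothetical.

```python
from math import dist


def group_nearest_neighbors(database, query_points, k=1):
    """Return the k database points with the smallest summed Euclidean
    distance to all query points (exhaustive scan, no index)."""
    def aggregate_distance(point):
        return sum(dist(point, q) for q in query_points)

    return sorted(database, key=aggregate_distance)[:k]


database = [(0, 0), (5, 5), (2, 2), (9, 1)]
query = [(1, 1), (3, 3)]
print(group_nearest_neighbors(database, query, k=2))  # [(2, 2), (0, 0)]
```

Every database point is scored independently of the others, which reflects the limitation noted above: relations between the points play no part in the result.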

These existing approaches cannot be easily extended for the TSC similarity search problem because of the lack of a clearly defined similarity measure. Most importantly, existing approaches are unable to deal with the natural relations that exist between the multiple time series.

2.3 Representation of Time Series Relations

Several approaches for capturing natural relations among multiple time series have been proposed. Allen [18] defined 13 interval relations that exist between multiple time series. These include before, overlaps, during, etc. These relations are used to describe the relative position of two intervals. Fleischman et al. [19] deployed Allen's descriptor to capture temporal information by representing the events in video data based on a lexicon of hierarchical patterns of movements. An improved interval relation description method for local temporal relations, i.e., Time Series Knowledge Representation (TSKR), was proposed by Mörchen and Ultsch in [20]. TSKR expresses temporal knowledge in time series data. In addition, a temporal relation representation, 3D Z-string [21], was used to represent moving objects' spatiotemporal relations. In 3D Z-string, the objects in a video are projected onto the x-, y-, and time-axes to form three strings representing the relations and relative positions. The temporal overlapping and spatial position are well defined in 3D Z-string.
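For reference, Allen's 13 interval relations mentioned above can be computed with a few comparisons. The following Python sketch assumes each interval is a `(start, end)` pair with `start < end`. The relation names follow the usual presentation of Allen's algebra; the function name `allen_relation` is illustrative.

```python
def allen_relation(a, b):
    """Return the Allen relation that holds from interval a to interval b."""
    (a_start, a_end), (b_start, b_end) = a, b
    if a_start >= a_end or b_start >= b_end:
        raise ValueError("intervals must satisfy start < end")
    if a_end < b_start:
        return "before"
    if b_end < a_start:
        return "after"
    if a_end == b_start:
        return "meets"
    if b_end == a_start:
        return "met-by"
    if a_start == b_start and a_end == b_end:
        return "equals"
    if a_start == b_start:
        return "starts" if a_end < b_end else "started-by"
    if a_end == b_end:
        return "finishes" if a_start > b_start else "finished-by"
    if b_start < a_start and a_end < b_end:
        return "during"
    if a_start < b_start and b_end < a_end:
        return "contains"
    # Only proper partial overlap remains at this point.
    return "overlaps" if a_start < b_start else "overlapped-by"


print(allen_relation((1, 3), (4, 6)))  # before
print(allen_relation((1, 5), (3, 8)))  # overlaps
print(allen_relation((3, 4), (1, 8)))  # during
```

These relations describe only the relative position of two time intervals, so spatial relations between trajectories would need a separate descriptor.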

In summary, none of the above methods can address the challenges of effective and efficient TSC similarity search. The existing methods lack a compact, powerful, and measurable representation for the patterns and the natural relations that are inherent in the TSCs. In addition, it is hard to identify a generic relation descriptor that can be applied to various application domains.

3 BRUTE-FORCE TSC MATCHING

In this section, we first present a straightforward approach for solving the TSC matching problem, which exhaustively compares the time series in TSCs. In such a Brute-Force approach for TSC similarity matching, all the possible matches between two TSCs are identified. Once the possible matches are found, the similarity between all the time series in each matching result is computed. The maximum similarity (i.e., minimum distance) is then used to represent the similarity between two TSCs.

Let us revisit the example shown in Fig. 1. We can observe that there are two similar TSCs, i.e., $TSC_1$ (Figs. 1a and 1b) and $TSC_2$ (Figs. 1c and 1d). These TSCs are extracted from a computer simulation of a hockey ball game [22]. Specifically, Figs. 1a and 1c are the trajectories of players in $TSC_1$ and $TSC_2$, and Figs. 1b and 1d are the time intervals of the respective trajectories in the TSCs.

To measure the similarity between the above TSCs, i.e., $TSC_1$ and $TSC_2$, the Brute-Force approach is to find, for every player in $TSC_1$, the corresponding player in $TSC_2$. Corresponding players are those whose trajectories and intervals in Figs. 1b and 1d are of the same color as the players' in Figs. 1a and 1c. After finding the matched players, we can measure the similarity of each pair with a distance function between trajectories, and calculate the sum of their distances as the value of $dist(TSC_1, TSC_2)$.
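The Brute-Force matching described above can be sketched in Python as follows. Several simplifications are assumptions of this sketch and are not taken from the paper: both TSCs contain the same number of trajectories, trajectories are equal-length lists of 2D points, the per-pair distance is a sum of pointwise Euclidean distances, and time intervals are ignored. The names `trajectory_distance` and `brute_force_tsc_distance` are hypothetical.

```python
from itertools import permutations
from math import dist


def trajectory_distance(t1, t2):
    """Sum of pointwise Euclidean distances between two equal-length trajectories."""
    if len(t1) != len(t2):
        raise ValueError("this sketch assumes equal-length trajectories")
    return sum(dist(p, q) for p, q in zip(t1, t2))


def brute_force_tsc_distance(tsc_a, tsc_b):
    """Try every one-to-one matching between the series of two TSCs and
    return (minimum total distance, matching as a tuple of indices into tsc_b)."""
    if len(tsc_a) != len(tsc_b):
        raise ValueError("this sketch assumes TSCs of equal size")
    best_cost, best_match = float("inf"), None
    for perm in permutations(range(len(tsc_b))):
        # perm[i] is the series in tsc_b matched with series i of tsc_a.
        cost = sum(trajectory_distance(tsc_a[i], tsc_b[j])
                   for i, j in enumerate(perm))
        if cost < best_cost:
            best_cost, best_match = cost, perm
    return best_cost, best_match


tsc_a = [[(0, 0), (1, 0)], [(0, 5), (1, 5)]]
tsc_b = [[(0, 5), (1, 6)], [(0, 0), (1, 1)]]
print(brute_force_tsc_distance(tsc_a, tsc_b))  # (2.0, (1, 0))
```

With m series per TSC the loop visits m! matchings, so the cost grows factorially with the size of the clique, before even accounting for the cost of each trajectory comparison.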

The above approach can be easily generalized to find the matching result between the time series in $TSC_1$ and $TSC_2$ for different application domains. A distance function $DIS(T_i, T_j)$, where $T_i \in TSC_1$ and $T_j \in TSC_2$, can be used for comparing the similarities between two

CUI ET AL.: A FRAMEWORK FOR SIMILARITY SEARCH OF TIME SERIES CLIQUES WITH NATURAL RELATIONS 387
