KV-match: A New Method for Normalized and Time-Warped Subsequence Matching

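Updated 2024-08-26. PDF, 741 KB.

"KV-match: A Subsequence Matching Approach Supporting Normalization and Time Warping" is a research paper on how to perform subsequence matching efficiently over very large time series. Besides raw subsequence matching (RSM), KV-match supports normalized subsequences and introduces the constrained normalized subsequence matching problem (cNSM), which lets users control the degree of offset shifting and amplitude scaling to suit different query needs.

Time series data keeps growing in volume, particularly in data center management and Internet of Things applications. Traditional subsequence matching methods usually address only raw subsequence matching, that is, matching on unnormalized values, which ignores differences in offset and amplitude. This works poorly on real-world data, which is often affected by noise, differing sampling rates, or measurement error.

UCR Suite is a well-known approach to the normalized subsequence matching problem (NSM), but it has to scan the entire time series, which is inefficient. KV-match was proposed to overcome these limitations. It supports normalized subsequences as well as Dynamic Time Warping (DTW), which tolerates a bounded amount of misalignment along the time axis. The cNSM problem is an extension of NSM that adds constraints, so that users can adjust how much offset shifting and amplitude scaling a match may exhibit. The benefit is that queries can be answered through an index rather than by traversing all the data, so KV-match can speed up processing while preserving matching accuracy, especially on large-scale time series.

The paper presumably details the index structure and algorithm design of KV-match, including how the information needed for normalized matching is stored and retrieved and how time warping is handled. It likely also contains an experimental section comparing KV-match with existing methods in performance and accuracy.

In short, KV-match handles both normalization and time warping in subsequence matching and offers an effective, flexible solution for time series analysis at large scale.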

TABLE I
FREQUENTLY USED NOTATIONS

Notation          Description
$X$               a time series $(x_1, \cdots, x_n)$
$X(i, l)$         a length-$l$ subsequence of $X$ starting at offset $i$
$\hat{X}$         the normalized series of time series $X$
$X_i$             the $i$-th length-$w$ disjoint window of $X$
$\mu_i^X$         the mean value of the $i$-th disjoint window of $X$
$\sigma_i^X$      the standard deviation of the $i$-th disjoint window of $X$
$WI$              a window interval containing continuous window positions
[...]             a set of window intervals satisfying the criterion for $Q_i$
[...], $CS$       a set of candidates for $Q_i$ and for all $Q_j$ ($1 \le j \le i$)
[...]             the number of window intervals and window positions

works in Section VIII. Finally, we conclude the paper and look into future work in Section IX.

II. PRELIMINARY KNOWLEDGE

In this section, we introduce the definition of time series and other useful notations.

A. Definitions and Problem Statement

A time series is a sequence of ordered values, denoted as $X = (x_1, \cdots, x_n)$, where $n = |X|$ is the length of $X$. A length-$l$ subsequence of $X$ is a shorter time series, denoted as $X(i, l) = (x_i, x_{i+1}, \cdots, x_{i+l-1})$, where $1 \le i \le n - l + 1$.

For any subsequence $S = (s_1, \cdots, s_m)$, $\mu^S$ and $\sigma^S$ are the mean value and standard deviation of $S$ respectively. Thus the normalized series of $S$, denoted as $\hat{S}$, is

$$\hat{S} = \left( \frac{s_1 - \mu^S}{\sigma^S}, \cdots, \frac{s_m - \mu^S}{\sigma^S} \right)$$
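As a minimal sketch of the two definitions above (not code from the paper), the following Python extracts $X(i, l)$ with the 1-based offset used in the text and normalizes a series. The function names `subsequence` and `z_normalize` are illustrative, and dividing by $m$ (the population standard deviation) is an assumption, since the excerpt does not say which form of the standard deviation is meant.

```python
import math


def subsequence(x, i, l):
    """Return X(i, l) = (x_i, ..., x_{i+l-1}) using the 1-based offset i."""
    if not 1 <= i <= len(x) - l + 1:
        raise IndexError("offset i must satisfy 1 <= i <= n - l + 1")
    return x[i - 1 : i - 1 + l]


def z_normalize(s):
    """Return the normalized series ((s_k - mean) / std for every k)."""
    m = len(s)
    mu = sum(s) / m
    # Assumption: population standard deviation (divide by m, not m - 1).
    sigma = math.sqrt(sum((v - mu) ** 2 for v in s) / m)
    if sigma == 0:
        raise ValueError("a constant series cannot be normalized")
    return [(v - mu) / sigma for v in s]
```

After normalization the series has mean 0 and standard deviation 1, which is why two subsequences that differ only by an offset and a positive scale factor map to the same normalized series.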

Our work supports two common distance measures, Euclidean distance and Dynamic Time Warping. Here we give their definitions.

Euclidean Distance (ED): Given two length-$m$ sequences, $S$ and $S'$, their distance is

$$ED(S, S') = \sqrt{\sum_{i=1}^{m} (s_i - s'_i)^2}$$

Dynamic Time Warping (DTW): Given two length-$m$ sequences, $S$ and $S'$, their distance is

$$DTW(\langle\rangle, \langle\rangle) = 0; \quad DTW(S, \langle\rangle) = DTW(\langle\rangle, S') = \infty;$$

$$DTW(S, S') = \sqrt{(s_1 - s'_1)^2 + \min \begin{cases} DTW^2(suf(S), suf(S')) \\ DTW^2(S, suf(S')) \\ DTW^2(suf(S), S') \end{cases}}$$

where $\langle\rangle$ represents the empty series and $suf(S) = (s_2, \cdots, s_m)$ is a suffix subsequence of $S$.

In DTW, the warping path is defined as a matrix to represent the optimal alignment for two series. The matrix element $(i, j)$ represents that $s_i$ is aligned to $s'_j$. To reduce the computation complexity, we use the Sakoe-Chiba band [10] to restrict the width of warping, denoted as $\rho$. Any pair $(i, j)$ should satisfy $|i - j| \le \rho$. When $\rho = 0$, it degenerates into ED.
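The two distance measures can be sketched in Python as follows. This is an illustrative implementation under stated assumptions, not the authors' code: the names `ed` and `dtw` are hypothetical, and the dynamic program fills a table over prefixes rather than recursing over suffixes, which yields the same value. Costs are accumulated as squared differences and the square root is taken once at the end, so that a band width of `rho = 0` reproduces ED.

```python
import math


def ed(s, t):
    """Euclidean distance between two equal-length sequences."""
    if len(s) != len(t):
        raise ValueError("sequences must have the same length")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(s, t)))


def dtw(s, t, rho):
    """DTW distance restricted to a Sakoe-Chiba band |i - j| <= rho."""
    m = len(s)
    if len(t) != m:
        raise ValueError("sequences must have the same length")
    inf = math.inf
    # cost[i][j]: squared DTW distance between the first i points of s
    # and the first j points of t; cells outside the band stay infinite.
    cost = [[inf] * (m + 1) for _ in range(m + 1)]
    cost[0][0] = 0.0
    for i in range(1, m + 1):
        for j in range(max(1, i - rho), min(m, i + rho) + 1):
            d = (s[i - 1] - t[j - 1]) ** 2
            cost[i][j] = d + min(
                cost[i - 1][j - 1],  # advance in both series
                cost[i - 1][j],      # advance in s only
                cost[i][j - 1],      # advance in t only
            )
    return math.sqrt(cost[m][m])
```

The band keeps the table sparse: only about $(2\rho + 1) \cdot m$ cells are filled instead of $m^2$.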

We aim to support subsequence matching for both the raw subsequence and the normalized subsequence simultaneously. The problem statements are given here.

Raw Subsequence Matching (RSM): Given a long time series $X$, a query sequence $Q$ ($|X| \ge |Q|$) and a distance threshold $\varepsilon$ ($\varepsilon \ge 0$), find all subsequences $S$ of length $|Q|$ from $X$ which satisfy $D(S, Q) \le \varepsilon$. In this case, we say that $S$ and $Q$ are in $\varepsilon$-match.

Normalized Subsequence Matching (NSM): Given a long time series $X$, a query sequence $Q$ and a distance threshold $\varepsilon$ ($\varepsilon \ge 0$), find all subsequences $S$ of length $|Q|$ from $X$ which satisfy $D(\hat{S}, \hat{Q}) \le \varepsilon$, where $\hat{S}$ and $\hat{Q}$ are the normalized series of $S$ and $Q$ respectively.

The cNSM problem adds two constraints to the NSM problem. Thresholds $\alpha$ ($\alpha \ge 1$) and $\beta$ ($\beta \ge 0$) are introduced to constrain the degree of amplitude scaling and offset shifting.

Constrained Normalized Subsequence Matching (cNSM): Given a long time series $X$, a query sequence $Q$, a distance threshold $\varepsilon$, and the constraint thresholds $\alpha$ and $\beta$, find all subsequences $S$ of length $|Q|$ from $X$ which satisfy

$$D(\hat{S}, \hat{Q}) \le \varepsilon \;\land\; \frac{1}{\alpha} \le \frac{\sigma^S}{\sigma^Q} \le \alpha \;\land\; -\beta \le \mu^S - \mu^Q \le \beta.$$

The larger $\alpha$ and $\beta$ are, the looser the constraint. In this case, we say that $S$ and $Q$ are in $(\varepsilon, \alpha, \beta)$-match.

The distance $D(\cdot, \cdot)$ is either ED or DTW. In this paper, we build an index to support four types of queries, RSM-ED, RSM-DTW, cNSM-ED and cNSM-DTW, simultaneously.
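To make the cNSM definition concrete, here is a brute-force reference scan for the cNSM-ED case. It is only a sketch for checking results on small inputs: it examines every offset, which is exactly the full scan that an index is meant to avoid, and it is not the method of the paper. The names `mean_std` and `cnsm_ed_scan` are hypothetical, and the population standard deviation is an assumption.

```python
import math


def mean_std(s):
    """Mean and population standard deviation of a sequence."""
    mu = sum(s) / len(s)
    sigma = math.sqrt(sum((v - mu) ** 2 for v in s) / len(s))
    return mu, sigma


def cnsm_ed_scan(x, q, eps, alpha, beta):
    """Return the 1-based offsets i where X(i, |Q|) and Q are in
    (eps, alpha, beta)-match under ED."""
    m = len(q)
    mu_q, sigma_q = mean_std(q)
    if sigma_q == 0:
        raise ValueError("the query must not be constant")
    q_hat = [(v - mu_q) / sigma_q for v in q]
    matches = []
    for i in range(1, len(x) - m + 2):
        s = x[i - 1 : i - 1 + m]
        mu_s, sigma_s = mean_std(s)
        if sigma_s == 0:
            continue  # normalization is undefined for a constant window
        if not (1 / alpha <= sigma_s / sigma_q <= alpha):
            continue  # amplitude scaling outside the allowed degree
        if abs(mu_s - mu_q) > beta:
            continue  # offset shifting outside the allowed degree
        dist = math.sqrt(
            sum(((v - mu_s) / sigma_s - qh) ** 2 for v, qh in zip(s, q_hat))
        )
        if dist <= eps:
            matches.append(i)
    return matches
```

With very large `alpha` and `beta` the two constraints stop filtering anything and the scan behaves like plain NSM; tightening them removes matches whose shape agrees with the query but whose scale or level differs too much.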

III. THEORETICAL FOUNDATION AND APPROACH MOTIVATION

In this section, we establish the theoretical foundation of our approach. We propose a condition to filter the unqualified subsequences. For all four types of queries, the conditions share the same format, which enables us to support all query types with a single index.

Specifically, for the query $Q$ and the subsequence $S$ of length $m$, we segment them into aligned disjoint windows of the same length $w$. The $i$-th window of $Q$ (or $S$) is denoted as $Q_i$ (or $S_i$), ($1 \le i \le p = \lfloor m / w \rfloor$), that is, $Q_i = (q_{(i-1) \cdot w + 1}, \cdots, q_{i \cdot w})$.

For each window, we hope to find one or more features, based on which we can construct the filtering condition. In this work, we choose to utilize one single feature, the mean value of the window. The advantages are twofold. First, with a single feature, we can build a one-dimensional index, which improves the efficiency of index retrieval greatly. Second, the mean value allows us to design the condition for both RSM and cNSM queries.

We denote mean values of $Q_i$ and $S_i$ as $\mu_i^Q$ and $\mu_i^S$. The condition consists of $p$ ranges. The $i$-th one is denoted as $[LR_i, UR_i]$ ($1 \le i \le p$). If $S$ is a qualified subsequence, for any $i$, $\mu_i^S$ must fall within $[LR_i, UR_i]$. If any $\mu_i^S$ is outside the range, we can filter $S$ safely.
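The filtering condition described above can be sketched as a pair of small functions. This is a hedged illustration only: `window_means` and `passes_filter` are hypothetical names, the ranges are passed in as plain `(LR_i, UR_i)` tuples, and how those ranges are derived for each query type is left to the caller.

```python
def window_means(s, w):
    """Mean of each of the p = len(s) // w disjoint length-w windows.
    Trailing points that do not fill a whole window are ignored."""
    p = len(s) // w
    return [sum(s[k * w : (k + 1) * w]) / w for k in range(p)]


def passes_filter(s, ranges, w):
    """True if every window mean of s lies in its range [LR_i, UR_i].

    A subsequence that fails this test can be discarded; one that passes
    is only a candidate and still needs the exact distance check.
    """
    means = window_means(s, w)
    return all(lr <= mu <= ur for mu, (lr, ur) in zip(means, ranges))
```

Because the test is a conjunction of one-dimensional range checks on window means, each check can be answered by a range lookup over a single sorted key, which is what makes a one-dimensional index sufficient.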

A. RSM-ED Query Processing

In this section, we first present the condition for the simplest case, RSM-ED query, and then illustrate our approach.

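Lemma 1. If $S$ and $Q$ are in $\varepsilon$-match under ED measure, that is, $ED(S, Q) \le \varepsilon$, then $\mu_i^S$ ($1 \le i \le p$) must satisfy

$$\mu_i^S \in \left[ \mu_i^Q - \frac{\varepsilon}{\sqrt{w}},\; \mu_i^Q + \frac{\varepsilon}{\sqrt{w}} \right]. \quad (1)$$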

Proof. Based on the ED definition, we have

$$ED^2(S, Q) = \sum_{k=1}^{m} (s_k - q_k)^2 \ge \sum_{j=(i-1) \cdot w + 1}^{i \cdot w} (s_j - q_j)^2$$
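The range in Lemma 1 can be checked numerically. One standard way to see why it holds is that the mean of squares is at least the square of the mean, so for each window $\sum_j (s_j - q_j)^2 \ge w \cdot (\mu_i^S - \mu_i^Q)^2$; combined with $ED(S, Q) \le \varepsilon$ this gives $|\mu_i^S - \mu_i^Q| \le \varepsilon / \sqrt{w}$. The sketch below computes the ranges and tests the implication on given data. The function names are hypothetical, and the small tolerance `tol` is an assumption added to absorb floating-point rounding.

```python
import math
import random


def window_means(s, w):
    """Mean of each of the len(s) // w disjoint length-w windows."""
    return [sum(s[k * w : (k + 1) * w]) / w for k in range(len(s) // w)]


def ed(s, t):
    """Euclidean distance between two equal-length sequences."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(s, t)))


def rsm_ed_ranges(q, eps, w):
    """Ranges [mu_i^Q - eps/sqrt(w), mu_i^Q + eps/sqrt(w)] from Lemma 1."""
    r = eps / math.sqrt(w)
    return [(mu - r, mu + r) for mu in window_means(q, w)]


def lemma_holds(s, q, eps, w, tol=1e-12):
    """Check the implication of Lemma 1 for one pair (s, q)."""
    if ed(s, q) > eps:
        return True  # the lemma makes no claim about non-matching pairs
    ranges = rsm_ed_ranges(q, eps, w)
    return all(
        lr - tol <= mu <= ur + tol
        for mu, (lr, ur) in zip(window_means(s, w), ranges)
    )


if __name__ == "__main__":
    random.seed(0)
    q = [random.gauss(0, 1) for _ in range(16)]
    s = [v + random.uniform(-0.2, 0.2) for v in q]
    print(lemma_holds(s, q, eps=1.0, w=4))
```

Note that the ranges depend only on the query, the threshold and the window length, so they can be computed once per query before any data is touched.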
