Subspace Change-Point Detection for High-Dimensional Data: A New Model and a Cumulative-Sum Algorithm


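"Subspace change-point detection is a long-standing research topic in statistical signal processing, and it remains a fundamental problem for many real-world applications that extract information from streaming data. Exploiting the low-dimensional structure of high-dimensional signals, subspaces in particular, is another important research theme, because it improves the efficiency of computation, storage, and communication and also makes the data easier to interpret. In this paper, we propose a new model to formulate the subspace change-point detection problem, in which a sequence of high-dimensional data points abruptly changes, at a certain time, the low-dimensional subspace in which it lies. We then propose a method based on the cumulative sum algorithm to solve this problem. From an algorithmic perspective, the proposed detection method is computationally efficient, since it processes the data and detects the change point quickly."

In high-dimensional data processing, subspace change-point detection aims to identify the moments at which a data stream changes abruptly; such changes may be caused by system faults, environmental changes, or other significant factors. Traditional change-point detection methods tend to focus on univariate or low-dimensional data, but the demand for analyzing high-dimensional data keeps growing, which makes subspace change-point detection an important problem.

The new model proposed in the paper treats the data sequence as a trajectory that may lie in different low-dimensional subspaces at different times. This captures the dynamics of the stream more accurately when the data has intrinsic structure. A change of subspace means that the pattern or characteristics of the data have shifted significantly, which may indicate a change in the state of the underlying system.

To solve the problem, the authors design an algorithm based on the cumulative sum (CUSUM). CUSUM is a widely used online detection method, valued for its detection performance and its suitability for real-time use. In the context of subspace change-point detection, the algorithm builds a cumulative-sum sequence by tracking how the data departs from the subspace, and declares a change point once the sequence reaches a preset threshold. The method can therefore monitor a data stream in real time and flag potential changes promptly while keeping the computational complexity low.

The paper may also discuss the mathematical principles of the algorithm, its performance analysis (such as false-alarm and missed-detection rates), simulation results, and comparisons with existing methods. Application case studies may be included as well to show that the method is effective in real-world settings. Research of this kind is relevant to monitoring systems, fault prediction, and anomaly detection, and can help improve the accuracy and efficiency of data-driven decisions.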

JIAO et al.: SUBSPACE CHANGE-POINT DETECTION: A NEW MODEL AND SOLUTION 1227

In the first phase, we apply the classic PCA method to historical data to estimate $Z_0$. More specifically, denote by $X := [x_{-T}, \ldots, x_{-1}, x_0]$ the collected historical data, where $T$ denotes the memory size. Eigen-decomposition of $XX^{\top}$ gives

$$XX^{\top} = [\hat{U}, \hat{U}_{\perp}] \, \Sigma \, [\hat{U}, \hat{U}_{\perp}]^{\top}, \tag{5}$$

where $\Sigma$ is a diagonal matrix with the eigenvalues, sorted in descending order, on its diagonal. $\hat{U} \in \mathbb{R}^{D \times r}$ and $\hat{U}_{\perp} \in \mathbb{R}^{D \times (D-r)}$ denote, respectively, orthonormal bases of $\hat{Z}_0$ and $\hat{Z}_0^{\perp}$, which are the estimates of $Z_0$ and its orthogonal complement.
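The first phase can be sketched in NumPy as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name `estimate_bases`, the synthetic data, and all sizes are hypothetical choices made for the example.

```python
import numpy as np

def estimate_bases(X, r):
    """Split the eigenvectors of X X^T into a rank-r basis and its complement."""
    # eigh handles symmetric matrices and returns eigenvalues in ascending order.
    eigvals, eigvecs = np.linalg.eigh(X @ X.T)
    order = np.argsort(eigvals)[::-1]        # descending order, as in (5)
    eigvecs = eigvecs[:, order]
    return eigvecs[:, :r], eigvecs[:, r:]    # shapes (D, r) and (D, D - r)

rng = np.random.default_rng(0)
D, r, n_hist = 20, 3, 500                    # illustrative sizes (assumed)
U_true, _ = np.linalg.qr(rng.standard_normal((D, r)))
# Assumed data model: unit-variance coefficients in span(U_true) plus small noise.
X = U_true @ rng.standard_normal((r, n_hist)) + 0.1 * rng.standard_normal((D, n_hist))

U_hat, U_perp = estimate_bases(X, r)
# Energy of the true basis that leaks into the estimated complement;
# it should be small when the estimate is accurate.
leak = np.linalg.norm(U_perp.T @ U_true) ** 2
```

Only one of the two bases needs to be kept for the online phase; which one is cheaper depends on whether `r` or `D - r` is smaller.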

In the second phase, we adopt the recursive form of CUSUM for the detection task. Using the estimated $\hat{U}_{\perp}$, we propose to define a CUSUM score as

$$L_t := \|\hat{U}_{\perp}^{\top} x_t\|_2^2 - (D - r)\,\sigma_n^2 - c, \tag{6}$$

where $c > 0$ is a parameter. Initialized by $y_0 = 0$,

$$y_t := \max\{y_{t-1} + L_t,\ 0\} \tag{7}$$

denotes the cumulative statistic of the CUSUM process. For some threshold $b > 0$, the algorithm raises an alarm and returns $\hat{T} = t$ as soon as $y_t$ exceeds $b$ at time instant $t$. The complete algorithm is listed in Algorithm 1. Theoretical analysis of the algorithm, including the influence of the parameters $b$ and $c$, is given in the next section.
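A small worked example may help to see how the recursion behaves, in particular the reset to zero. The scores below are hand-picked numbers, not values computed from data, and the helper name `cusum_stopping_time` is hypothetical.

```python
def cusum_stopping_time(scores, b):
    """Run y_t = max(y_{t-1} + L_t, 0); return the first 1-based t with y_t > b, else None."""
    y = 0.0
    for t, L_t in enumerate(scores, start=1):
        y = max(y + L_t, 0.0)    # negative excursions are clipped at zero
        if y > b:
            return t
    return None

# y evolves as 0, 0.5, 0, 3, 7 for these scores.
scores = [-1.0, 0.5, -2.0, 3.0, 4.0]
alarm = cusum_stopping_time(scores, b=5.0)   # y first exceeds 5 at t = 5
```

Because of the clipping, a long run of negative scores before the change does not build up a deficit that later positive scores would first have to pay back.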

We provide an intuitive explanation of the proposed algorithm. In (6), which defines the CUSUM score $L_t$, the first term $\|\hat{U}_{\perp}^{\top} x_t\|_2^2$ is the energy of the projection of $x_t$ onto $\hat{Z}_0^{\perp}$, while the other two terms are fixed. Before the change point ($t < T_0$), since the clean signal in $x_t$ lies in $Z_0$, the projection of $x_t$ onto $\hat{Z}_0^{\perp}$ is small, leading to a small $L_t$ with negative mean. As a consequence, $y_t$ can hardly increase to the threshold $b$, and the algorithm is unlikely to raise a false alarm when no change happens. On the other hand, after the change point ($t \ge T_0$), as the clean signal in $x_t$ lies in $Z_1$, the projection of $x_t$ onto $\hat{Z}_0^{\perp}$ may be rather large, which leads to a prompt increase in $y_t$ and thus a short detection delay. This explains why the proposed algorithm is suited to the task of subspace change-point detection.
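The sign of the drift described above can be checked with a short Monte Carlo sketch. Everything here is an assumption made for illustration: the subspaces are drawn at random, the signal coefficients are standard Gaussian, the exact complement of the pre-change subspace stands in for its PCA estimate, and the parameter values are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(1)
D, r, sigma_n, c = 30, 4, 0.1, 0.05          # illustrative values (assumed)

# Pre- and post-change orthonormal bases, drawn at random for the demo.
U0, _ = np.linalg.qr(rng.standard_normal((D, r)))
U1, _ = np.linalg.qr(rng.standard_normal((D, r)))
# Exact orthogonal complement of span(U0), used in place of the PCA estimate.
Q, _ = np.linalg.qr(U0, mode="complete")
U_perp = Q[:, r:]

def mean_score(U, n=5000):
    """Monte Carlo mean of the score in (6) when the clean signal lies in span(U)."""
    X = U @ rng.standard_normal((r, n)) + sigma_n * rng.standard_normal((D, n))
    energy = np.sum((U_perp.T @ X) ** 2, axis=0)
    return float(np.mean(energy - (D - r) * sigma_n**2 - c))

pre_mean = mean_score(U0)    # close to -c: only noise reaches the complement
post_mean = mean_score(U1)   # clearly positive: signal energy reaches the complement
```

With an exact complement the pre-change mean is $-c$ up to sampling error; with an estimated complement it would be shifted up slightly by the estimation error.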

To demonstrate the feasibility of this online algorithm, we discuss its computational complexity and memory requirement. In Phase 1, PCA works on the historical data $x_{-T}, \ldots, x_{-1}, x_0$ to estimate $\hat{U}_{\perp}$, a basis of the orthogonal complement of the pre-change subspace. The time complexity is $O(D^3)$. In Phase 2, the basis matrix $\hat{U}_{\perp}$ and the current sample $x_t$ are used to calculate $L_t$ by (6) at cost $O(D(D - r))$. Storing the matrix $\hat{U}_{\perp}$ requires memory of size $O(D(D - r))$. Notice that when $r < D/2$, it is more efficient to store $\hat{U}$ and calculate the first term on the RHS of (6) as

$$\|\hat{U}_{\perp}^{\top} x_t\|_2^2 = \|x_t\|_2^2 - \|\hat{U}^{\top} x_t\|_2^2.$$

Therefore both the computational complexity and the memory requirement of the proposed algorithm are $O(D \min\{D - r, r\})$. When $r \ll D$ or $r \approx D$, the complexity is linear in $D$, which is optimal in the order sense.
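The identity behind the cheaper computation follows from the two bases together forming an orthonormal basis of the whole space. A quick numerical check, with an arbitrary random orthonormal matrix standing in for the PCA output (an assumption for the example):

```python
import numpy as np

rng = np.random.default_rng(2)
D, r = 50, 5
# Columns of Q form an orthonormal basis of R^D; split it into U and its complement.
Q, _ = np.linalg.qr(rng.standard_normal((D, D)))
U, U_perp = Q[:, :r], Q[:, r:]
x = rng.standard_normal(D)

direct = np.sum((U_perp.T @ x) ** 2)                     # O(D (D - r)) work, stores U_perp
via_complement = np.sum(x**2) - np.sum((U.T @ x) ** 2)   # O(D r) work, stores only U
```

Both quantities agree up to floating-point error, so the smaller of the two matrices can be stored.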

Algorithm 1: A Subspace Change-Point Detection Algorithm.

Input: Historical data matrix $X := [x_{-T}, \ldots, x_{-1}, x_0]$, subspace dimension $r$, noise variance $\sigma_n^2$, algorithm parameters $b$, $c$.

Output: Stopping time $\hat{T}$.

Implementation:

Phase 1: Preparation
    Obtain $\hat{U}_{\perp}$ by using (5);

Phase 2: Online Detection
    Initialize: $y_0 = 0$.
    for $t = 1, 2, \ldots$ do
        Calculate the CUSUM score $L_t$ by using (6);
        Calculate the cumulative statistic $y_t$ by using (7);
        If $y_t > b$, then break;
    end for
    return $\hat{T} := t$.
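Algorithm 1 can be put together in a few lines of NumPy. The sketch below is one possible reading of the listing under stated assumptions, not the authors' code: the function name `detect_subspace_change`, the synthetic stream, the change time `T0 = 200`, and the values of `b` and `c` are all hypothetical.

```python
import numpy as np

def detect_subspace_change(X_hist, stream, r, sigma_n, b, c):
    """Sketch of Algorithm 1: return the 1-based stopping time, or None if no alarm."""
    D = X_hist.shape[0]
    # Phase 1: eigh sorts eigenvalues in ascending order, so the first D - r
    # eigenvectors span the estimated orthogonal complement.
    _, eigvecs = np.linalg.eigh(X_hist @ X_hist.T)
    U_perp = eigvecs[:, : D - r]
    # Phase 2: recursive CUSUM on the energy outside the estimated subspace.
    y = 0.0
    for t, x_t in enumerate(stream, start=1):
        L_t = np.sum((U_perp.T @ x_t) ** 2) - (D - r) * sigma_n**2 - c
        y = max(y + L_t, 0.0)
        if y > b:
            return t
    return None

rng = np.random.default_rng(3)
D, r, sigma_n, T0 = 20, 3, 0.1, 200          # illustrative values (assumed)
U0, _ = np.linalg.qr(rng.standard_normal((D, r)))
U1, _ = np.linalg.qr(rng.standard_normal((D, r)))

def sample(U, n):
    """Draw n noisy points whose clean part lies in span(U) (assumed Gaussian model)."""
    return U @ rng.standard_normal((r, n)) + sigma_n * rng.standard_normal((D, n))

X_hist = sample(U0, 500)
# Rows are x_1, x_2, ...; samples from t = T0 onward come from the new subspace.
stream = np.hstack([sample(U0, T0 - 1), sample(U1, 100)]).T
alarm = detect_subspace_change(X_hist, stream, r, sigma_n, b=20.0, c=0.1)
```

In this setting the alarm is expected shortly after `T0`. A larger `b` lowers the false-alarm risk at the price of a longer delay.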

IV. THEORETICAL ANALYSIS

We adopt the definition of F-distance in Section II-B to measure the difference between subspaces of interest. In particular, we define the following distances:

$$\Delta := d(Z_0, Z_1), \qquad \varepsilon := d(\hat{Z}_0, Z_0), \qquad \hat{\Delta} := d(\hat{Z}_0, Z_1). \tag{8}$$

Notice that $\varepsilon$ is small when our estimate of $Z_0$ is reasonably accurate. By the triangle inequality for the distance, the above quantities satisfy

$$\hat{\Delta} - \varepsilon \le \Delta \le \hat{\Delta} + \varepsilon.$$
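The relation between the three distances can be checked numerically. The definition of the F-distance is given in Section II-B, which is not reproduced here, so the sketch below assumes a common choice, the Frobenius norm of the difference of the two orthogonal projectors divided by $\sqrt{2}$. The function name and the perturbation used to mimic an estimate are hypothetical as well.

```python
import numpy as np

def proj_f_distance(U, V):
    """ASSUMED distance: Frobenius norm of the projector difference, over sqrt(2)."""
    return np.linalg.norm(U @ U.T - V @ V.T, "fro") / np.sqrt(2)

rng = np.random.default_rng(4)
D, r = 15, 3
U0, _ = np.linalg.qr(rng.standard_normal((D, r)))
U1, _ = np.linalg.qr(rng.standard_normal((D, r)))
# A slightly perturbed copy of U0 plays the role of the PCA estimate.
U0_hat, _ = np.linalg.qr(U0 + 0.05 * rng.standard_normal((D, r)))

delta = proj_f_distance(U0, U1)          # distance between the true subspaces
eps = proj_f_distance(U0_hat, U0)        # estimation error
delta_hat = proj_f_distance(U0_hat, U1)  # distance seen by the detector
```

Any distance induced by a norm on the projectors satisfies the same two-sided bound, so the check does not depend on the exact normalization.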

We first consider the general case, where the entries of $s_t$ and $n_t$ follow independent, unknown sub-Gaussian distributions, with mean zero and variance $\sigma_s^2$ and $\sigma_n^2$, respectively. Next we consider a special case where $s_t$ and $n_t$ are Gaussian random vectors, following the distributions $\mathcal{N}(0, \sigma_s^2 I)$ and $\mathcal{N}(0, \sigma_n^2 I)$, respectively. Before evaluating the performance of our algorithm in terms of ARL and EDD, we first examine properties of the CUSUM score $L_t$.

A. Distribution of $L_t$

The following lemma shows that $L_t$ with its expectation removed is a sub-exponential random variable. Based on this, we next give the condition for $L_t$ to be a qualified CUSUM score. Finally, we derive the requirement that this condition imposes on the parameter $c$.

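Lemma 2: Define

$$\tilde{L}_t := \begin{cases} L_t - \sigma_s^2\,[\ldots] + c, & t < T_0; \\ L_t - \sigma_s^2\,[\ldots] + c, & t \ge T_0, \end{cases} \tag{9}$$

where $L_t$ is defined by (6). We have

$$\tilde{L}_t \sim \mathrm{subE}\big([\ldots]_{\bar{s}} + (D - r)\,\sigma_n^2\,[\ldots]_{\bar{n}}\big). \tag{10}$$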
