JIAO et al.: SUBSPACE C HANGE-POINT DETECTION: A NEW MODEL AND SOLUTION 1227
In the first phase, we apply classic PCA method to histori-
cal data for estimating Z
0
. More specifically, denote by X :=
[x
−T
h
+1
,...,x
−1
, x
0
] the collected historical data, where T
h
denotes the memory size. Eigen-decomposition of XX
T
gives
XX
T
=
Z
0
,
Z
⊥
Σ
Z
0
,
Z
⊥
T
, (5)
where Σ is a diagonal matrix with eigenvalues, sorted in
descending order, on its diagonal.
Z
0
∈ R
D ×r
and
Z
⊥
∈
R
D ×(D −r)
denotes, respectively, the orthonormal basis of
Z
0
and
Z
⊥
, which are the estimation of Z
0
and its orthogonal com-
plement.
In the second phase, we adopt the recursive form of CUSUM
for the task of detection. Using the estimated
Z
⊥
, we propose
to define a CUSUM score as
L
t
:=
Z
T
⊥
x
t
2
− (D − r) σ
2
n
− c, (6)
where c>0 is a parameter. Initialized by y
0
=0,
y
t
:= max{y
t−1
+ L
t
, 0} (7)
denotes the cumulative statistics of CUSUM process. For some
threshold b>0, the algorithm alarms and returns T
s
= t as soon
as y
t
exceeds b at time i nstant t. The complete algorithm is listed
in Algorithm 1. Theoretical analysis on the algorithm, including
the influence of parameters b, c, is demonstrated in next section.
We provide some intuitive explanation for the proposed al-
gorithm. In (6) that defines the CUSUM score L
t
, the first term
Z
⊥
x
t
2
is actually the projected energy of x
t
onto
Z
⊥
, while
the other two terms are fixed. When t<T
c
, s ince the clean
signal in x
t
lies in Z
0
, the projection of x
t
onto
Z
⊥
is little,
leading to a small L
t
with negative mean. As a consequence, y
t
can hardly increase to threshold b, and the algorithm is unlikely
to return a false alarm when no change happens. On the other
hand, when t ≥ T
c
, as the clean signal in x
t
lies in Z
1
, projec-
tion of x
t
onto
Z
⊥
may be rather large, which leads to a prompt
increase in y
t
and thus a short detection delay. The discussion
above explains why the proposed algorithm is competent for the
task of subspace change-point detection.
To demonstrate the feasibility of this online algorithm, we
will discuss its computational complexity and memory re-
quirement. In Phase 1, PCA works on the historical data,
x
−T
h
+1
,...,x
−1
, x
0
, to estimate a basis of the orthogonal com-
plement of pre-change subspace,
Z
⊥
. The time complexity is
O(D
3
). In Phase 2, the basis matrix
Z
⊥
and the current sample
x
t
is used to calculate L
t
by (6) with cost O(D(D − r)).The
storage of matrix
Z
⊥
requires memory size O(D(D − r)).No-
tice that when r<D/2, there is an efficient way of storing
Z
0
and calculating the first term in the RHS of (6) as below
Z
T
⊥
x
t
2
= x
t
2
−
Z
T
0
x
t
2
.
Therefore both the computational complexity and the mem-
ory requirement of the proposed algorithm are O(D min{D −
r, r}). When r D or r D, the algorithm complexity is
linear in D, which is optimal in order sense.
Algorithm 1: A Subspace Change-Point Detection Algo-
rithm.
Input: Historical data matrix X := [x
−T
h
+1
,...,x
−1
, x
0
],
subspace dimension r, noise variance σ
2
n
, algorithm
parameters b, c.
Output: Stopping time T
s
.
Implementation:
Phase 1: Preparation
Obtain
Z
⊥
by using (5);
Phase 2: Online Detection
Initialize: y
0
=0.
for t =1, 2,...do
Calculate the CUSUM score L
t
by using (6);
Calculate the cumulative statistics y
t
by using (7);
If y
t
>b, then break;
end for
return T
s
:= t.
IV. THEORETICAL ANALYSIS
We adopt the definition of F-distance in Section II-B to mea-
sure the difference between subspaces of interest. In particular,
we define the following distances,
Δ:=d (Z
0
, Z
1
) ,
ε := d
Z
0
, Z
0
,
Δ:=d
Z
0
, Z
1
. (8)
Notice that ε is small when our estimation about Z
0
is reasonably
accurate. Due to inequality of distance, the above quantities
satisfy
Δ − ε ≤ Δ ≤
Δ+ε.
We first consider the general case, where the entries of s
t
and
n
t
follow independent, unknown sub-Gaussian distributions,
with mean zero and variance σ
2
s
and σ
2
n
, respectively. Next we
consider a special case where s
t
and n
t
are Gaussian random
vectors, following the distribution N(0,σ
2
s
I
r
) and N(0,σ
2
n
I
D
),
respectively. Before evaluating the performance of our algorithm
in terms of ARL and EDD, we first examine properties of the
CUSUM score L
t
.
A. Distribution of L
t
The following lemma shows that L
t
with its expectation re-
moved is a sub-exponential random variable. Based on this, we
next give the condition on L
t
to be a qualified CUSUM score.
Finally, we derive the requirement to parameter c brought by
the above condition.
Lemma 2: Define
˜
L
t
:=
L
t
− σ
2
s
ε
2
+ c, t < T
c
;
L
t
− σ
2
s
Δ
2
+ c, t ≥ T
c
,
(9)
where L
t
is defined by (6). We have
˜
L
t
∼ subE
K
ξ
t
σ
2
s
¯s
2
ψ
2
+(D − r)σ
2
n
¯n
2
ψ
2
(10)