This article has been accepted for inclusion in a future issue of this journal. Content is final as presented, with the exception of pagination.
ZHANG et al.: INCIPIENT FAULT DETECTION FOR MULTIPHASE BATCH PROCESSES 3
Consider a set of nonstationary time series $X(M \times N) = [x_1, x_2, \ldots, x_N]$ with $x_t = (x_1, x_2, \ldots, x_M)^T$, where $M$ is the number of nonstationary time series and $N$ is the number of samples. If $x_t$ becomes stationary after differencing $d$ times, the time series is said to be integrated of order $d$, denoted as $x_t \sim I(d)$. Engle and Granger proved that if a set of nonstationary time series holds a long-run dynamic equilibrium relation, a linear combination of these time series is stationary, which can be described as follows:
$$\zeta_t = \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_M x_M = \beta^T x_t \tag{1}$$
where $\zeta_t$ is the equilibrium residual sequence, which is integrated of order 0, $\beta = (\beta_1, \beta_2, \ldots, \beta_M)^T$ is a cointegration vector, and the variables are said to be cointegrated variables.
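The idea behind (1) can be illustrated numerically. The following is a minimal sketch on synthetic data, not the data or procedure of this paper: two series share one random-walk trend, and the cointegration vector `beta = (2, -1)` is assumed known by construction rather than estimated. All variable names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
N = 2000                                   # number of samples
trend = np.cumsum(rng.normal(size=N))      # shared random walk, I(1)

# Two nonstationary series driven by the same stochastic trend.
x1 = trend + rng.normal(scale=0.5, size=N)
x2 = 2.0 * trend + rng.normal(scale=0.5, size=N)
X = np.vstack([x1, x2])                    # shape (M, N) with M = 2

# Differencing once removes the trend, so each series is I(1).
dx1 = np.diff(x1)

# Assumed cointegration vector: 2*x1 - x2 cancels the common trend.
beta = np.array([2.0, -1.0])
zeta = beta @ X                            # equilibrium residual, as in (1)

print("var(x1)   =", x1.var())
print("var(dx1)  =", dx1.var())
print("var(zeta) =", zeta.var())
```

In this construction the variance of each original series grows with the record length, while the variance of `zeta` stays bounded, which is the behavior (1) describes.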
Since more than one cointegration vector may describe the long-run equilibrium relation among the nonstationary variables, the cointegration model for $M$ nonstationary variables can be calculated as follows:
$$\gamma = B^T X \tag{2}$$
where $B(M \times A) = [\beta_1, \ldots, \beta_A]$ is the cointegration matrix, and $\gamma(A \times N) = [\zeta_1, \ldots, \zeta_A]^T$ is the equilibrium residual matrix.
To find the cointegration vectors, Johansen and Juselius [29] proposed a cointegration test method based on the vector autoregressive (VAR) model. The cointegration vectors are calculated by constructing a VAR model of order $p$, which can be described as follows:
$$x_t = \Pi_1 x_{t-1} + \cdots + \Pi_p x_{t-p} + c + \mu_t \tag{3}$$
where $\Pi_i (M \times M)$ is the coefficient matrix, $\mu_t (M \times 1)$ is the vector of white noise components distributed as $N(0, \Omega)$, and $c (M \times 1)$ is the constant vector.
The vector error-correction model can be obtained by subtracting $x_{t-1}$ from both sides of (3), which is described as follows:
$$\Delta x_t = \sum_{i=1}^{p-1} \Gamma_i \Delta x_{t-i} + Z x_{t-1} + \mu_t \tag{4}$$
where $Z = -I_M + \sum_{i=1}^{p} \Pi_i$ and $\Gamma_i = -\sum_{j=i+1}^{p} \Pi_j$, $i = 1, 2, \ldots, p-1$.
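The step from the VAR form to the error-correction form can be checked numerically. The sketch below draws arbitrary coefficient matrices (named `Pi` here, standing for the VAR coefficient matrices), builds `Z` and the difference-term coefficients (named `Gamma`) from the definitions given after (4), and confirms that both forms give the same differenced value. The constant vector is set to zero because (4) is written without it; the dimensions and random values are illustrative assumptions only.

```python
import numpy as np

rng = np.random.default_rng(0)
M, p = 3, 3                                 # variables and VAR order (illustrative)
Pi = [0.2 * rng.normal(size=(M, M)) for _ in range(p)]   # Pi[0] is Pi_1, etc.

# Z = -I + sum_i Pi_i and Gamma_i = -sum_{j=i+1}^{p} Pi_j for i = 1..p-1.
Z = -np.eye(M) + sum(Pi)
Gamma = [-sum(Pi[i:]) for i in range(1, p)]  # Pi[i:] holds Pi_{i+1}..Pi_p

# x[k] stands for x_{t-k}; x[0] is produced by the VAR recursion with c = 0.
x = [None] + [rng.normal(size=M) for _ in range(p)]
mu = rng.normal(size=M)
x[0] = sum(Pi[i - 1] @ x[i] for i in range(1, p + 1)) + mu

def dx(k):
    """Differenced value: x_{t-k} - x_{t-k-1}."""
    return x[k] - x[k + 1]

# Right-hand side of the error-correction form (4).
vecm_rhs = sum(Gamma[i - 1] @ dx(i) for i in range(1, p)) + Z @ x[1] + mu

print(np.allclose(dx(0), vecm_rhs))
```

The check holds for any choice of coefficient matrices, since (4) is an algebraic rearrangement of (3) rather than an approximation.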
The matrix $Z$ can be decomposed into the product of two matrices
$$Z = AB^T \tag{5}$$
where $A$ and $B$ are two $M \times A$ matrices with full column rank. Then (4) can be rewritten as the following equation:
$$\Delta x_t = \sum_{i=1}^{p-1} \Gamma_i \Delta x_{t-i} + AB^T x_{t-1} + \mu_t. \tag{6}$$
Therefore, the residual matrix is obtained as follows:
$$\gamma_{t-1} = B^T x_{t-1} = (A^T A)^{-1} A^T \left( \Delta x_t - \sum_{i=1}^{p-1} \Gamma_i \Delta x_{t-i} - \mu_t \right). \tag{7}$$
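The role of the factor $(A^T A)^{-1} A^T$ in (7) can be seen in a small numerical sketch. Here the two factors of (5) are drawn at random (named `A_mat` and `B`, with `r` standing for their number of columns); these values are placeholders for illustration, not estimates from any process data.

```python
import numpy as np

rng = np.random.default_rng(0)
M, r = 4, 2                                # r plays the role of the column count
A_mat = rng.normal(size=(M, r))            # first factor, full column rank
B = rng.normal(size=(M, r))                # cointegration matrix (placeholder)
Z = A_mat @ B.T                            # low-rank product as in (5)

x_prev = rng.normal(size=M)                # stands for x_{t-1}
rhs = Z @ x_prev                           # the bracketed term of (7), by (6)

# Left-multiplying by (A^T A)^{-1} A^T removes the first factor.
gamma_prev = np.linalg.solve(A_mat.T @ A_mat, A_mat.T @ rhs)

print(np.linalg.matrix_rank(Z))
print(np.allclose(gamma_prev, B.T @ x_prev))
```

Because the first factor has full column rank, $A^T A$ is invertible, so the residual is recovered exactly.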
Because the components in $X$ are cointegrated of order one, $\Delta x_t$ in (7) is clearly stationary, and its probability density function can be described as
$$f(\Delta x_t) = \prod_{i=1}^{M} f(\Delta x_i) = (2\pi)^{-M/2} |\Omega|^{-1/2} \exp\left( -\frac{1}{2} \mu_t^T \Omega^{-1} \mu_t \right). \tag{8}$$
To estimate the columns of $B$, i.e., the cointegration vectors, regression models between $\Delta x_t$ and $\Delta x_{t-i}$ $(i = 1, 2, \ldots, p-1)$ are built. Meanwhile, a regression model is also constructed for $x_{t-1}$ and $\Delta x_{t-i}$
$$\Delta x_t = \sum_{i=1}^{p-1} \Theta_i \Delta x_{t-i} + e_0 \tag{9}$$
$$x_{t-1} = \sum_{i=1}^{p-1} \Phi_i \Delta x_{t-i} + e_1 \tag{10}$$
where the coefficients $\Theta_i$ and $\Phi_i$ can be estimated by ordinary least squares (OLS), and $e_0$ and $e_1$ are the residuals.
The cointegration matrix $B$ can be obtained by solving the eigenvalue equation
$$\left| \lambda S_{11} - S_{10} S_{00}^{-1} S_{01} \right| = 0 \tag{11}$$
where $S_{ij} = (1/N) e_i e_j^T$ $(i, j = 0, 1)$. The cointegration vectors contained in $B$ can be estimated as the eigenvectors corresponding to the $A$ largest eigenvalues $\lambda_1 \geq \lambda_2 \geq \cdots \geq \lambda_A$ obtained from (11).
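The estimation steps in (9)-(11) can be sketched in NumPy as follows. This is an illustrative sketch under stated assumptions, not the authors' implementation: the regressions have no intercept, the model order satisfies p >= 2, the moment matrices are normalized by the number of usable samples, and the eigenvalue equation is reduced to a symmetric problem through a Cholesky factor. The function names `ols_residual` and `johansen_vectors` and the synthetic data are hypothetical.

```python
import numpy as np

def ols_residual(Y, W):
    """Residual of regressing the rows of Y on the rows of W (no intercept)."""
    coef, *_ = np.linalg.lstsq(W.T, Y.T, rcond=None)
    return Y - (W.T @ coef).T

def johansen_vectors(X, p, n_vec):
    """Estimate n_vec cointegration vectors from X (M x N), assuming p >= 2."""
    N = X.shape[1]
    dX = np.diff(X, axis=1)                    # dX[:, k] = x_{k+1} - x_k
    t = np.arange(p, N)                        # samples with all lags available
    dx_t = dX[:, t - 1]                        # differenced series at time t
    x_lag = X[:, t - 1]                        # levels at time t-1
    # Lagged differences at t-1, ..., t-p+1 stacked as regressors.
    W = np.vstack([dX[:, t - 1 - i] for i in range(1, p)])
    e0 = ols_residual(dx_t, W)                 # residual of (9)
    e1 = ols_residual(x_lag, W)                # residual of (10)
    T = t.size
    S00, S01, S11 = e0 @ e0.T / T, e0 @ e1.T / T, e1 @ e1.T / T
    K = S01.T @ np.linalg.solve(S00, S01)      # S10 S00^{-1} S01
    # Reduce |lambda*S11 - K| = 0 to a symmetric eigenproblem via Cholesky.
    Linv = np.linalg.inv(np.linalg.cholesky(S11))
    lam, U = np.linalg.eigh(Linv @ K @ Linv.T)
    order = np.argsort(lam)[::-1][:n_vec]      # largest eigenvalues first
    return lam[order], Linv.T @ U[:, order]

# Synthetic example: three series sharing one stochastic trend.
rng = np.random.default_rng(0)
N = 1000
trend = np.cumsum(rng.normal(size=N))
load = np.array([1.0, 2.0, -1.0])              # illustrative trend loadings
X = np.outer(load, trend) + 0.5 * rng.normal(size=(3, N))

lam, B = johansen_vectors(X, p=2, n_vec=2)
gamma = B.T @ X                                # equilibrium residuals, as in (2)
print("eigenvalues:", lam)
print("residual variances:", gamma.var(axis=1))
```

With one shared trend among three variables, two independent stationary combinations exist, so `n_vec=2` is used in the example; the estimated columns of `B` should be nearly orthogonal to the trend loadings.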
III. METHODOLOGY
As mentioned before, sufficient modeling batches cannot
be guaranteed for some batch processes. In this section,
a two-layer method based on CA and PCA is proposed for
the incipient fault detection of batch processes with limited
batches by distinguishing incipient faults from normal changes
in nonstationary variables.
For a batch process with $J$ process variables, the historical data collected under normal working conditions form a 3-D array $X(I \times J \times K)$, where $I$ is the number of batches and $K$ is the number of sampling points in each batch. To divide a batch process into several phases, many phase division methods, such as expert-knowledge-based methods, process analysis techniques, and data-clustering-based methods, have been developed [30], [31]. In this paper, however, it is assumed that the batch process has already been divided into several phases. For each phase, a concurrent variable separation strategy is developed to distinguish nonstationary variables from stationary variables, as described in Section III-A. Then, incipient fault detection based on a two-layer modeling strategy is proposed in Section III-B. The online fault detection strategy is described in Section III-C.
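As a small illustration of the data layout described above, the sketch below stores the batch data as a NumPy array of shape (I, J, K) and slices it along the sampling axis into phases. The array contents and the phase boundaries in `bounds` are hypothetical placeholders, since the phase division is assumed to be given.

```python
import numpy as np

I, J, K = 5, 4, 120                        # batches, variables, samples per batch
rng = np.random.default_rng(0)
X = rng.normal(size=(I, J, K))             # placeholder for normal-operation data

# Hypothetical phase boundaries along the sampling axis (assumed known).
bounds = [0, 40, 90, K]
phases = [X[:, :, s:e] for s, e in zip(bounds[:-1], bounds[1:])]

for c, Xc in enumerate(phases, start=1):
    print(f"phase {c}: shape {Xc.shape}")
```

Each element of `phases` keeps all batches and variables and only restricts the sampling points, so per-phase modeling can operate on one slice at a time.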
A. Concurrent Identification of Nonstationary Variables
In each phase of a batch process, some variables are stationary while others are nonstationary along time