very small and can be neglected. On the other hand, since $|a_{22,1}| < 1$, we have $\lim_{n\to+\infty} a_{12,1}a_{22,1}^{n-1}X_{2,t-n} = 0$. Therefore, when $m$ is large enough and $n \geq m$, (11) can be approximately written as
$$X_{1,t} = a_{11,1}X_{1,t-1} + a_{12,1}a_{21,1}\left[X_{1,t-2} + a_{22,1}X_{1,t-3} + \cdots + a_{22,1}^{m-2}X_{1,t-m}\right] + a_{12,1}\left[\eta_{2,t-1} + a_{22,1}\eta_{2,t-2} + \cdots + a_{22,1}^{m-1}\eta_{2,t-m}\right] + \eta_{1,t}. \tag{13}$$
Moreover, the larger $m$ is chosen, the closer the right-hand side of (13) is to the real $X_{1,t}$. Comparing (13) with $X_{1,t}$ in the autoregressive model (1), one has $a_{1,1} = a_{11,1}$, $a_{1,2} = a_{12,1}a_{21,1}$, $a_{1,3} = a_{12,1}a_{21,1}a_{22,1}, \ldots, a_{1,m} = a_{12,1}a_{21,1}a_{22,1}^{m-2}$, and
$$\epsilon_{1,t} = a_{12,1}\left[\eta_{2,t-1} + a_{22,1}\eta_{2,t-2} + \cdots + a_{22,1}^{m-1}\eta_{2,t-m}\right] + \eta_{1,t}.$$
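As an illustration of this coefficient matching, the following is a minimal numeric sketch (the values chosen for $a_{11,1}$, $a_{12,1}$, $a_{21,1}$, $a_{22,1}$, and $m$ are hypothetical, not taken from this paper) that builds the AR coefficients $a_{1,j}$ implied by (13):

```python
import numpy as np

# Hypothetical VAR(1) coefficients for model (9); |a22| < 1 is required.
a11, a12, a21, a22 = 0.5, 0.4, 0.3, 0.6
m = 10   # truncation order of the univariate AR model (1)

# AR coefficients of X1 implied by (13):
# a_{1,1} = a11 and a_{1,j} = a12 * a21 * a22**(j-2) for j = 2, ..., m.
a1 = np.array([a11] + [a12 * a21 * a22 ** (j - 2) for j in range(2, m + 1)])

print(np.round(a1, 4))
# The tail decays geometrically (factor a22), so truncating at a
# moderately large m discards only negligible terms, as noted below (13).
```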
Let $\xi_{k,t} = \eta_{2,t-k}$, $k = 1,\ldots,m$. Then $\xi_1,\ldots,\xi_m$ are $m$ random variables. Since $\eta_1$ and $\eta_2$ are uncorrelated over time and have zero mean, one can easily see that: 1) $\xi_1,\ldots,\xi_m$ are all uncorrelated and have zero mean, and $\epsilon_1$ has zero mean; 2) $\sigma^2_{\xi_k} = \sigma^2_{\eta_2}$, $k = 1,\ldots,m$; and 3) $E[\eta_1\xi_k] = 0$, $k = 1,\ldots,m$. Then, we can derive the variance of $\epsilon_1$ as
$$\begin{aligned}
\sigma^2_{\epsilon_1} = E\!\left[\epsilon_1^2\right] &= E\!\left[\left(\eta_1 + \sum_{k=1}^{m} a_{12,1}a_{22,1}^{k-1}\xi_k\right)^{\!2}\right]\\
&= E\!\left[\eta_1^2 + \left(\sum_{k=1}^{m} a_{12,1}a_{22,1}^{k-1}\xi_k\right)^{\!2} + 2\eta_1\sum_{k=1}^{m} a_{12,1}a_{22,1}^{k-1}\xi_k\right]\\
&= E\!\left[\eta_1^2\right] + \sum_{k=1}^{m} a_{12,1}^2 a_{22,1}^{2(k-1)} E\!\left[\xi_k^2\right] + 2\sum_{k=1}^{m} a_{12,1}a_{22,1}^{k-1} E[\eta_1\xi_k]\\
&= \sigma^2_{\eta_1} + \sigma^2_{\eta_2} a_{12,1}^2 \sum_{k=1}^{m} a_{22,1}^{2(k-1)}
= \sigma^2_{\eta_1} + \sigma^2_{\eta_2} a_{12,1}^2 \frac{1 - a_{22,1}^{2m}}{1 - a_{22,1}^2}.
\end{aligned} \tag{14}$$
Since $|a_{22,1}| < 1$, when $m$ is large enough, $a_{22,1}^{2m}$ is very small and can be neglected. Thus, from (14) one can approximately obtain
$$\sigma^2_{\epsilon_1} = \sigma^2_{\eta_1} + \frac{\sigma^2_{\eta_2} a_{12,1}^2}{1 - a_{22,1}^2}. \tag{15}$$
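As a quick numeric illustration of how small the neglected term is, the following sketch evaluates the exact variance (14) and its large-$m$ approximation (15); the parameter values are hypothetical and not taken from this paper:

```python
# Hypothetical parameters; |a22| < 1 as assumed in the text.
a12, a22 = 0.4, 0.6
var_eta1, var_eta2 = 1.0, 1.0
m = 20

# Exact variance of epsilon_1 from (14), using the finite geometric sum.
var_eps1_exact = var_eta1 + var_eta2 * a12**2 * (1 - a22**(2 * m)) / (1 - a22**2)

# Large-m approximation (15): the a22**(2m) term is dropped.
var_eps1_approx = var_eta1 + var_eta2 * a12**2 / (1 - a22**2)

print(var_eps1_exact, var_eps1_approx)   # nearly identical
print(a22**(2 * m))                      # neglected term, about 1.3e-9 here
```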
Then, from (14) it follows that the GC value is
$$F_{X_2\to X_1} = \ln\frac{\sigma^2_{\epsilon_1}}{\sigma^2_{\eta_1}} = \ln\!\left(1 + \frac{\sigma^2_{\eta_2} a_{12,1}^2\left(1 - a_{22,1}^{2m}\right)}{\sigma^2_{\eta_1}\left(1 - a_{22,1}^2\right)}\right) > 0 \tag{16}$$
or, from (15), it follows that the GC value can be approximately written as
$$F_{X_2\to X_1} = \ln\frac{\sigma^2_{\epsilon_1}}{\sigma^2_{\eta_1}} \approx \ln\!\left(1 + \frac{\sigma^2_{\eta_2} a_{12,1}^2}{\sigma^2_{\eta_1}\left(1 - a_{22,1}^2\right)}\right) > 0 \tag{17}$$
when $m$ in (16) is large enough.
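To show how quickly (16) approaches the limiting value (17) as $m$ grows, here is a small sketch; again, the parameter values are hypothetical:

```python
import math

# Hypothetical parameters of model (9); |a22| < 1 and a12 != 0.
a12, a22 = 0.4, 0.6
var_eta1, var_eta2 = 1.0, 1.0

def gc_finite_m(m):
    """GC value from X2 to X1 given by (16) for a finite model order m."""
    return math.log(1 + var_eta2 * a12**2 * (1 - a22**(2 * m))
                    / (var_eta1 * (1 - a22**2)))

# Large-m approximation (17).
gc_limit = math.log(1 + var_eta2 * a12**2 / (var_eta1 * (1 - a22**2)))

for m in (2, 5, 10, 20):
    print(m, gc_finite_m(m))
print("limit:", gc_limit)   # strictly positive whenever a12 != 0
```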
Therefore, we have given an exact mathematical description of GC from $X_2$ to $X_1$ in (16) or (17), from which one can clearly see that GC from $X_2$ to $X_1$ involves both the coefficients ($a_{12,1}$ and $a_{22,1}$) and the noise variances ($\sigma^2_{\eta_1}$ and $\sigma^2_{\eta_2}$) of the linear regression model (9).
According to (6), NC from $X_2$ to $X_1$ for model (9) can be written as
$$n_{X_2\to X_1} = \frac{\sum_{t=1}^{N}\left(a_{12,1}X_{2,t-1}\right)^2}{\sum_{h=1}^{2}\sum_{t=1}^{N}\left(a_{1h,1}X_{h,t-1}\right)^2 + N\sigma^2_{\eta_1}} \tag{18}$$
from which one can clearly see that NC in (18) also involves the coefficients ($a_{11,1}$ and $a_{12,1}$) and the noise variance ($\sigma^2_{\eta_1}$) of the linear regression model (9).
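For concreteness, the following sketch evaluates (18) on a simulated realization of model (9), assuming the true coefficients and noise variance are available; all parameter values and the simulation length are hypothetical, not taken from this paper:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical coefficients and noise variances of model (9).
a11, a12, a21, a22 = 0.5, 0.4, 0.3, 0.6
var_eta1, var_eta2 = 1.0, 1.0
N = 100_000

# Simulate the bivariate VAR(1) of model (9).
X = np.zeros((N + 1, 2))
for t in range(1, N + 1):
    X[t, 0] = a11 * X[t - 1, 0] + a12 * X[t - 1, 1] + rng.normal(0, np.sqrt(var_eta1))
    X[t, 1] = a21 * X[t - 1, 0] + a22 * X[t - 1, 1] + rng.normal(0, np.sqrt(var_eta2))

# NC from X2 to X1 according to (18): the share of the explained-plus-noise
# energy of X1 that is attributable to the X2 term a12 * X_{2,t-1}.
num = np.sum((a12 * X[:-1, 1]) ** 2)
den = np.sum((a11 * X[:-1, 0]) ** 2) + np.sum((a12 * X[:-1, 1]) ** 2) + N * var_eta1
nc_x2_to_x1 = num / den
print(nc_x2_to_x1)   # lies in (0, 1) by construction
```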
Therefore, we have theoretically shown that both GC and NC involve the coefficients of the linear regression model. Barrett and Barnett [35] claimed that NC describes a causal mechanism because it includes the coefficients of the linear regression model. This statement is not correct, because GC also mathematically involves the coefficients of the linear regression model. Thus, NC, like GC and all GC-like methods, reveals a causal effect (also called causal influence or causal flow). In fact, a causal mechanism represents a process and is a concept distinct from a causal effect (see the diagram of a simple causal mechanism in [36, Fig. 1]).
C. GC in Frequency Domain
The Granger causal influence from $X_2$ to $X_1$ in the frequency domain is defined by
$$I_{X_2\to X_1}(f) = -\ln\!\left(1 - \frac{\left(\sigma^2_{\eta_2} - \sigma^2_{\eta_1\eta_2}/\sigma^2_{\eta_1}\right)\left|H_{12}(f)\right|^2}{S_{X_1X_1}(f)}\right) \in [0, +\infty) \tag{19}$$
where the transfer function is $H(f) = A^{-1}(f)$, whose components are
$$H_{11}(f) = \frac{1}{\det(A)}\bar{a}_{22}(f), \quad H_{12}(f) = -\frac{1}{\det(A)}\bar{a}_{12}(f)$$
$$H_{21}(f) = -\frac{1}{\det(A)}\bar{a}_{21}(f), \quad H_{22}(f) = \frac{1}{\det(A)}\bar{a}_{11}(f)$$
$$A = [\bar{a}_{ij}]_{2\times 2}, \quad \bar{a}_{kk}(f) = 1 - \sum_{j=1}^{m} a_{kk,j}\,e^{-i2\pi f j}, \quad k = 1, 2$$
$$\bar{a}_{hl}(f) = -\sum_{j=1}^{m} a_{hl,j}\,e^{-i2\pi f j}, \quad h, l = 1, 2, \; h \neq l.$$
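A sketch of how (19) could be evaluated numerically for the VAR(1) case of model (9) follows. The coefficient values and noise covariance are hypothetical, and the spectrum $S_{X_1X_1}(f)$ is taken, as is standard for this definition (not restated in this excerpt), to be the $(1,1)$ entry of $H(f)\,\Sigma\,H(f)^{H}$ with $\Sigma$ the noise covariance matrix:

```python
import numpy as np

# Hypothetical coefficients of model (9) (VAR of order m = 1) and noise covariance.
a = np.array([[0.5, 0.4],
              [0.3, 0.6]])                 # a[h-1, l-1] = a_{hl,1}
var_eta1, var_eta2, cov_eta12 = 1.0, 1.0, 0.0
Sigma = np.array([[var_eta1, cov_eta12],
                  [cov_eta12, var_eta2]])

def gc_spectral_x2_to_x1(f):
    """Granger causal influence from X2 to X1 at frequency f, per (19)."""
    z = np.exp(-1j * 2 * np.pi * f)
    # A(f): 1 - a_{kk,1} z on the diagonal, -a_{hl,1} z off the diagonal.
    A = np.eye(2, dtype=complex) - a * z
    H = np.linalg.inv(A)                   # transfer function H(f) = A^{-1}(f)
    S = H @ Sigma @ H.conj().T             # spectral matrix; S[0, 0] = S_{X1X1}(f)
    corr = var_eta2 - cov_eta12**2 / var_eta1
    return -np.log(1 - corr * np.abs(H[0, 1])**2 / S[0, 0].real)

freqs = np.linspace(0, 0.5, 6)             # normalized frequencies
print([round(float(gc_spectral_x2_to_x1(f)), 4) for f in freqs])
```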