XIA et al.: QUATERNION-VALUED ECHO STATE NETWORKS 665
A. Augmented Quaternion Statistics
Unlike the real domain, where the complete second-order statistics of a random vector $q(k)$ are described by the covariance matrix $R = E[qq^{T}]$, in the complex and quaternion domains the covariance matrix is sufficient to describe only second-order circular (proper) signals, which have equal power in the data components. For general second-order noncircular (improper) quaternion signals, where the powers in the data components may differ, optimal second-order modeling also requires the complementary covariance matrices (pseudocovariances). These complementary covariance matrices are termed the ı-covariance $P$, the j-covariance $S$, and the κ-covariance $T$, and are given by [16], [29]–[32]
$$P = E[qq^{\imath H}], \quad S = E[qq^{\jmath H}], \quad T = E[qq^{\kappa H}].$$
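Remark 1: The complete second-order characteristics of a quaternion random vector $q$ are then described by the augmented covariance matrix $R^{a}$ of the augmented vector $q^{a} = [q^{T}, q^{\imath T}, q^{\jmath T}, q^{\kappa T}]^{T}$, given by
$$R^{a} = E[q^{a} q^{aH}] = \begin{bmatrix} R & P & S & T \\ P^{\imath} & R^{\imath} & T^{\imath} & S^{\imath} \\ S^{\jmath} & T^{\jmath} & R^{\jmath} & P^{\jmath} \\ T^{\kappa} & S^{\kappa} & P^{\kappa} & R^{\kappa} \end{bmatrix}. \tag{5}$$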
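The block structure in (5) can be checked numerically. The following is a minimal Python sketch, not taken from the paper, for the scalar case (a vector $q$ of length one). It assumes that quaternions are stored as NumPy arrays `[q_r, q_i, q_j, q_k]`, that the involutions in (1) take the standard form $q^{\imath} = -\imath q \imath$ (and similarly for j and κ), which flips the signs of the other two imaginary components, and that expectations are replaced by sample means. The names `SIGNS`, `CONJ`, and `qmul` are hypothetical helpers introduced for illustration.

```python
import numpy as np

# Sign patterns applied to [q_r, q_i, q_j, q_k]; assumed forms of the involutions in (1).
SIGNS = np.array([
    [1,  1,  1,  1],   # identity: q
    [1,  1, -1, -1],   # i-involution: q^i
    [1, -1,  1, -1],   # j-involution: q^j
    [1, -1, -1,  1],   # kappa-involution: q^k
])
CONJ = np.array([1, -1, -1, -1])  # quaternion conjugate


def qmul(p, q):
    """Hamilton product of quaternions stored as arrays of shape (..., 4)."""
    pr, pi, pj, pk = np.moveaxis(np.asarray(p, dtype=float), -1, 0)
    qr, qi, qj, qk = np.moveaxis(np.asarray(q, dtype=float), -1, 0)
    return np.stack([
        pr * qr - pi * qi - pj * qj - pk * qk,
        pr * qi + pi * qr + pj * qk - pk * qj,
        pr * qj - pi * qk + pj * qr + pk * qi,
        pr * qk + pi * qj - pj * qi + pk * qr,
    ], axis=-1)


rng = np.random.default_rng(1)
# Improper scalar samples: unequal component powers plus one correlated pair.
q = rng.standard_normal((5000, 4)) * np.array([2.0, 1.0, 0.5, 0.25])
q[:, 1] += 0.5 * q[:, 0]

# Augmented "vector" [q, q^i, q^j, q^k] and its sample covariance E[q^a q^aH].
q_aug = [q * s for s in SIGNS]
Ra = np.array([[qmul(u, v * CONJ).mean(axis=0) for v in q_aug] for u in q_aug])

# The first block row gives R, P, S, T; the other rows should be their involutions.
R, P, S, T = Ra[0]
expected = np.array([
    [R, P, S, T],
    [P * SIGNS[1], R * SIGNS[1], T * SIGNS[1], S * SIGNS[1]],
    [S * SIGNS[2], T * SIGNS[2], R * SIGNS[2], P * SIGNS[2]],
    [T * SIGNS[3], S * SIGNS[3], P * SIGNS[3], R * SIGNS[3]],
])
structure_holds = np.allclose(Ra, expected)
print(structure_holds)
```

Because the involutions are linear and distribute over quaternion products, the structure holds for sample means as well as for true expectations, so only the first block row needs to be estimated.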
Notice that for proper signals, the pseudocovariance matrices $P$, $S$, and $T$ vanish; a signal that obeys this structure has a probability distribution that is rotation invariant with respect to all six possible pairs of axes [16], [29]–[32]. However, in most real-world applications, probability density functions are rotation dependent, and hence require the use of augmented quaternion statistics.
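To illustrate the distinction between proper and improper signals, the sketch below estimates $R$, $P$, $S$, and $T$ from samples of a scalar quaternion signal. It is an illustrative example rather than the authors' procedure: it assumes the `[q_r, q_i, q_j, q_k]` array storage, the standard sign-flip form of the involutions, independent Gaussian components, and sample means in place of expectations. The function names are hypothetical.

```python
import numpy as np

INVOLUTION = {
    "i": np.array([1, 1, -1, -1]),
    "j": np.array([1, -1, 1, -1]),
    "k": np.array([1, -1, -1, 1]),
}
CONJ = np.array([1, -1, -1, -1])


def qmul(p, q):
    """Hamilton product of quaternions stored as arrays of shape (..., 4)."""
    pr, pi, pj, pk = np.moveaxis(np.asarray(p, dtype=float), -1, 0)
    qr, qi, qj, qk = np.moveaxis(np.asarray(q, dtype=float), -1, 0)
    return np.stack([
        pr * qr - pi * qi - pj * qj - pk * qk,
        pr * qi + pi * qr + pj * qk - pk * qj,
        pr * qj - pi * qk + pj * qr + pk * qi,
        pr * qk + pi * qj - pj * qi + pk * qr,
    ], axis=-1)


def second_order_stats(q):
    """Sample estimates of (R, P, S, T) for scalar quaternion samples of shape (N, 4)."""
    R = qmul(q, q * CONJ).mean(axis=0)
    P, S, T = (qmul(q, q * INVOLUTION[ax] * CONJ).mean(axis=0) for ax in "ijk")
    return R, P, S, T


rng = np.random.default_rng(0)
N = 100_000
proper = rng.standard_normal((N, 4))                    # equal power in all components
improper = proper * np.array([2.0, 1.0, 0.5, 0.25])     # unequal component powers

stats_proper = second_order_stats(proper)
stats_improper = second_order_stats(improper)

for name, stats in (("proper", stats_proper), ("improper", stats_improper)):
    sizes = [float(np.linalg.norm(m)) for m in stats]
    print(name, ["%.3f" % s for s in sizes])
```

For the equal-power signal the estimated pseudocovariances are close to zero, whereas for the unequal-power signal they are of the same order as $R$, so the covariance alone would miss part of the second-order information.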
B. Quaternion Widely Linear Model
To exploit the complete second-order statistics of
quaternion-valued signals in linear mean square error (MSE)
estimation, we first consider a quaternion-valued MSE
estimator given by
$$\hat{y} = E[y \mid q]$$
where $\hat{y}$ is the estimated process, $q$ the observed variable, and $E[\cdot]$ the statistical expectation operator. For zero-mean jointly normal $q$ and $y$, the strictly linear estimation solution, similar to those in $\mathbb{R}$ and $\mathbb{C}$, is given by
$$\hat{y} = w^{T} q$$
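where $w$ and $q$ are, respectively, the coefficient and regressor vectors. Observe, however, that for all the components $\{y_{r}, y_{\imath}, y_{\jmath}, y_{\kappa}\}$, we have
$$\hat{y}_{\eta} = E[y_{\eta} \mid q_{r}, q_{\imath}, q_{\jmath}, q_{\kappa}], \quad \eta \in \{r, \imath, \jmath, \kappa\}$$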
so that, using the involutions in (1), we can express each element of a quaternion variable as in (2). For instance, the real component of a quaternion variable is $q_{r} = (q + q^{\imath} + q^{\jmath} + q^{\kappa})/4$, leading to the general expression for all the components
$$\hat{y}_{\eta} = E[y_{\eta} \mid q, q^{\imath}, q^{\jmath}, q^{\kappa}], \quad \text{and} \quad \hat{y} = E[y \mid q, q^{\imath}, q^{\jmath}, q^{\kappa}].$$
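The component identity used above is easy to verify. The short Python sketch below assumes the standard sign-flip form of the involutions in (1) and the array storage `[q_r, q_i, q_j, q_k]`. The three further sign combinations are the analogous standard identities and are included here as an assumption about the form of (2), which is not reproduced in this section.

```python
import numpy as np


def involutions(q):
    """Return (q^i, q^j, q^k) for a quaternion stored as [q_r, q_i, q_j, q_k]."""
    q = np.asarray(q, dtype=float)
    return q * [1, 1, -1, -1], q * [1, -1, 1, -1], q * [1, -1, -1, 1]


q = np.array([0.7, -1.2, 0.4, 2.5])
qi, qj, qk = involutions(q)

real_part = (q + qi + qj + qk) / 4    # the quaternion q_r
i_part = (q + qi - qj - qk) / 4       # the quaternion i * q_i
j_part = (q - qi + qj - qk) / 4       # the quaternion j * q_j
k_part = (q - qi - qj + qk) / 4       # the quaternion kappa * q_kappa

print(real_part, i_part, j_part, k_part)
```

Each combination isolates exactly one component, which is why conditioning on $\{q, q^{\imath}, q^{\jmath}, q^{\kappa}\}$ carries the same information as conditioning on the four real components.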
In other words, to capture the full second-order information available, we should use the original quaternion and its involutions, which leads to the widely linear model [16], [29], [30]
$$y = w^{aT} q^{a} = a^{T} q + b^{T} q^{\imath} + c^{T} q^{\jmath} + d^{T} q^{\kappa} \tag{6}$$
where $w^{a} = [a^{T}, b^{T}, c^{T}, d^{T}]^{T}$ is the augmented weight vector.
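As a rough illustration of (6), the Python sketch below evaluates the widely linear output for a regressor of length three and confirms that the four-term form and the augmented form $w^{aT} q^{a}$ agree. It assumes that each weight multiplies its regressor element from the left, as the ordering in (6) suggests, and that the involutions take the standard sign-flip form. The function `widely_linear` and the random test data are hypothetical.

```python
import numpy as np

SI = np.array([1, 1, -1, -1])   # i-involution signs
SJ = np.array([1, -1, 1, -1])   # j-involution signs
SK = np.array([1, -1, -1, 1])   # kappa-involution signs


def qmul(p, q):
    """Hamilton product of quaternions stored as arrays of shape (..., 4)."""
    pr, pi, pj, pk = np.moveaxis(np.asarray(p, dtype=float), -1, 0)
    qr, qi, qj, qk = np.moveaxis(np.asarray(q, dtype=float), -1, 0)
    return np.stack([
        pr * qr - pi * qi - pj * qj - pk * qk,
        pr * qi + pi * qr + pj * qk - pk * qj,
        pr * qj - pi * qk + pj * qr + pk * qi,
        pr * qk + pi * qj - pj * qi + pk * qr,
    ], axis=-1)


def widely_linear(a, b, c, d, q):
    """Evaluate (6) for weights a, b, c, d and regressor q, each of shape (n, 4)."""
    terms = qmul(a, q) + qmul(b, q * SI) + qmul(c, q * SJ) + qmul(d, q * SK)
    return terms.sum(axis=0)


rng = np.random.default_rng(0)
n = 3
q = rng.standard_normal((n, 4))
a, b, c, d = rng.standard_normal((4, n, 4))

y = widely_linear(a, b, c, d, q)

# The same output from the augmented form y = w^aT q^a.
w_aug = np.concatenate([a, b, c, d])
q_aug = np.concatenate([q, q * SI, q * SJ, q * SK])
y_aug = qmul(w_aug, q_aug).sum(axis=0)

# Weights of 1/4 on the first element of q, q^i, q^j, q^k recover its real component.
quarter = np.zeros((n, 4))
quarter[0, 0] = 0.25
y_real = widely_linear(quarter, quarter, quarter, quarter, q)
print(y, y_aug, y_real)
```

The last case shows the extra freedom of the model: the map from $q$ to its real component is realized by (6) but cannot be written as a single left multiplication $wq$.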
IV. NONLINEAR ACTIVATION FUNCTIONS IN $\mathbb{H}$
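One of the difficulties in the design of hypercomplex RNNs lies in the lack of analytic nonlinear activation functions, as the CRF conditions for analyticity in $\mathbb{H}$ are very stringent [17]. For instance, a CRF differentiable quaternion function $f(q)$ should satisfy
$$\frac{\partial f}{\partial q_{r}} + \imath \frac{\partial f}{\partial q_{\imath}} + \jmath \frac{\partial f}{\partial q_{\jmath}} + \kappa \frac{\partial f}{\partial q_{\kappa}} = 0 \;\Leftrightarrow\; \frac{\partial f}{\partial q^{*}} = 0. \tag{7}$$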
Only linear quaternion functions and constants fulfill these conditions, yet nonlinear adaptive filtering in $\mathbb{H}$ requires differentiable nonlinear functions. To circumvent the analyticity problem, recent work in [24] adopted the LAC [23], based on a complex-valued representation of a quaternion, to give
$$\frac{\partial f}{\partial q_{r}} = -\zeta \frac{\partial f}{\partial \alpha} \tag{8}$$
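where $\zeta$ and $\alpha$ are, respectively, given by
$$\zeta = \frac{\imath q_{\imath} + \jmath q_{\jmath} + \kappa q_{\kappa}}{\alpha}, \quad \alpha = \sqrt{q_{\imath}^{2} + q_{\jmath}^{2} + q_{\kappa}^{2}}. \tag{9}$$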
In this way, the vector part of a quaternion is represented by a single imaginary unit $\zeta$, with $\zeta^{2} = -1$, scaled by its magnitude $\alpha$. Although the LAC only guarantees first-order differentiability at the current operating point, this is well matched to quaternion-valued gradient algorithms, which only require gradient evaluation at a point.
Proposition 1: The quaternion exponential $e^{q} = e^{q_{r} + \imath q_{\imath} + \jmath q_{\jmath} + \kappa q_{\kappa}}$ satisfies the LAC in (8).
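Proof: $e^{q}$ can be expanded using the Euler formula as
$$\begin{aligned} e^{q} &= e^{q_{r}} \left( \cos(\alpha) + \zeta \sin(\alpha) \right) \\ &= e^{q_{r}} \left[ \cos(\alpha) + \frac{\imath q_{\imath} \sin(\alpha)}{\alpha} + \frac{\jmath q_{\jmath} \sin(\alpha)}{\alpha} + \frac{\kappa q_{\kappa} \sin(\alpha)}{\alpha} \right] \end{aligned}$$
where $\zeta$ and $\alpha$ are defined in (9). Since $\partial e^{q} / \partial \alpha = e^{q_{r}} (-\sin(\alpha) + \zeta \cos(\alpha))$ and $\zeta^{2} = -1$, this gives
$$\frac{\partial e^{q}}{\partial q_{r}} = e^{q} = -\zeta \frac{\partial e^{q}}{\partial \alpha}. \tag{10}$$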
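The identity in (10) can also be confirmed numerically. The following Python sketch is an illustration under stated assumptions: the quaternion is parameterized as $q = q_{r} + \zeta\alpha$ with $\zeta$ held fixed while $\alpha$ is varied, the partial derivatives are approximated by central finite differences, and the test point has a nonzero vector part. The names `qexp_polar` and `qmul` are hypothetical helpers.

```python
import numpy as np


def qmul(p, q):
    """Hamilton product of quaternions stored as arrays of shape (..., 4)."""
    pr, pi, pj, pk = np.moveaxis(np.asarray(p, dtype=float), -1, 0)
    qr, qi, qj, qk = np.moveaxis(np.asarray(q, dtype=float), -1, 0)
    return np.stack([
        pr * qr - pi * qi - pj * qj - pk * qk,
        pr * qi + pi * qr + pj * qk - pk * qj,
        pr * qj - pi * qk + pj * qr + pk * qi,
        pr * qk + pi * qj - pj * qi + pk * qr,
    ], axis=-1)


def qexp_polar(qr, alpha, zeta):
    """e^q for q = qr + zeta * alpha, with the unit direction zeta given as a 3-vector."""
    return np.exp(qr) * np.concatenate(([np.cos(alpha)], zeta * np.sin(alpha)))


q = np.array([0.3, -0.5, 1.1, 0.8])
qr = q[0]
alpha = np.linalg.norm(q[1:])       # alpha in (9)
zeta = q[1:] / alpha                # vector components of zeta in (9); needs alpha > 0
zeta_q = np.concatenate(([0.0], zeta))

h = 1e-6  # finite-difference step
d_dqr = (qexp_polar(qr + h, alpha, zeta) - qexp_polar(qr - h, alpha, zeta)) / (2 * h)
d_dalpha = (qexp_polar(qr, alpha + h, zeta) - qexp_polar(qr, alpha - h, zeta)) / (2 * h)

lhs = d_dqr                         # left-hand side of (8)
rhs = -qmul(zeta_q, d_dalpha)       # right-hand side of (8)
exp_q = qexp_polar(qr, alpha, zeta)
print(np.abs(lhs - rhs).max(), np.abs(lhs - exp_q).max())
```

Both printed differences should be at the level of the finite-difference error, which is consistent with the two sides of (8) coinciding with $e^{q}$ itself.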
Remark 2: Notice that the quaternion exponential $e^{-q} = e^{-(q_{r} + \imath q_{\imath} + \jmath q_{\jmath} + \kappa q_{\kappa})}$ also satisfies the LAC in (8). This is straightforward to show using the same approach as in Proposition 1.
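For completeness, the steps behind Remark 2 can be written out as follows. This is a sketch that mirrors the proof of Proposition 1 and, as there, treats $\zeta$ as fixed when differentiating with respect to $\alpha$ and uses $\zeta^{2} = -1$.

```latex
\begin{aligned}
e^{-q} &= e^{-q_{r}} \left( \cos(\alpha) - \zeta \sin(\alpha) \right), \\
\frac{\partial e^{-q}}{\partial q_{r}} &= -e^{-q}, \\
\frac{\partial e^{-q}}{\partial \alpha} &= e^{-q_{r}} \left( -\sin(\alpha) - \zeta \cos(\alpha) \right), \\
-\zeta \frac{\partial e^{-q}}{\partial \alpha}
  &= e^{-q_{r}} \left( \zeta \sin(\alpha) + \zeta^{2} \cos(\alpha) \right)
   = -e^{-q_{r}} \left( \cos(\alpha) - \zeta \sin(\alpha) \right)
   = -e^{-q}.
\end{aligned}
```

Both sides of (8) therefore equal $-e^{-q}$.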
Remark 3: Quaternion transcendental nonlinear functions, constructed on the basis of the quaternion exponentials $e^{q}$ and $e^{-q}$, are a generic extension of those in $\mathbb{R}$ and $\mathbb{C}$, and also satisfy the LAC.
For a detailed proof of Remark 3, we refer to [24].
In this paper, we employ a fully quaternion $\tanh(q)$ function to design the QESNs, defined as
$$\tanh(q) = \frac{e^{q} - e^{-q}}{e^{q} + e^{-q}} = \frac{e^{2q} - 1}{e^{2q} + 1} \tag{11}$$
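A possible implementation of (11) is sketched below in Python. It is not the authors' code: it assumes the `[q_r, q_i, q_j, q_k]` array storage, computes $e^{q}$ from the Euler form used in Proposition 1, and interprets the fraction as multiplication by the quaternion inverse, which is unambiguous here because $e^{q}$ and $e^{-q}$ commute. The function names are hypothetical.

```python
import numpy as np

CONJ = np.array([1, -1, -1, -1])
ONE = np.array([1.0, 0.0, 0.0, 0.0])


def qmul(p, q):
    """Hamilton product of quaternions stored as arrays of shape (..., 4)."""
    pr, pi, pj, pk = np.moveaxis(np.asarray(p, dtype=float), -1, 0)
    qr, qi, qj, qk = np.moveaxis(np.asarray(q, dtype=float), -1, 0)
    return np.stack([
        pr * qr - pi * qi - pj * qj - pk * qk,
        pr * qi + pi * qr + pj * qk - pk * qj,
        pr * qj - pi * qk + pj * qr + pk * qi,
        pr * qk + pi * qj - pj * qi + pk * qr,
    ], axis=-1)


def qinv(q):
    """Quaternion inverse q^* / |q|^2 (undefined for q = 0)."""
    q = np.asarray(q, dtype=float)
    return q * CONJ / np.dot(q, q)


def qexp(q):
    """e^q = e^{q_r} (cos(alpha) + zeta sin(alpha)), with alpha the norm of the vector part."""
    q = np.asarray(q, dtype=float)
    alpha = np.linalg.norm(q[1:])
    # np.sinc(x) = sin(pi x) / (pi x), so this is sin(alpha) / alpha, equal to 1 at alpha = 0.
    sin_over_alpha = np.sinc(alpha / np.pi)
    return np.exp(q[0]) * np.concatenate(([np.cos(alpha)], q[1:] * sin_over_alpha))


def qtanh(q):
    """Second form in (11): (e^{2q} - 1)(e^{2q} + 1)^{-1}; singular where e^{2q} = -1."""
    e2q = qexp(2.0 * np.asarray(q, dtype=float))
    return qmul(e2q - ONE, qinv(e2q + ONE))


def qtanh_ratio(q):
    """First form in (11): (e^q - e^{-q})(e^q + e^{-q})^{-1}."""
    q = np.asarray(q, dtype=float)
    ep, em = qexp(q), qexp(-q)
    return qmul(ep - em, qinv(ep + em))


q = np.array([0.3, -0.5, 1.1, 0.8])
t1, t2 = qtanh(q), qtanh_ratio(q)
print(t1, t2)
```

For a purely real input the function reduces to the real-valued tanh, and for an input with only the ı component nonzero it matches the complex tanh, in line with Remark 3.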