state pair as follows [32]:

$$R(i, k) = H[d(\mathbf{x}_i, \mathbf{x}_k)] \tag{1}$$

where $R(i, k)$ is an element of the recurrence matrix $\mathbf{R}$, $d(\mathbf{x}_i, \mathbf{x}_k)$ is a distance function of $\mathbf{x}_i$ and $\mathbf{x}_k$, and $H$ is the Heaviside function expressed as

$$H[d(\mathbf{x}_i, \mathbf{x}_k)] =
\begin{cases}
1, & d(\mathbf{x}_i, \mathbf{x}_k) \leq \epsilon \\
0, & d(\mathbf{x}_i, \mathbf{x}_k) > \epsilon.
\end{cases} \tag{2}$$
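As a minimal sketch of (1) and (2), assuming a NumPy environment, the phase-space vectors stacked as rows of an array `X`, and the Euclidean norm as the distance function $d$ (the function name is illustrative, not from the source):

```python
import numpy as np

def recurrence_matrix(X, eps):
    """Binary recurrence matrix R(i, k) = H[d(x_i, x_k)] of (1)-(2).

    X   : (N, m) array, one phase-space vector per row
    eps : similarity threshold epsilon
    """
    # Pairwise Euclidean distances d(x_i, x_k)
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    # Heaviside thresholding: 1 where d <= eps, 0 otherwise
    return (d <= eps).astype(int)
```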
An FRP constructs the recurrence of the phase-space vectors as a grayscale image that takes values in $[0, 1]$ without requiring the similarity threshold parameter needed for RP analysis. The formulation of an FRP is described as follows [19].
Let $V$ be the set of fuzzy clusters of the phase-space vectors $X$. A binary relation from $X$ to $V$ is a fuzzy subset of $X \times V$ characterized by a fuzzy membership function $\mu: X \times V \rightarrow [0, 1]$. This fuzzy membership grade is the degree of relation of each pair $(\mathbf{x}, \mathbf{v})$, $\mathbf{x} \in X$, $\mathbf{v} \in V$, that has the following properties [33]:
1) Reflexivity: $\mu(\mathbf{x}, \mathbf{x}) = 1, \; \forall \mathbf{x} \in X$;
2) Symmetry: $\mu(\mathbf{x}_i, \mathbf{v}_j) = \mu(\mathbf{v}_j, \mathbf{x}_i), \; \forall \mathbf{x} \in X, \forall \mathbf{v} \in V$; and
3) Transitivity: $\mu(\mathbf{x}_i, \mathbf{x}_k) = \vee_v [\mu(\mathbf{x}_i, \mathbf{v}_j) \wedge \mu(\mathbf{v}_j, \mathbf{x}_k)], \; \forall \mathbf{x} \in X$,
which is called the max-min composition, where the symbols $\vee$ and $\wedge$ stand for max and min, respectively.
By specifying a number of clusters for the data, the fuzzy c-means algorithm [34] is applied to identify the fuzzy clusters of the phase-space vectors and determine the similarity between the states and the fuzzy cluster centers. Based on this direct similarity measure, the similarity between pairs of states can be inferred using the max-min composition of a fuzzy relation.
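A brief sketch of this inference step, assuming `U` is the $N \times c$ membership matrix produced by FCM with $\mu_{ij}$ in row $i$, column $j$ (the helper name and NumPy usage are illustrative):

```python
import numpy as np

def max_min_composition(U):
    """Pairwise similarity mu(x_i, x_k) = max_j min(mu_ij, mu_kj)."""
    # Broadcast U against itself to an (N, N, c) array of elementwise
    # minima, then take the maximum over the c clusters.
    return np.minimum(U[:, None, :], U[None, :, :]).max(axis=2)
```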
The fuzzy membership of $\mathbf{x}_i$ assigned to a cluster center $\mathbf{v}_j$ of $V$, denoted as $\mu_{ij}$, is computed using the fuzzy c-means (FCM) algorithm [34], which attempts to partition the $N$ elements of $X$ into a set of $c$ fuzzy clusters $V = \{\mathbf{v}_1, \ldots, \mathbf{v}_c\}$ by minimizing the following objective function $F$:
$$F = \sum_{i=1}^{N} \sum_{j=1}^{c} (\mu_{ij})^w \|\mathbf{x}_i - \mathbf{v}_j\|^2 \tag{3}$$

where $w$ is the fuzzy weighting exponent, and $\mu_{ij}$ is subject to

$$\sum_{j=1}^{c} \mu_{ij} = 1, \quad i = 1, \ldots, N. \tag{4}$$
The minimization of the FCM objective function is carried out numerically by an iterative process of updating the fuzzy membership grades and cluster centers until convergence or the maximum number of iterations is reached. The fuzzy membership grades and cluster centers are updated as
$$\mu_{ij} = \frac{1}{\sum_{q=1}^{c} \left( \dfrac{\|\mathbf{x}_i - \mathbf{v}_j\|}{\|\mathbf{x}_i - \mathbf{v}_q\|} \right)^{2/(w-1)}} \tag{5}$$
$$\mathbf{v}_j = \frac{\sum_{i=1}^{N} (\mu_{ij})^w \mathbf{x}_i}{\sum_{i=1}^{N} (\mu_{ij})^w}, \quad j = 1, \ldots, c. \tag{6}$$
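A compact, illustrative sketch of the FCM iteration of (3)-(6) in NumPy follows; the random initialization, the tolerance test on the change in the memberships, and the guard against zero distances are implementation choices not specified in the source:

```python
import numpy as np

def fcm(X, c, w=2.0, max_iter=100, tol=1e-5, seed=0):
    """Fuzzy c-means: returns memberships U (N, c) and centers V (c, m)."""
    rng = np.random.default_rng(seed)
    U = rng.random((X.shape[0], c))
    U /= U.sum(axis=1, keepdims=True)              # enforce constraint (4)
    for _ in range(max_iter):
        Uw = U ** w
        V = (Uw.T @ X) / Uw.sum(axis=0)[:, None]   # center update, Eq. (6)
        d = np.linalg.norm(X[:, None, :] - V[None, :, :], axis=2)
        d = np.fmax(d, 1e-12)                      # avoid division by zero
        # Membership update, Eq. (5): ratios d_ij / d_iq summed over q
        U_new = 1.0 / ((d[:, :, None] / d[:, None, :]) ** (2.0 / (w - 1.0))).sum(axis=2)
        if np.abs(U_new - U).max() < tol:          # converged
            return U_new, V
        U = U_new
    return U, V
```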
Using the values of the fuzzy membership derived from the FCM and the fuzzy relation to represent the degree of recurrence among the phase-space vectors of the time series, an FRP can be visualized as a grayscale image by taking the complement of the FRP matrix, which displays a black pixel if $\mu(\mathbf{x}_i, \mathbf{x}_k) = 1$, $i, k = 1, \ldots, N$, and otherwise a pixel with a shade of gray.
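Combining the hypothetical helpers sketched above, an FRP image could then be rendered as follows (matplotlib is assumed purely for display, and `X` is the array of phase-space vectors); the complement $1 - \mu$ maps full recurrence, $\mu = 1$, to a black pixel:

```python
import matplotlib.pyplot as plt

U, _ = fcm(X, c=3)                  # fuzzy memberships of the phase-space vectors
FRP = max_min_composition(U)        # degree of recurrence between state pairs
plt.imshow(1.0 - FRP, cmap="gray")  # complement: mu = 1 -> black pixel
plt.title("Fuzzy recurrence plot")
plt.show()
```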
IV. LSTM Neural Networks With FRPs
An LSTM neural network [35] is an artificial recurrent neural network (RNN) used in deep learning. Unlike conventional feedforward neural networks, an LSTM model has feedback loops that allow information from previous events to be carried forward in the sequential learning process. Therefore, LSTM networks are effective in learning and classifying sequential data in applications such as speech and video analysis [36]–[39].
The internal state of an RNN is used as a memory cell to map real values of input sequences to those of output sequences that reflect the dynamic pattern of a time series; an RNN is therefore considered an effective algorithm for learning and modeling temporal data [40]. However, an RNN processes sequences over time steps, which can easily degrade the parameters capturing short-term dependencies as information passes sequentially through all cells before arriving at the current processing cell. This effect causes the gradient of the output error with respect to previous inputs to vanish through the multiplication of many small numbers with magnitude less than one. This problem is known as vanishing gradients [41]. LSTM networks attempt to overcome the vanishing gradient problem encountered by conventional RNNs by using gates to keep relevant information and forget irrelevant information.
The difference between an LSTM neural network and a conventional RNN is the use of memory blocks in the former instead of hidden units in the latter [42]. The input gate of an LSTM network guides the input activations into the memory cell, and the output gate carries out the flow of cell activations into the rest of the network. The forget gate scales the internal state of the cell before feeding it back to the cell as an additive input, thereby adaptively forgetting or resetting the cell memory. Being less sensitive to gaps between time steps thus makes LSTM networks better suited to analyzing sequential data than conventional RNNs. A common LSTM model is composed of a memory cell, an input gate, an output gate, and a forget gate. The cell memorizes values over time steps, and the three gates control the flow of information into and out of the cell. The weights and biases of the input gate regulate the amount of new value flowing into the cell, while those of the forget gate and the output gate control, respectively, the amount of information that remains in the cell and the extent to which the value in the cell is used to compute the output activation of the LSTM block.
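As a purely illustrative sketch (PyTorch is assumed here; this is not the architecture used in the paper), such a gated LSTM block can be applied to sequence classification as follows:

```python
import torch.nn as nn

class SequenceClassifier(nn.Module):
    """Minimal LSTM classifier; nn.LSTM implements the gated memory block."""
    def __init__(self, n_features, n_hidden, n_classes):
        super().__init__()
        self.lstm = nn.LSTM(n_features, n_hidden, batch_first=True)
        self.fc = nn.Linear(n_hidden, n_classes)

    def forward(self, x):              # x: (batch, time steps, n_features)
        _, (h_n, _) = self.lstm(x)     # h_n: final hidden state of the block
        return self.fc(h_n[-1])        # class scores from the last hidden state
```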
The architecture for an LSTM block, in which the fuzzy