Deriving LSTM Gradient for Backpropagation
Recurrent Neural Networks (RNN) have been hot in these past years, especially with the boom of Deep
Learning. Just like any deep neural network, an RNN can be seen as a (very) deep neural network if
we "unroll" the network with respect to the time step. Hence, with all the techniques that make
vanilla deep networks trainable, training RNNs has become more and more feasible too.
The most popular model for RNN right now is the LSTM (Long Short-Term Memory) network.
For the background theory, there are a lot of amazing resources available in Andrej Karpathy’s
blog and Chris Olah’s blog.
Using modern Deep Learning libraries like TensorFlow, Torch, or
Theano, building an LSTM model is a breeze, as we don't need to
analytically derive the backpropagation step. However, to understand
the model better, it is absolutely a good exercise, albeit an optional
one, to derive the LSTM net's gradient and implement the
backpropagation "manually".
So, here, we will first implement the forward computation step
according to the LSTM net formulas, then derive the network's
gradient analytically. Finally, we will implement it using numpy.
LSTM Forward
We will follow this model for a single LSTM cell:
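The cell diagram itself is not reproduced here, but the forward pass it describes can be sketched in numpy. This is a standard single LSTM cell, assuming one weight matrix per gate (`Wf`, `Wi`, `Wc`, `Wo`) acting on the concatenation of the previous hidden state and the current input; the actual variable names in the article's implementation may differ.

```python
import numpy as np

def sigmoid(x):
    # Logistic sigmoid, squashes values into (0, 1)
    return 1. / (1. + np.exp(-x))

def lstm_cell_forward(x, h_prev, c_prev, Wf, Wi, Wc, Wo, bf, bi, bc, bo):
    """One step of a standard LSTM cell (a sketch, not the article's exact code).

    x:      current input, shape (D,)
    h_prev: previous hidden state, shape (H,)
    c_prev: previous cell state, shape (H,)
    W*:     gate weights, shape (H, H + D); b*: gate biases, shape (H,)
    """
    # Stack previous hidden state and current input into one vector
    z = np.concatenate([h_prev, x])

    f = sigmoid(Wf @ z + bf)      # forget gate: what to erase from c_prev
    i = sigmoid(Wi @ z + bi)      # input gate: what to write to the cell
    o = sigmoid(Wo @ z + bo)      # output gate: what to expose as h
    c_bar = np.tanh(Wc @ z + bc)  # candidate cell state

    c = f * c_prev + i * c_bar    # new cell state
    h = o * np.tanh(c)            # new hidden state

    # Cache intermediate values; the backward pass will need them
    cache = (z, f, i, o, c_bar, c_prev, c)
    return h, c, cache
```

The `cache` tuple is worth noting: every intermediate gate activation is saved during the forward pass precisely because the analytical gradients we derive later reuse them.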
Agustinus Kristiadi's Blog