LSTM Algorithm: Long Short-Term Memory Networks (LSTM)

2017/10/6 Exploring LSTMs

http://blog.echen.me/2017/05/30/exploring-lstms/ 1/33

Exploring LSTMs

The first time I learned about LSTMs, my eyes glazed over.

Not in a good, jelly donut kind of way.

It turns out LSTMs are a fairly simple extension to neural networks, and they're behind a lot of the amazing achievements deep learning has made in the past few years. So I'll try to present them as intuitively as possible, in such a way that you could have discovered them yourself.

But first, a picture:

Aren't LSTMs beautiful? Let's go.

(Note: if you're already familiar with neural networks and LSTMs, skip to the middle; the first half of this post is a tutorial.)

Neural Networks

Imagine we have a sequence of images from a movie, and we want to label each image with an activity (is this a fight?, are the characters talking?, are the characters eating?).

How do we do this?

One way is to ignore the sequential nature of the images, and build a per-image classifier that considers each image in isolation. For example, given enough images and labels:

Our algorithm might first learn to detect low-level patterns like shapes and edges.

With more data, it might learn to combine these patterns into more complex ones, like faces (two circular things atop a triangular thing atop an oval thing) or cats.

And with even more data, it might learn to map these higher-level patterns into activities themselves (scenes with mouths, steaks, and forks are probably about eating).

(Note: to make the notation a little cleaner, I assume x and h each contain an extra bias neuron fixed at 1 for learning bias weights.)
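
To make the bias convention in the note above concrete, here is a minimal sketch. It only illustrates the idea: the helper `matvec` and the toy values of `W`, `b`, and `x` are my own assumptions, not something taken from the post.

```python
def matvec(M, v):
    """Multiply matrix M (a list of rows) by vector v."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]

# Hypothetical toy weights, bias, and input (illustrative values only).
W = [[0.5, -1.0],
     [2.0, 0.25]]
b = [0.1, -0.3]
x = [1.0, 2.0]

# Explicit bias: W x + b
explicit = [wx + bi for wx, bi in zip(matvec(W, x), b)]

# Bias folded in: append b as an extra column of W and a constant 1 to x,
# so the bias weights are learned like any other weights.
W_aug = [row + [bi] for row, bi in zip(W, b)]
x_aug = x + [1.0]
folded = matvec(W_aug, x_aug)
```

Both forms compute the same values, which is why the equations can drop the separate bias term without losing anything.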

Remembering Information with RNNs

Ignoring the sequential aspect of the movie images is pretty ML 101, though. If we see a scene of a beach, we should boost beach activities in future frames: an image of someone in the water should probably be labeled swimming, not bathing, and an image of someone lying with their eyes closed is probably suntanning. If we remember that Bob just arrived at a supermarket, then even without any distinctive supermarket features, an image of Bob holding a slab of bacon should probably be categorized as shopping instead of cooking.

So what we'd like is to let our model track the state of the world:

1. After seeing each image, the model outputs a label and also updates the knowledge it's been learning. For example, the model might learn to automatically discover and track information like location (are scenes currently in a house or beach?), time of day (if a scene contains an image of the moon, the model should remember that it's nighttime), and within-movie progress (is this image the first frame or the 100th?). Importantly, just as a neural network automatically discovers hidden patterns like edges, shapes, and faces without being fed them, our model should automatically discover useful information by itself.

2. When given a new image, the model should incorporate the knowledge it's gathered to do a better job.

This, then, is a recurrent neural network. Instead of simply taking an image and returning an activity, an RNN also maintains internal memories about the world (weights assigned to different pieces of information) to help perform its classifications.

Mathematically

So let's add the notion of internal knowledge to our equations, which we can think of as pieces of information that the network maintains over time.

But this is easy: we know that the hidden layers of neural networks already encode useful information about their inputs, so why not use these layers as the memory passed from one time step to the next? This gives us our RNN equations:

$$h_t = \phi(W x_t + U h_{t-1})$$

$$y_t = V h_t$$

Here $x_t$ is the input at time $t$, $y_t$ is the output, $\phi$ is an activation function, and $W$, $U$, and $V$ are weight matrices.

Note that the hidden state computed at time $t$ ($h_t$, our internal knowledge) is fed back at the next time step. (Also, I'll use concepts like hidden state, knowledge, memories, and beliefs to describe $h_t$ interchangeably.)
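
Here is a minimal sketch of one way the recurrence described above could be coded, with the hidden state fed back in at each step. It assumes tanh as the activation and omits bias terms (per the earlier bias-neuron convention); the function names and all toy weights are hypothetical, chosen only for illustration.

```python
import math

def matvec(M, v):
    """Multiply matrix M (a list of rows) by vector v."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def rnn_step(W, U, V, x_t, h_prev):
    """One step: new hidden state from the input and the previous hidden state."""
    pre = [a + b for a, b in zip(matvec(W, x_t), matvec(U, h_prev))]
    h_t = [math.tanh(p) for p in pre]   # assumed activation: tanh
    y_t = matvec(V, h_t)                # output scores read off the hidden state
    return h_t, y_t

# Hypothetical toy sizes: 2 inputs, 3 hidden units, 2 output scores.
W = [[0.1, 0.2], [-0.3, 0.4], [0.5, -0.6]]
U = [[0.0, 0.1, 0.0], [0.2, 0.0, -0.1], [0.0, 0.3, 0.0]]
V = [[1.0, 0.0, -1.0], [0.5, 0.5, 0.5]]

h = [0.0, 0.0, 0.0]   # initial hidden state: no knowledge yet
outputs = []
for x_t in [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]:
    h, y = rnn_step(W, U, V, x_t, h)   # h carries memory to the next step
    outputs.append(y)
```

The only thing that distinguishes this from a per-image classifier is the `h_prev` argument: the same input can produce a different output depending on what came before.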

Longer Memories through LSTMs

Let's think about how our model updates its knowledge of the world. So far, we've placed no constraints on this update, so its knowledge can change pretty chaotically: at one frame it thinks the characters are in the US, at the next frame it sees the characters eating sushi and thinks they're in Japan, and at the next frame it sees polar bears and thinks they're on Hydra Island. Or perhaps it has a wealth of information to suggest that Alice is an investment analyst, but decides she's a professional assassin after seeing her cook.

This chaos means information quickly transforms and vanishes, and it's difficult for the model to keep a long-term memory. So what we'd like is for the network to learn how to update its beliefs (scenes without Bob shouldn't change Bob-related information, scenes with Alice should focus on gathering details about her), in a way that its knowledge of the world evolves more gently.

This is how we do it.

1. Adding a forgetting mechanism. If a scene ends, for example, the model should forget the current scene location, the time of day, and reset any scene-specific information; however, if a character dies in the scene, it should continue remembering that he's no longer alive. Thus, we want the model to learn a separate forgetting/remembering mechanism: when new inputs come in, it needs to know which beliefs to keep or throw away.

2. Adding a saving mechanism. When the model sees a new image, it needs to learn whether any information about the image is worth using and saving. Maybe your mom sent you an article about the Kardashians, but who cares?

3. So when a new input comes in, the model first forgets any long-term information it decides it no longer needs. Then it learns which parts of the new input are worth using, and saves them into its long-term memory.

4. Focusing long-term memory into working memory. Finally, the model needs to learn which parts of its long-term memory are immediately useful. For example, Bob's age may be a useful piece of information to keep in the long term (children are more likely to be crawling, adults are more likely to be working), but is probably irrelevant if he's not in the current scene. So instead of using the full long-term memory all the time, it learns which parts to focus on instead.

This, then, is a long short-term memory network. Whereas an RNN can overwrite its memory at each time step in a fairly uncontrolled fashion, an LSTM transforms its memory in a very precise way: by using specific learning mechanisms for which pieces of information to remember, which to update, and which to pay attention to. This helps it keep track of information over longer periods of time.

Mathematically

Let's describe the LSTM additions mathematically.

At time $t$, we receive a new input $x_t$. We also have our long-term and working memories passed on from the previous time step, $\mathrm{ltm}_{t-1}$ and $\mathrm{wm}_{t-1}$ (both n-length vectors), which we want to update.

We'll start with our long-term memory. First, we need to know which pieces of long-term memory to continue remembering and which to discard, so we want to use the new input and our working memory to learn a remember gate of n numbers between 0 and 1, each of which determines how much of a long-term memory element to keep. (A 1 means to keep it, a 0 means to forget it entirely.)

Naturally, we can use a small neural network to learn this remember gate:

$$\mathrm{remember}_t = \sigma(W_r x_t + U_r \mathrm{wm}_{t-1})$$

(Notice the similarity to our previous network equations; this is just a shallow neural network. Also, we use a sigmoid activation $\sigma$ because we need numbers between 0 and 1.)
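
As a rough sketch of the remember gate described above: a sigmoid squashes each gate value into the range 0 to 1, and multiplying element-wise by the old long-term memory keeps or discards each element. The names `W_r`, `U_r`, `remember_gate`, and every number below are assumptions made up for illustration; real weights would be learned.

```python
import math

def sigmoid(z):
    """Squash any real number into the open interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def matvec(M, v):
    """Multiply matrix M (a list of rows) by vector v."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def remember_gate(W_r, U_r, x_t, wm_prev):
    """Gate values computed from the new input and the previous working memory."""
    pre = [a + b for a, b in zip(matvec(W_r, x_t), matvec(U_r, wm_prev))]
    return [sigmoid(p) for p in pre]

# Hypothetical weights, picked so one element is kept and the other forgotten.
W_r = [[4.0, 0.0], [-4.0, 0.0]]
U_r = [[0.0, 0.0], [0.0, 0.0]]
x_t = [2.0, 0.0]
wm_prev = [0.0, 0.0]
ltm_prev = [1.0, 1.0]

gate = remember_gate(W_r, U_r, x_t, wm_prev)       # close to [1, 0]
kept = [g * m for g, m in zip(gate, ltm_prev)]     # element-wise keep/forget
```

With these toy weights the first memory element survives almost untouched while the second is nearly erased.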

Next, we need to compute the information we can learn from $x_t$, i.e., a candidate addition to our long-term memory:

$$\mathrm{ltm}'_t = \phi(W_l x_t + U_l \mathrm{wm}_{t-1})$$

$\phi$ is an activation function, commonly chosen to be $\tanh$.

Before we add the candidate into our memory, though, we want to learn which parts of it are actually worth using and saving:

$$\mathrm{save}_t = \sigma(W_s x_t + U_s \mathrm{wm}_{t-1})$$

(Think of what happens when you read something on the web. While a news article might contain information about Hillary, you should ignore it if the source is Breitbart.)

Let's now combine all these steps. After forgetting memories we don't think we'll ever need again and saving useful pieces of incoming information, we have our updated long-term memory:

$$\mathrm{ltm}_t = \mathrm{remember}_t \circ \mathrm{ltm}_{t-1} + \mathrm{save}_t \circ \mathrm{ltm}'_t$$

where $\circ$ denotes element-wise multiplication.
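
Putting the long-term memory steps above together, one possible sketch looks like this. It assumes tanh for the candidate activation and sigmoid for both gates; the parameter names (`W_r`, `U_r`, `W_l`, `U_l`, `W_s`, `U_s`), the helper functions, and the all-zero toy weights are hypothetical, used only to make the arithmetic easy to follow.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def matvec(M, v):
    """Multiply matrix M (a list of rows) by vector v."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def affine(W, U, x_t, wm_prev):
    """Shared pre-activation: W x_t + U wm_prev."""
    return [a + b for a, b in zip(matvec(W, x_t), matvec(U, wm_prev))]

def update_long_term_memory(p, x_t, ltm_prev, wm_prev):
    """Forget part of the old memory, then save part of the candidate."""
    remember = [sigmoid(z) for z in affine(p["W_r"], p["U_r"], x_t, wm_prev)]
    candidate = [math.tanh(z) for z in affine(p["W_l"], p["U_l"], x_t, wm_prev)]
    save = [sigmoid(z) for z in affine(p["W_s"], p["U_s"], x_t, wm_prev)]
    return [r * m + s * c
            for r, m, s, c in zip(remember, ltm_prev, save, candidate)]

# Toy setup: memory size n = 2, input size d = 1, all weights zero.
n, d = 2, 1
zero_x = [[0.0] * d for _ in range(n)]
zero_wm = [[0.0] * n for _ in range(n)]
params = {"W_r": zero_x, "U_r": zero_wm,
          "W_l": zero_x, "U_l": zero_wm,
          "W_s": zero_x, "U_s": zero_wm}

# With zero weights every gate is 0.5 and the candidate is 0,
# so the old memory is simply halved.
ltm = update_long_term_memory(params, [1.0], [2.0, -4.0], [0.0, 0.0])
```

Zero weights are a degenerate case, of course, but they make it easy to see each term of the update at work before plugging in learned values.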
