2017/10/6 Exploring LSTMs
http://blog.echen.me/2017/05/30/exploring-lstms/ 1/33
Exploring LSTMs
The first time I learned about LSTMs, my eyes glazed over.
Not in a good, jelly donut kind of way.
It turns out LSTMs are a fairly simple extension to neural networks, and they're behind
a lot of the amazing achievements deep learning has made in the past few years. So
I'll try to present them as intuitively as possible – in such a way that you could have
discovered them yourself.
But first, a picture:
Aren't LSTMs beautiful? Let's go.
(Note: if you're already familiar with neural networks and LSTMs, skip to the middle –
the first half of this post is a tutorial.)
Neural Networks
Imagine we have a sequence of images from a movie, and we want to label each
image with an activity (is this a fight?, are the characters talking?, are the characters
eating?).
How do we do this?
One way is to ignore the sequential nature of the images, and build a per-image
classifier that considers each image in isolation. For example, given enough images
and labels:
Our algorithm might first learn to detect low-level patterns like shapes and edges.
With more data, it might learn to combine these patterns into more complex ones, like
faces (two circular things atop a triangular thing atop an oval thing) or cats.
And with even more data, it might learn to map these higher-level patterns into activities
themselves (scenes with mouths, steaks, and forks are probably about eating).
This, then, is a deep neural network: it takes an image input, returns an activity
output, and – just as we might learn to detect patterns in puppy behavior without
knowing anything about dogs (after seeing enough corgis, we discover common
characteristics like fluffy butts and drumstick legs; next, we learn advanced features
like splooting) – in between it learns to represent images through hidden layers of
representations.
Mathematically
I assume people are familiar with basic neural networks already, but let's quickly
review them.
A neural network with a single hidden layer takes as input a vector x, which we can
think of as a set of neurons.
Each input neuron is connected to a hidden layer of neurons via a set of learned weights.
The jth hidden neuron outputs $h_j = \phi(\sum_i w_{ij} x_i)$, where $\phi$ is an activation function.
The hidden layer is fully connected to an output layer, and the jth output neuron outputs
$y_j = \sum_i v_{ij} h_i$. If we need probabilities, we can transform the output layer via a softmax
function.
In matrix notation:
$$h = \phi(Wx)$$
$$y = Vh$$
where
x is our input vector
W is a weight matrix connecting the input and hidden layers
V is a weight matrix connecting the hidden and output layers
Common activation functions for $\phi$ are the sigmoid function, $\sigma(x)$, which squashes
numbers into the range (0, 1); the hyperbolic tangent, $\tanh(x)$, which squashes numbers
into the range (-1, 1); and the rectified linear unit, $\mathrm{ReLU}(x) = \max(0, x)$.
Here's a pictorial view:
(Note: to make the notation a little cleaner, I assume x and h each contain an extra
bias neuron fixed at 1 for learning bias weights.)
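To make the single-hidden-layer equations concrete, here's a minimal sketch of the forward pass in plain Python. The weights, sizes, and helper names (`matvec`, `forward`) are made up for illustration; following the note above, the last entry of x and of h is a bias neuron fixed at 1.

```python
import math

def matvec(M, v):
    """Multiply matrix M (a list of rows) by vector v."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def sigmoid(z):
    """Squashes a number into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def relu(z):
    """Rectified linear unit: max(0, z)."""
    return max(0.0, z)

def softmax(scores):
    """Turn raw output scores into probabilities that sum to 1."""
    shift = max(scores)  # subtracting the max avoids overflow in exp
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def forward(x, W, V, phi=math.tanh):
    """Compute h = phi(Wx) and y = Vh, appending a bias neuron to h."""
    h = [phi(z) for z in matvec(W, x)] + [1.0]
    y = matvec(V, h)
    return h, y

# Hypothetical toy network: 2 inputs (+ bias) -> 2 hidden (+ bias) -> 2 outputs.
x = [0.5, -1.0, 1.0]           # last entry is the bias neuron
W = [[0.2, -0.4, 0.1],
     [0.7, 0.3, -0.5]]
V = [[1.0, -1.0, 0.1],
     [0.5, 0.25, -0.2]]

h, y = forward(x, W, V)
probs = softmax(y)             # one probability per activity label
print(h, y, probs)
```

Swapping `phi=math.tanh` for `sigmoid` or `relu` changes only the hidden-layer activation; the rest of the computation stays the same.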
Remembering Information with RNNs
Ignoring the sequential aspect of the movie images is pretty ML 101, though. If we
see a scene of a beach, we should boost beach activities in future frames: an image
of someone in the water should probably be labeled swimming, not bathing, and an
image of someone lying with their eyes closed is probably suntanning. If we
remember that Bob just arrived at a supermarket, then even without any distinctive
supermarket features, an image of Bob holding a slab of bacon should probably be
categorized as shopping instead of cooking.
So what we'd like is to let our model track the state of the world:
1. After seeing each image, the model outputs a label and also updates the
knowledge it's been learning. For example, the model might learn to automatically
discover and track information like location (are scenes currently in a house or beach?),
time of day (if a scene contains an image of the moon, the model should remember that
it's nighttime), and within-movie progress (is this image the first frame or the 100th?).
Importantly, just as a neural network automatically discovers hidden patterns like edges,
shapes, and faces without being fed them, our model should automatically discover useful
information by itself.
2. When given a new image, the model should incorporate the knowledge it's gathered
to do a better job.
This, then, is a recurrent neural network. Instead of simply taking an image and
returning an activity, an RNN also maintains internal memories about the world
(weights assigned to different pieces of information) to help perform its
classifications.
Mathematically
So let's add the notion of internal knowledge to our equations, which we can think
of as pieces of information that the network maintains over time.
But this is easy: we know that the hidden layers of neural networks already encode
useful information about their inputs, so why not use these layers as the memory
passed from one time step to the next? This gives us our RNN equations:
$$h_t = \phi(W x_t + U h_{t-1})$$
$$y_t = V h_t$$
Note that the hidden state computed at time $t$ ($h_t$, our internal knowledge) is fed
back at the next time step. (Also, I'll use concepts like hidden state, knowledge,
memories, and beliefs to describe $h_t$ interchangeably.)
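Here's a small sketch of the RNN recurrence in plain Python, assuming tanh as the activation and some made-up weights. The names `rnn_step` and `run_rnn` are mine, not from any library. The point to notice is that the second and third frames are identical (all zeros), yet they produce different outputs, because the hidden state carries information forward from the first frame.

```python
import math

def matvec(M, v):
    """Multiply matrix M (a list of rows) by vector v."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def rnn_step(x_t, h_prev, W, U, V, phi=math.tanh):
    """One time step: h_t = phi(W x_t + U h_{t-1}), y_t = V h_t."""
    pre = [a + b for a, b in zip(matvec(W, x_t), matvec(U, h_prev))]
    h_t = [phi(z) for z in pre]
    y_t = matvec(V, h_t)
    return h_t, y_t

def run_rnn(xs, h0, W, U, V):
    """Feed a sequence through the RNN, threading the hidden state along."""
    h, ys = h0, []
    for x_t in xs:
        h, y = rnn_step(x_t, h, W, U, V)
        ys.append(y)
    return h, ys

# Hypothetical toy weights: 2 inputs, 2 hidden neurons, 1 output.
W = [[0.5, -0.3], [0.1, 0.8]]
U = [[0.9, 0.0], [0.2, 0.4]]
V = [[1.0, -1.0]]

frames = [[1.0, 0.0], [0.0, 0.0], [0.0, 0.0]]
h_final, outputs = run_rnn(frames, [0.0, 0.0], W, U, V)
print(outputs)  # frames 2 and 3 have the same input but different outputs
```

A per-image classifier would give frames 2 and 3 the same label score, since it sees only the current input.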
Longer Memories through LSTMs
Let's think about how our model updates its knowledge of the world. So far, we've
placed no constraints on this update, so its knowledge can change pretty chaotically:
at one frame it thinks the characters are in the US, at the next frame it sees the
characters eating sushi and thinks they're in Japan, and at the next frame it sees polar
bears and thinks they're on Hydra Island. Or perhaps it has a wealth of information to
suggest that Alice is an investment analyst, but decides she's a professional assassin
after seeing her cook.
This chaos means information quickly transforms and vanishes, and it's difficult for the
model to keep a long-term memory. So what we'd like is for the network to learn
how to update its beliefs (scenes without Bob shouldn't change Bob-related
information, scenes with Alice should focus on gathering details about her), in a way
that its knowledge of the world evolves more gently.
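As a rough illustration of the "vanishes" part, here's a simplified sketch using the RNN update from earlier with tanh as the activation. I'm assuming a hand-picked recurrent matrix U (a rotation scaled by 0.5) and twenty uninformative frames with zero input; real trained weights won't look like this, so treat it as a toy.

```python
import math

def matvec(M, v):
    """Multiply matrix M (a list of rows) by vector v."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def norm(v):
    """Euclidean length of a vector."""
    return math.sqrt(sum(a * a for a in v))

# Hypothetical recurrent weights: a rotation by theta, scaled by 0.5,
# so multiplying by U halves the length of any vector.
theta = 0.3
U = [[0.5 * math.cos(theta), -0.5 * math.sin(theta)],
     [0.5 * math.sin(theta),  0.5 * math.cos(theta)]]

h = [1.0, -1.0]        # memory left behind by some distinctive frame
norms = [norm(h)]
for _ in range(20):    # 20 frames with x_t = 0, so h_t = tanh(U h_{t-1})
    h = [math.tanh(z) for z in matvec(U, h)]
    norms.append(norm(h))

print(norms[0], norms[-1])  # the memory shrinks toward zero
```

Because tanh never increases the magnitude of its argument, each step at least halves the length of the hidden state here, so whatever the model knew twenty frames ago has all but disappeared.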
This is how we do it.
1. Adding a forgetting mechanism. If a scene ends, for example, the model should
forget the current scene location, the time of day, and reset any scene-specific
information; however, if a character dies in the scene, it should continue remembering
that he's no longer alive. Thus, we want the model to learn a separate
forgetting/remembering mechanism: when new inputs come in, it needs to know which
beliefs to keep or throw away.
2. Adding a saving mechanism. When the model sees a new image, it needs to learn
whether any information about the image is worth using and saving. Maybe your mom
sent you an article about the Kardashians, but who cares?
3. So when a new input comes in, the model first forgets any long-term information it
decides it no longer needs. Then it learns which parts of the new input are worth using,
and saves them into its long-term memory.
4. Focusing long-term memory into working memory. Finally, the model needs to
learn which parts of its long-term memory are immediately useful. For example, Bob's age
may be a useful piece of information to keep in the long term (children are more likely to
be crawling, adults are more likely to be working), but is probably irrelevant if he's not in
the current scene. So instead of using the full long-term memory all the time, it learns
which parts to focus on instead.
This, then, is a long short-term memory network. Whereas an RNN can overwrite
its memory at each time step in a fairly uncontrolled fashion, an LSTM transforms its
memory in a very precise way: by using specific learning mechanisms for which pieces
of information to remember, which to update, and which to pay attention to. This
helps it keep track of information over longer periods of time.
Mathematically
Let's describe the LSTM additions mathematically.
At time $t$, we receive a new input $x_t$. We also have our long-term and working
memories passed on from the previous time step, $ltm_{t-1}$ and $wm_{t-1}$ (both n-length
vectors), which we want to update.
We'll start with our long-term memory. First, we need to know which pieces of
long-term memory to continue remembering and which to discard, so we want to
use the new input and our working memory to learn a remember gate of n numbers
between 0 and 1, each of which determines how much of a long-term memory
element to keep. (A 1 means to keep it, a 0 means to forget it entirely.)
Naturally, we can use a small neural network to learn this remember gate:
$$remember_t = \sigma(W_r x_t + U_r wm_{t-1})$$
(Notice the similarity to our previous network equations; this is just a shallow neural
network. Also, we use a sigmoid activation because we need numbers between 0 and
1.)
Next, we need to compute the information we can learn from $x_t$, i.e., a candidate
addition to our long-term memory:
$$ltm'_t = \phi(W_l x_t + U_l wm_{t-1})$$
$\phi$ is an activation function, commonly chosen to be $\tanh$.
Before we add the candidate into our memory, though, we want to learn which
parts of it are actually worth using and saving:
$$save_t = \sigma(W_s x_t + U_s wm_{t-1})$$
(Think of what happens when you read something on the web. While a news article
might contain information about Hillary, you should ignore it if the source is
Breitbart.)
Let's now combine all these steps. After forgetting memories we don't think we'll ever
need again and saving useful pieces of incoming information, we have our updated
long-term memory:
$$ltm_t = remember_t \circ ltm_{t-1} + save_t \circ ltm'_t$$
where $\circ$ denotes element-wise multiplication.
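The long-term memory update can be sketched in a few lines of plain Python. This is only an illustration under my own assumptions: the parameter values are made up (chosen so the gates end up close to 0 or 1), tanh is used for the candidate, and names like `gate` and `update_long_term_memory` are hypothetical.

```python
import math

def matvec(M, v):
    """Multiply matrix M (a list of rows) by vector v."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def sigmoid(z):
    """Squashes a number into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def gate(W, U, x_t, wm_prev, activation):
    """Shallow network: activation(W x_t + U wm_{t-1}), elementwise."""
    pre = [a + b for a, b in zip(matvec(W, x_t), matvec(U, wm_prev))]
    return [activation(z) for z in pre]

def update_long_term_memory(x_t, ltm_prev, wm_prev, p):
    """ltm_t = remember_t * ltm_{t-1} + save_t * ltm'_t (elementwise)."""
    remember = gate(p["W_r"], p["U_r"], x_t, wm_prev, sigmoid)
    candidate = gate(p["W_l"], p["U_l"], x_t, wm_prev, math.tanh)
    save = gate(p["W_s"], p["U_s"], x_t, wm_prev, sigmoid)
    ltm_t = [r * m + s * c
             for r, m, s, c in zip(remember, ltm_prev, save, candidate)]
    return ltm_t, remember, save, candidate

zeros = [[0.0, 0.0], [0.0, 0.0]]
# Hypothetical parameters: element 0 is mostly remembered and not
# overwritten, element 1 is mostly forgotten and replaced.
params = {
    "W_r": [[4.0, 0.0], [-4.0, 0.0]], "U_r": zeros,
    "W_s": [[-4.0, 0.0], [4.0, 0.0]], "U_s": zeros,
    "W_l": [[0.5, 0.5], [1.0, -1.0]], "U_l": [[0.1, 0.0], [0.0, 0.1]],
}

x_t = [1.0, 0.5]
ltm_prev = [0.8, -0.3]
wm_prev = [0.1, 0.2]
ltm_t, remember, save, candidate = update_long_term_memory(
    x_t, ltm_prev, wm_prev, params)
print(remember, save, candidate, ltm_t)
```

With these toy values, the first memory element stays close to its old value while the second is nearly replaced by its candidate, which is the selective behavior the gates are meant to provide.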